In a previous article I described a number of challenges we faced while migrating a customer environment to Azure Virtual WAN. One fundamental challenge was asymmetric routing occurring between Azure Virtual WAN and the legacy environments. In this blog post, I will describe the issue in more detail and the steps taken to mitigate it.
Asymmetric Routing
Asymmetric routing is described as “where a packet takes one path to the destination and takes another path when returning to the source”. In the below example, our traffic to the server (Blue line) takes a different path to the return traffic (Red line).
In our scenario, traffic was exiting the network from one ExpressRoute router and returning via another.
AS_PATH
To describe what was happening to our customer, we need to dig a little more into AS_PATH and route selection. The AS_PATH attribute provides the list of Autonomous Systems through which the routing information has passed. Routers use this information to help determine the shortest path to route traffic. Where two paths have the same AS_PATH length, other tie-breakers are used. As I’m no routing expert, I won’t be digging into this any deeper.
In our example above the DC router in “region A2” has 2 routes to our connecting host. If we look at the Blue path, the path traverses as follows:
Host -> Edge Gateway (AS 12076) -> ExpressRoute (AS 60000) -> Region A1 router (AS 60100) -> internal cloud (AS 63000) -> Region A2 router (AS 62000) -> Destination
For this route, the path is:
62000 63000 60100 60000 12076
For the Red Path, the path is as follows:
Host -> hub gateway (AS 65520) -> hub gateway (AS 65520) -> Edge Gateway (AS 65515) -> ExpressRoute (AS 62100) -> Region A2 router (AS 62000) -> Destination
The path looks like:
62000 62100 65515 65520 65520 e
As you can see, both paths are of equal length, so AS_PATH length cannot decide between them and other selection criteria apply. In our case, Azure used the nearest exit and the router used the nearest entry, creating our asymmetric route.
What was needed was a way to manage our AS_PATH information. Enter “Route Maps”. Route Maps can, among other things, allow us to alter the length of the AS_PATH information to help create a preferred path while still maintaining a redundant path in the event of a system failure.
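To make the tie concrete, here is a minimal Python sketch that compares the two AS_PATHs quoted above by length only. The helper names are mine, and real BGP best-path selection weighs several other attributes before and after AS_PATH length, so treat this as an illustration of this one step rather than a model of the router.

```python
# AS_PATHs as seen for the two routes, copied from the article.
blue_path = [62000, 63000, 60100, 60000, 12076]
red_path = [62000, 62100, 65515, 65520, 65520]


def as_path_length(path):
    """Every entry counts, including a repeated ASN such as 65520."""
    return len(path)


def shorter_by_as_path(a, b):
    """Return the shorter path, or None when AS_PATH length cannot decide."""
    if as_path_length(a) == as_path_length(b):
        return None
    return a if as_path_length(a) < as_path_length(b) else b


winner = shorter_by_as_path(blue_path, red_path)
print(as_path_length(blue_path), as_path_length(red_path), winner)
```

Because both paths have five entries, the comparison returns no winner, which is the point at which the other tie-breakers take over.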
Azure Virtual Hub and Route Maps
A new feature in the Azure Virtual Hub (still in preview at the time of writing) is Route Maps. In Azure, this feature lets you perform route aggregation and route filtering, and modify BGP attributes such as AS-PATH and Community to manage routes and routing decisions.
Unfortunately, a preview feature is not supported for production workloads, so I looked to use Route Maps on the other side of the ExpressRoute, at the DC router layer.
Modifying the AS_PATH at the DC Router
The routers we were using supported route maps, but we needed a way to identify the routes to be modified. The simplest approach was to create a list, but this is a very manual approach requiring ongoing maintenance as the Azure networks change. This went against one of the key tenets of our migration project: keep it simple. We needed a way to identify routes based on their location within the Virtual WAN in such a way that changes would be reflected dynamically.
The customer’s network schema had network ranges that were unique per region. However, for some reason the routers were not allowing us to use this as our defining attribute for the route map. “If only we could set a BGP community string,” lamented my networking colleague.
BGP Community Strings
Custom BGP community strings were something I knew Azure supported for working with ExpressRoute connections. I proceeded to set up custom strings on test networks in regions A1 and A2.
The local ExpressRoute router for each region automatically reflected the changes, but something unusual was noted. The inter-hub communication was stripping the BGP community string, so the router in the opposite region could not see it. Further investigation and a quick call with Microsoft revealed that Virtual WAN does not fully support BGP community strings. While this looked to derail a simple solution, all was not lost.
Default Regional Community Tags
Virtual WAN may not support custom community strings, but while helping debug the issue I noticed that Virtual WAN uses its own regional strings and these strings are shared with our ExpressRoute routers. Routes connected to region A1 were tagged with a regional community string on the router in region A2 and vice versa.
Now we were back in business. I had a unique tag per region attached to a route by Virtual WAN if it crossed inter-hub links. I was now able to identify whether a network was not locally peered to the same virtual hub as the ExpressRoute. Working with my router expert, we were able to create access lists on the router based on the community string. With that access list, we could then apply route map policies to extend the AS_PATH for non-local routes.
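The matching step can be sketched in Python as follows. This only models the logic of picking out routes by community tag; it is not router configuration. The community value and the prefixes are hypothetical placeholders, since the actual regional tag values are not given here.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder: the real regional tag value is not stated in the article.
REMOTE_REGION_COMMUNITY = "65000:1"


@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)


def is_non_local(route, remote_community=REMOTE_REGION_COMMUNITY):
    """A route carrying the other region's tag has crossed an inter-hub link."""
    return remote_community in route.communities


# Hypothetical prefixes, for illustration only.
routes = [
    Route("10.1.0.0/16", {REMOTE_REGION_COMMUNITY}),  # learned via the remote hub
    Route("10.2.0.0/16"),                             # attached to the local hub
]

non_local_prefixes = [r.prefix for r in routes if is_non_local(r)]
print(non_local_prefixes)
```

Because the match is on the tag rather than on a list of prefixes, a new network added in either region is classified without any change to the matching rule, which is what kept the approach low-maintenance.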
Our physical/virtual pathway through the environment still looks the same, but our AS_PATH for the red route path (from above) now looks like:
62000 62000 62000 62000 62100 65515 65520 65520 e
This makes the region A2 path less preferable based on length.
Thanks to regional community tags being automatically added to routes crossing inter-hub links, we were able to modify our AS_PATH length to prevent asymmetric routing while still allowing redundant paths to be available in the event of a failure.
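The effect of the prepend can be checked with a short Python sketch. It uses the AS_PATHs quoted in this post and prepends AS 62000 three times, which matches the lengthened red path shown above. The function names are mine, and comparing on length alone is a simplification of real best-path selection.

```python
LOCAL_AS = 62000
PREPEND_COUNT = 3  # the lengthened red path above carries three extra 62000 entries

blue_path = [62000, 63000, 60100, 60000, 12076]
red_path = [62000, 62100, 65515, 65520, 65520]


def prepend(as_path, asn, count):
    """Return a new AS_PATH with `asn` repeated `count` times at the front."""
    return [asn] * count + list(as_path)


def best_path(candidates):
    """Pick the shortest AS_PATH; a simplification that ignores other attributes."""
    return min(candidates, key=len)


red_prepended = prepend(red_path, LOCAL_AS, PREPEND_COUNT)

# With both routes present, the blue path now wins on length.
preferred = best_path([blue_path, red_prepended])

# If the blue route is withdrawn, the prepended red path is still usable.
fallback = best_path([red_prepended])

print(preferred, fallback)
```

The longer path is only deprioritised, not filtered out, so it stays available as the redundant route if the preferred one fails.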