Arista (VMware) SD-WAN Deep Dive — Part 2: Routing — Overlay, Underlay, BGP, and the Gateway as Route Reflector
Series map. Part 2 of five.
1. Components, Gateways, and the Three Planes
2. Routing — Overlay, Underlay, BGP, and the Gateway as Route Reflector (this post)
3. The Data Plane — VCMP, DMPO, and Per-Flow Steering
4. Topology Walkthroughs — MPLS-only meets Internet-only Across Continents
5. Best Practice, Failure Modes, and a Design Checklist
Part 1 set up the components. This post is about how an Edge in Bristol learns the prefix sitting behind an Edge in Shanghai, and how it decides whether to send the packet over MPLS, broadband, or via the Cloud Gateway.
There is more going on here than people credit. Three things have to happen in order:
1. Each Edge advertises its local reachability somewhere.
2. A central node redistributes those advertisements out to everyone in the tenant.
3. Each Edge does its own route selection across overlay routes, underlay routes, and its policy stance for that prefix.
Step 2 is the Gateway. Step 1 and step 3 are the Edge. The Orchestrator is nowhere in this picture in real time — it pushes the rules of the game once at config time and then steps back. Keep that in your head for the rest of the post.
Where overlay prefixes come from
Three sources, in order of how often they’re used in the field.
Static LAN configuration
The simplest case. You define a LAN-side subnet on the Edge — say 10.20.30.0/24 on Bristol’s LAN interface — and the Edge automatically advertises it into the overlay. No protocol on the LAN side at all. This is the right answer for small offices where the Edge is the LAN gateway.
OSPF or BGP from the LAN
For sites with internal routing — typical at a head office or data centre where the Edge is one of several L3 hops — you turn on OSPF or eBGP between the Edge and the LAN-side router. The Edge learns the local prefixes from the LAN routing protocol and redistributes them into the overlay. This is a redistribution exactly like you’d see between two routing protocols on any router. There are knobs for which prefixes get redistributed (per-prefix filters, tag mapping) and what they look like once they’re in.
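As a rough illustration of the filter-and-tag step, here is a minimal Python sketch. It is not the Edge’s actual configuration model: the function name `redistribute`, the tuple layout, and the `tag_map` dictionary are all assumptions made for the example.

```python
import ipaddress

def redistribute(lan_routes, permit, tag_map):
    """Return the LAN-learned routes that would be advertised into the overlay.

    lan_routes: list of (prefix, protocol, tag) tuples learned from the LAN.
    permit:     list of prefixes; a route passes if it sits inside any of them.
    tag_map:    hypothetical mapping from LAN-side tags to overlay tags.
    """
    allowed = [ipaddress.ip_network(p) for p in permit]
    advertised = []
    for prefix, protocol, tag in lan_routes:
        net = ipaddress.ip_network(prefix)
        # Per-prefix filter: drop anything outside the permitted ranges.
        if any(net.subnet_of(a) for a in allowed):
            # Tag mapping: rewrite known tags, pass unknown ones through.
            advertised.append((prefix, protocol, tag_map.get(tag, tag)))
    return advertised

lan_routes = [
    ("10.20.30.0/24", "ospf", 100),
    ("10.20.40.0/24", "bgp", 200),
    ("192.168.99.0/24", "ospf", 100),   # lab subnet, not meant for the overlay
]
overlay_adverts = redistribute(lan_routes, ["10.20.0.0/16"], {100: 10})
```

Only the two 10.20.x.x prefixes survive the filter; the lab subnet never reaches the overlay.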
Connected, from the WAN
When the WAN is MPLS and the Edge is peering BGP with the SP’s PE, the Edge learns prefixes for other branches on the same MPLS VRF from the PE. These are underlay routes. They don’t automatically become overlay advertisements, and you usually don’t want them to. The overlay has its own copy of those branches’ prefixes coming from the other Edges, learned via the Gateway. Letting underlay-learned prefixes leak into the overlay is one of the classic foot-guns; we cover it in Part 5.
The Edge → Gateway advertisement
Once the Edge knows what it should advertise, it sends those prefixes up to every Cloud Gateway it has a VCMP tunnel to. The advertisement goes inside the control channel of the VCMP tunnel — same UDP/2426 conversation as the data plane, different sub-protocol. The Edge sends along, per prefix:
- The prefix and mask.
- The route type (connected, OSPF-learned, BGP-learned, static).
- The next-hop Edge identifier — i.e., “send packets for this prefix back to me”.
- Any tags picked up from the LAN protocol or set by policy.
- Metric values that the LAN protocol exposed (BGP local-pref, OSPF cost, etc.) — the Gateway carries these through.
The Gateway does not run a link-state computation or a path-vector loop-detection on these — there is no equivalent of OSPF SPF or BGP best-path selection happening on the Gateway. The Gateway’s job is transitive distribution, not arbitration. Each Edge does its own arbitration, locally, when it consumes the routes the Gateway hands it. That’s the route-reflector analogy from Part 1: the Gateway reflects; the Edges decide.
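To make the shape of an advertisement concrete, here is a small Python sketch of the per-prefix record and of a reflector that fans it out untouched. The field names and the `reflect` helper are illustrative assumptions; the real encoding inside the VCMP control channel is not described in this post.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Advertisement:
    prefix: str             # prefix and mask, e.g. "10.20.30.0/24"
    route_type: str         # "connected", "ospf", "bgp" or "static"
    next_hop_edge: str      # the Edge that originated the route
    tags: tuple = ()        # tags from the LAN protocol or from policy
    metrics: tuple = ()     # e.g. (("ospf_cost", 20),), carried through as-is

def reflect(adv, origin, edges):
    """Hand one advertisement to every Edge except the originator.

    No best-path logic here: the same object goes out unchanged, and each
    receiving Edge is left to do its own arbitration.
    """
    return {edge: adv for edge in edges if edge != origin}

adv = Advertisement("10.20.30.0/24", "connected", "Bristol-Edge")
fanout = reflect(adv, "Bristol-Edge",
                 ["Bristol-Edge", "Newcastle-Edge", "Shanghai-Edge"])
```

The point of the sketch is what is missing: there is no comparison between advertisements anywhere in `reflect`.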
Gateway → other Edges: the redistribution
When the Gateway has the advertisement from Bristol, it pushes it down the control channel of the VCMP tunnels to every other Edge in the same tenant (filtered by segment / VRF if you have segmentation on, more on that in Part 5). Each of those Edges now has an overlay route:
10.20.30.0/24 via Bristol-Edge (overlay)
Three details matter here:
- Two Gateways means two paths. If the Edge in Newcastle is homed to both UK Gateways (Slough and Manchester), it will receive the Bristol advertisement from each Gateway. They’re the same route, just learned via two different reflectors. The Edge dedupes by next-hop Edge, so it holds one overlay route for the prefix while keeping both Gateways as usable paths to it.
- Gateways do not see each other. The two UK Gateways are not peered. Each receives every Edge’s advertisement independently and pushes it out independently. This is why Gateway redundancy is “both see everything”, not “active/standby with state sync”.
- The Gateway sends its own advertisements too. It tells the Edges where the Internet is (a default route, 0.0.0.0/0, via the Gateway for sites that break out via the Gateway) and where any directly-attached overlay services live.
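The first of those details, the same route arriving through two reflectors, can be sketched in a few lines of Python. This is an assumption about how one might model it, not the Edge’s internal data structure; `merge_reflections` is a hypothetical helper.

```python
from collections import defaultdict

def merge_reflections(received):
    """Collapse reflected routes that differ only in which Gateway sent them.

    received: iterable of (gateway, prefix, next_hop_edge) tuples.
    Returns {(prefix, next_hop_edge): sorted list of Gateways it arrived via}.
    """
    table = defaultdict(set)
    for gateway, prefix, next_hop_edge in received:
        # Keyed on the route itself, so a second copy only adds a Gateway.
        table[(prefix, next_hop_edge)].add(gateway)
    return {route: sorted(gateways) for route, gateways in table.items()}

received = [
    ("Slough", "10.20.30.0/24", "Bristol-Edge"),
    ("Manchester", "10.20.30.0/24", "Bristol-Edge"),
]
newcastle_table = merge_reflections(received)
```

Newcastle ends up with a single entry for 10.20.30.0/24 via Bristol-Edge, annotated with both Gateways it could be relayed through.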
Cloud VPN modes — Direct, Cloud Gateway, Hub
Once an Edge knows that 10.20.30.0/24 is reachable via Bristol-Edge, it needs to decide how to send the packet. The “how” is one of three modes, configured per Edge / per Profile and per destination context:
Cloud VPN via Cloud Gateway
The default and the simplest. The Edge encapsulates the packet into the VCMP tunnel that points at the Gateway, the Gateway decapsulates, looks up the next-hop Edge in its tenant table, and re-encapsulates into the VCMP tunnel that points at Bristol-Edge. Two VCMP hops, one Gateway in the middle.
A relayed path like this is the only thing that works when the two Edges cannot directly reach each other on any underlay, and it only works if the relay can reach both of them. That, foreshadowing Part 4, is the entire reason Chicago and Shanghai can’t talk in our scenario without Partner Gateways. The Cloud Gateway has Internet only and Chicago has MPLS only, so they have no shared underlay; the path needs both segments terminated on a node that has both underlays.
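The shared-underlay rule is easy to check mechanically. The sketch below is a deliberate simplification: it treats “has the same underlay type” as “can reach”, which ignores NAT, firewalls and routing policy, and the names `can_go_direct` and `usable_relays` are made up for the example.

```python
def can_go_direct(src_underlays, dst_underlays):
    """Two Edges can only tunnel directly if they share at least one underlay."""
    return bool(src_underlays & dst_underlays)

def usable_relays(src_underlays, dst_underlays, relays):
    """A relay can bridge two Edges only if it shares an underlay with each."""
    return sorted(
        name for name, underlays in relays.items()
        if underlays & src_underlays and underlays & dst_underlays
    )

SITES = {
    "Chicago": {"mpls"},
    "Shanghai": {"internet"},
    "Newcastle": {"internet"},
}
RELAYS = {
    "Cloud-Gateway": {"internet"},
    "Partner-Gateway": {"mpls", "internet"},
}

chicago_shanghai = usable_relays(SITES["Chicago"], SITES["Shanghai"], RELAYS)
newcastle_shanghai = usable_relays(SITES["Newcastle"], SITES["Shanghai"], RELAYS)
```

For Chicago to Shanghai the only relay left standing is the Partner Gateway, which is the whole argument of Part 4 in one list.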
Cloud VPN Direct (Dynamic Branch-to-Branch)
When two Edges share a reachable underlay — i.e. they can both put a packet on the same IP plane and have it routed to each other — they can build a dynamic VCMP tunnel directly between themselves, bypassing the Gateway’s data plane entirely. The control plane still flows through the Gateway: the Edges learn of each other via the Gateway’s reflection, then the first flow between them triggers the direct tunnel setup. Subsequent flows use it.
Direct mode is preferred whenever it’s possible because it removes the Gateway from the data path — lower latency, lower Gateway cost, fewer choke points. Most “branch-to-branch” traffic on a healthy deployment lands here. The trade-off is that the Edge now has potentially a full mesh of dynamic tunnels (one to every other Edge that’s ever had a flow to it), which costs CPU and memory on the Edge.
There are knobs for when to build the tunnel (first packet, threshold of packets, never) and when to tear it down (idle timeout). We cover sizing in Part 5.
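Those two knobs can be modelled as a tiny state machine. This is a sketch under loud assumptions: the class name, parameter names and default values are invented, and tunnel setup is treated as instantaneous, whereas a real Edge keeps relaying via the Gateway while the direct tunnel comes up.

```python
class DirectTunnelPolicy:
    """Decides when a dynamic Edge-to-Edge tunnel is built and torn down."""

    def __init__(self, packet_threshold=1, idle_timeout=300.0):
        # packet_threshold=None means "never build"; 1 means "on first packet".
        self.packet_threshold = packet_threshold
        self.idle_timeout = idle_timeout
        self.pending = {}     # peer -> packets seen while no tunnel exists
        self.last_used = {}   # peer -> time of the last packet on an up tunnel

    def on_packet(self, peer, now):
        """Return "direct" if this packet rides a direct tunnel, else "gateway"."""
        if peer in self.last_used:
            self.last_used[peer] = now
            return "direct"
        if self.packet_threshold is None:
            return "gateway"
        self.pending[peer] = self.pending.get(peer, 0) + 1
        if self.pending[peer] >= self.packet_threshold:
            # Threshold reached: build the tunnel (instantly, in this sketch).
            del self.pending[peer]
            self.last_used[peer] = now
            return "direct"
        return "gateway"

    def expire(self, now):
        """Tear down tunnels idle for longer than idle_timeout; return the peers."""
        idle = [p for p, t in self.last_used.items()
                if now - t > self.idle_timeout]
        for peer in idle:
            del self.last_used[peer]
        return idle
```

With `packet_threshold=3`, the first two packets to a peer go via the Gateway and the third one triggers the tunnel; after `idle_timeout` seconds of silence, `expire` removes it and the count starts again.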
Cloud VPN via Hub
A designated Edge (a Hub VCE) acts as the relay instead of the Gateway. Functionally the packet path is the same (encapsulate to Hub, Hub decapsulates and re-encapsulates to destination Edge), but the relay is a device the customer owns, typically at a data centre. Used when:
- The customer’s DC sees most of the spoke traffic anyway (so the trombone is real, not artificial).
- The customer wants the relay on-net for compliance reasons rather than in a third-party cloud.
- The customer doesn’t have a Partner Gateway available and the Cloud Gateway is too far / wrong jurisdiction.
A Hub does not do BGP-into-MPLS the way a Partner Gateway does. If your Hub VCE is sitting in an MPLS-attached DC, the only thing the Hub sees on the MPLS side is whatever it has been told via static routes or LAN-side BGP from the DC’s own routers. It is not a PE-attached service in the way a Partner Gateway is.
How the Edge picks: overlay route selection
The Edge has, for a given destination prefix, potentially several candidate routes:
- Overlay route via Cloud Gateway (relayed) — always available if Gateway has the prefix.
- Overlay route via Direct branch tunnel — available if direct tunnel is up.
- Overlay route via Hub — available if Hub is configured for this destination class.
- Underlay route — e.g., the same prefix learned from the MPLS PE via BGP, because Bristol is also reachable on MPLS without any SD-WAN involvement.
By default the Edge prefers overlay over underlay for any prefix learned via the overlay, on the principle that SD-WAN exists in order to be on the data path. This is configurable per route (route preference) but the default is right for most deployments. Inside the overlay candidates, preference order is roughly:
- Direct (lowest data-plane cost).
- Hub (if explicitly preferred for this destination — common for DC-bound prefixes).
- Gateway (the catch-all).
Plus the obvious overlay-prefix selection within a class: longest prefix match first, then route type (connected > static > BGP > OSPF > overlay-learned), then metric.
A subtle one: when the Edge sees the same prefix from both Gateways, it doesn’t pick one and ignore the other. The Edge keeps both as equal-cost overlay paths via Gateway and DMPO chooses the better-performing Gateway at flow-establishment time. We’ll come back to this in Part 3.
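Pulling the default ordering in this section together, a minimal Python sketch of the Edge’s choice might look like the following. It is a simplification with assumptions: longest-prefix match is applied across all candidates first, Hub is ranked above Gateway unconditionally (in practice only when it is explicitly preferred), route type is left out, and two equal Gateway paths are not kept side by side for DMPO. The `KIND_RANK` table and `select_route` function are hypothetical names.

```python
import ipaddress

# Lower rank wins. Overlay classes first, underlay last (the default stance).
KIND_RANK = {"direct": 0, "hub": 1, "gateway": 2, "underlay": 3}

def select_route(dest, candidates):
    """Pick one route for dest from dicts with "prefix", "kind" and "metric"."""
    addr = ipaddress.ip_address(dest)
    matching = [c for c in candidates
                if addr in ipaddress.ip_network(c["prefix"])]
    if not matching:
        return None
    return min(
        matching,
        key=lambda c: (
            -ipaddress.ip_network(c["prefix"]).prefixlen,  # longest match first
            KIND_RANK[c["kind"]],                          # Direct, Hub, Gateway, underlay
            c["metric"],                                   # then lowest metric
        ),
    )

candidates = [
    {"prefix": "10.20.30.0/24", "kind": "gateway", "metric": 10},
    {"prefix": "10.20.30.0/24", "kind": "underlay", "metric": 5},
    {"prefix": "10.20.0.0/16", "kind": "direct", "metric": 1},
]
before_direct = select_route("10.20.30.7", candidates)

# Once a direct tunnel for the /24 comes up, it takes over from the Gateway path.
candidates.append({"prefix": "10.20.30.0/24", "kind": "direct", "metric": 50})
after_direct = select_route("10.20.30.7", candidates)
```

Note that the underlay route loses to the Gateway route despite its better metric, because the overlay-over-underlay preference is evaluated before the metric.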
BGP at the Edge
The Edge speaks BGP in two places.
LAN-side, with the internal network. This is straightforward eBGP or iBGP depending on the design. Used to learn local prefixes for redistribution into the overlay, and to advertise overlay-learned remote prefixes back into the LAN so internal routers know to send remote traffic to the Edge. Default-information-originate style: the Edge can announce a default to the LAN if the LAN should backhaul Internet via the Edge.
WAN-side, on MPLS underlay, with the SP’s PE. This is where the Edge participates in the MPLS VPN like any CE. It advertises its LAN prefixes (so the MPLS network knows how to reach them from other CEs on the same VRF) and learns other CEs’ prefixes. Do not let those underlay-learned prefixes leak into the overlay. The Edge has filters specifically to stop this.
The WAN-side BGP is also how the MPLS-attached Edge confirms that the underlay is functional — BGP session up means MPLS path is at least L3-clean.
BGP at the Partner Gateway
This is the lever that makes Partner Gateways what they are.
A Partner Gateway peers eBGP with the SP’s PE on its MPLS-facing interface. It is, from the MPLS network’s perspective, a CE belonging to the customer’s VRF. So:
- The Partner Gateway advertises to the MPLS network the customer’s overlay prefixes — i.e., every LAN prefix it has learned from any Edge in the customer’s tenant, including Edges that are nowhere near MPLS. From the MPLS network’s perspective, the Partner Gateway is a single CE that happens to have the entire customer fleet’s prefix space behind it.
- The Partner Gateway learns from the MPLS network the customer’s underlay prefixes — every other CE in the same VRF, including MPLS-only Edges like our Chicago site.
- Inside the SD-WAN overlay, the Partner Gateway re-advertises those underlay-learned prefixes as overlay routes via Partner-Gateway-as-next-hop. Every Edge in the tenant now has a way to reach Chicago — by sending packets through the Partner Gateway.
The mirror is true for the Internet-facing side: the Partner Gateway has a public IP and Edges with Internet underlay (Newcastle, Shanghai) can reach it directly. They build VCMP tunnels to it just as they would to a Cloud Gateway.
The result is the bridge: MPLS-only Edges reach the Partner Gateway over MPLS; Internet-only Edges reach the Partner Gateway over Internet; the Partner Gateway is the common decap/encap point. The Edges themselves never know the underlay split exists. They each have an overlay route via Partner Gateway and that’s the end of their concern.
Part 4 walks through the packet flow for this. It is, once you’ve internalised the role of BGP here, the cleanest example of the whole architecture.
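Before that walkthrough, the two-way redistribution at the Partner Gateway can be sketched as a pair of simple transformations. This is an illustration only: the function name is invented, the Shanghai and Chicago prefixes are made-up examples, and the route maps and tenant filters that a real deployment applies are omitted.

```python
def partner_gateway_exchange(overlay_learned, mpls_learned):
    """Sketch of what a Partner Gateway advertises in each direction.

    overlay_learned: {prefix: originating Edge}, learned over VCMP.
    mpls_learned:    {prefix: originating CE}, learned from the PE over eBGP.
    Returns (prefixes advertised to the PE, overlay routes sent to Edges).
    """
    # Towards MPLS: the tenant's overlay prefixes, presented as a single CE.
    to_mpls = sorted(overlay_learned)
    # Towards the overlay: MPLS-learned prefixes with the Partner Gateway
    # as next hop, so any Edge can reach them by tunnelling to it.
    to_overlay = {prefix: "Partner-Gateway" for prefix in mpls_learned}
    return to_mpls, to_overlay

overlay_learned = {
    "10.20.30.0/24": "Bristol-Edge",
    "10.50.0.0/24": "Shanghai-Edge",    # hypothetical prefix
}
mpls_learned = {"10.60.0.0/24": "Chicago-CE"}   # hypothetical prefix

to_mpls, to_overlay = partner_gateway_exchange(overlay_learned, mpls_learned)
```

Shanghai’s view of Chicago is just `10.60.0.0/24 via Partner-Gateway`; nothing in that route says MPLS.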
Route preference cheat sheet
Worth pinning down because the documentation buries this in three different places:
| Where | What’s selected | By what |
|---|---|---|
| LAN router | overlay-learned remote prefix vs. its own underlay alternative | LAN BGP/OSPF metric + admin distance |
| Edge | overlay vs. underlay for same prefix | per-prefix route preference (default: prefer overlay) |
| Edge | among overlay candidates (Direct / Hub / Gateway) | Cloud VPN mode + per-destination policy |
| Edge | among overlay paths in the same class (e.g. two Gateways) | DMPO link quality at flow time |
| Edge | which underlay to put the VCMP tunnel on, for the chosen overlay path | Business Policy + DMPO |
| Gateway | reflects everything | (no selection — full redistribution) |
| Partner Gateway | what to redistribute MPLS → overlay and vice versa | BGP route maps + tenant filter |
That last row is where most large-tenant designs spend their time. Get it wrong and an MPLS prefix appears in two places in the overlay table — once via the Edge that lives on that MPLS, once via the Partner Gateway. Both are correct routes; the Edge will pick whichever has the better preference; asymmetric forwarding will follow. We’ll handle that in Part 5.
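A quick audit for that failure mode is to look for prefixes that show up in the overlay table with more than one next hop. The sketch below assumes you can export the table as (prefix, next hop) pairs; the export format, the `duplicated_prefixes` helper and the Chicago prefix are assumptions for illustration.

```python
from collections import defaultdict

def duplicated_prefixes(overlay_table):
    """Find prefixes that are reachable via more than one overlay next hop.

    overlay_table: iterable of (prefix, next_hop) pairs.
    Returns {prefix: sorted next hops} for the prefixes worth a second look.
    """
    by_prefix = defaultdict(set)
    for prefix, next_hop in overlay_table:
        by_prefix[prefix].add(next_hop)
    return {p: sorted(hops) for p, hops in by_prefix.items() if len(hops) > 1}

overlay_table = [
    ("10.60.0.0/24", "Chicago-Edge"),       # from the Edge on that MPLS
    ("10.60.0.0/24", "Partner-Gateway"),    # same prefix, redistributed from MPLS
    ("10.20.30.0/24", "Bristol-Edge"),
]
suspects = duplicated_prefixes(overlay_table)
```

A non-empty result is not automatically a fault, but every entry in it is a candidate for asymmetric forwarding.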
Segmentation — a teaser
The overlay supports segments (think VRFs end-to-end). Each segment has its own prefix table, its own Cloud VPN settings, and its own Business Policy. The Gateway and Partner Gateway distribute per-segment. The Edge keeps a separate forwarding table per segment. We’ll talk segmentation properly in Part 5 — for Parts 3 and 4 we assume a single default segment.
What’s next
We’ve now got the routing picture clear: Edges advertise, Gateways reflect, Edges decide. Direct preferred over Hub preferred over Gateway (or whatever ordering you’ve set). Partner Gateways carry the architecture across underlay boundaries by speaking BGP into MPLS on one side and accepting Internet VCMP on the other.
Part 3 takes the route-selection output — “send this flow into the VCMP tunnel that points at Bristol-Edge” — and digs into which underlay, how the packet is shaped, and what DMPO does when one of those underlays starts to brown out mid-flow.
That’s where the SD-WAN earns its keep.