Arista (VMware) SD-WAN Deep Dive — Part 1: Components, Gateways, and the Three Planes

Series map. This is Part 1 of five.

  1. Components, Gateways, and the Three Planes (this post)
  2. Routing — Overlay, Underlay, BGP, and the Gateway as Route Reflector
  3. The Data Plane — VCMP, DMPO, and Per-Flow Steering
  4. Topology Walkthroughs — MPLS-only meets Internet-only Across Continents
  5. Best Practice, Failure Modes, and a Design Checklist

Why another SD-WAN post

Most SD-WAN write-ups stop at “it builds tunnels and steers traffic”. That’s true of every vendor and it explains nothing. The interesting questions are where state lives, who makes the routing decision, and which devices the bytes actually traverse when a site in Manchester needs to talk to a site in Shenzhen over an underlay that neither of them shares end-to-end.

Arista’s SD-WAN (the product VMware sold as VeloCloud, which Broadcom divested to Arista in 2024) is unusually clean to reason about once you separate the three planes and stop conflating the Gateway with the Orchestrator. They are very different boxes doing very different things. The rest of this series leans on that separation hard, so we’ll spend Part 1 getting it right.

A note on the name

The product has carried three names in under a decade.

Era            Product name                                        Owner
2017 – 2019    VeloCloud SD-WAN                                    VeloCloud (independent, then VMware)
2019 – 2023    VMware SD-WAN                                       VMware (Dell-owned, then Broadcom)
2024 –         Arista SD-WAN (still branded VeloCloud in the UI)   Arista Networks

The components keep their old initials in logs, configs, and most field documentation:

  • VCE — VeloCloud Edge (the branch / hub appliance)
  • VCG — VeloCloud Gateway (the cloud-hosted gateway)
  • VCO — VeloCloud Orchestrator (the management plane)
  • VCMP — VeloCloud Multi-path Protocol (the data-plane encapsulation)
  • DMPO — Dynamic Multi-Path Optimisation (the link-quality and remediation engine)

You will see all five in this series. I’ll use the new product name “Arista SD-WAN” when talking about the platform and the VC-prefixed names when talking about the components, because the components haven’t really changed.

The components

Five things make the system work. We need all five before we can talk about anything else.

1. Edge — VCE

The VCE is the box at the branch. Physical or virtual. It does all four of the following at the same time:

  • Terminates underlay circuits (MPLS, broadband, 4G/5G, satellite — anything that produces a public or private next hop).
  • Originates and terminates VCMP tunnels to every peer it has a path to (other VCEs, Cloud Gateways, Partner Gateways, Hub VCEs).
  • Runs the DMPO engine that continuously measures every VCMP path it owns.
  • Enforces Business Policy on every flow — classification, steering, security service insertion.

Crucially, the VCE is the only device in the system that handles site-to-site user traffic in cleartext, and only on its LAN side. Everywhere else that traffic travels encapsulated inside VCMP. That has consequences for where you insert security (Part 5).

2. Cloud Gateway — VCG

The Cloud Gateway is the part most people get wrong. It is not a controller in the SDN sense, and it is not a SASE PoP in the Zscaler sense, although it borrows ideas from both.

A Cloud Gateway is a multi-tenant VM that Arista (or, historically, VMware, or in some markets a service-provider partner) hosts in a cloud region. Every VCE in a customer’s overlay opens at least one VCMP tunnel to its assigned set of Cloud Gateways — usually two for redundancy, sometimes more. Once those tunnels are up, the Gateway plays three roles simultaneously:

  • Overlay control plane. It learns every prefix every VCE in the customer’s tenant advertises and re-distributes them — think route reflector for the overlay. We unpack this in Part 2.
  • Data-plane relay for any traffic that cannot or should not go direct VCE-to-VCE. This is the Cloud VPN via Gateway mode. We unpack the alternatives (Direct, Hub) in Part 2.
  • Cloud breakout point for SaaS, IaaS, and generic internet egress when the branch doesn’t have local DIA, or when policy says to backhaul.
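The route-reflector role in the first bullet can be pictured with a toy model. This is a minimal sketch, not Arista's implementation: the class name, method names, tenant labels, and prefixes are all invented for illustration, and the real overlay routing exchange carries far more state than a prefix-to-originator map.

```python
from collections import defaultdict


class OverlayReflector:
    """Toy model of the Gateway's route-reflector role, one table per tenant."""

    def __init__(self):
        # tenant -> prefix -> set of Edges that advertise the prefix
        self._routes = defaultdict(lambda: defaultdict(set))

    def advertise(self, tenant, edge, prefix):
        """Record that `edge` originates `prefix` inside `tenant`."""
        self._routes[tenant][prefix].add(edge)

    def routes_for(self, tenant, edge):
        """Routes reflected to `edge`: every prefix in its own tenant that
        some other Edge originates, mapped to those originators."""
        reflected = {}
        for prefix, origins in self._routes[tenant].items():
            others = origins - {edge}
            if others:
                reflected[prefix] = sorted(others)
        return reflected


gateway = OverlayReflector()
gateway.advertise("globalco", "hq-london", "10.1.0.0/16")
gateway.advertise("globalco", "bristol", "10.2.0.0/16")
# Same prefix in a different tenant: must never leak into GlobalCo's view.
gateway.advertise("other-tenant", "branch-x", "10.1.0.0/16")

bristol_view = gateway.routes_for("globalco", "bristol")
print(bristol_view)
```

Bristol learns HQ's prefix from the Gateway without ever peering with HQ directly, and the overlapping prefix from the second tenant stays invisible to it.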

A Cloud Gateway does not sit on anyone’s MPLS. It has Internet underlay reachability only, from one or more IaaS regions. That single sentence is the reason Partner Gateways exist.

3. Partner Gateway

A Partner Gateway is functionally a Cloud Gateway with two differences that matter:

  • It is operator-deployed and customer-dedicated (or at least operator-tenant-dedicated, in MSP setups). The operator runs it on hardware or VMs they control, usually inside their MPLS PoP.
  • It speaks BGP to the underlay — typically to the operator’s PE routers. This means MPLS-only sites can reach the Partner Gateway over MPLS (because the Partner Gateway is, from the MPLS network’s point of view, just another customer CE / VRF endpoint), and Internet-only sites can reach it over the public Internet (because it also has a public-facing interface).

That dual-stance — BGP into the MPLS underlay on one side, public IP on the other — makes the Partner Gateway the only device in this architecture that can bridge an MPLS-only branch and an Internet-only branch. We will burn an entire post (Part 4) on exactly that flow for the UK ISP example.

A Partner Gateway is not part of the cleartext data path for the LAN. Traffic between two VCEs that traverses a Partner Gateway stays inside VCMP on every hop; the Partner Gateway just decapsulates from one VCMP tunnel and re-encapsulates into the next. The Edges do not see the Partner Gateway as a next hop; they see each other.
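The decapsulate-and-re-encapsulate relay described above can be sketched with two small record types. Everything here is hypothetical: the field names, the node names, and the addresses are made up, and the sketch makes no attempt to model the real VCMP header. It only shows that the outer tunnel header changes at the Gateway while the inner packet does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerPacket:
    src: str        # LAN-side source address
    dst: str        # LAN-side destination address
    payload: bytes


@dataclass(frozen=True)
class TunnelFrame:
    tunnel_src: str     # outer header: tunnel endpoint that sent the frame
    tunnel_dst: str     # outer header: tunnel endpoint that receives it
    inner: InnerPacket


def encapsulate(inner, tunnel_src, tunnel_dst):
    """Wrap an inner packet for one tunnel hop."""
    return TunnelFrame(tunnel_src, tunnel_dst, inner)


def relay(frame, gateway, next_edge):
    """Gateway relay: drop the inbound outer header and wrap the
    untouched inner packet for the outbound tunnel."""
    if frame.tunnel_dst != gateway:
        raise ValueError("frame is not addressed to this gateway")
    return encapsulate(frame.inner, gateway, next_edge)


packet = InnerPacket("10.5.0.10", "10.6.0.20", b"hello")
inbound = encapsulate(packet, "chicago-vce", "partner-gw")    # arrives over MPLS
outbound = relay(inbound, "partner-gw", "shanghai-vce")       # leaves over Internet

print(outbound.tunnel_src, "->", outbound.tunnel_dst)
```

The inner source and destination are still the two LAN addresses after the relay, which is the sense in which the Edges "see each other" rather than the Gateway.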

4. Orchestrator — VCO

The Orchestrator is the management plane. It is the only component a human logs into. It:

  • Holds the source-of-truth configuration for every Edge in the tenant.
  • Pushes configuration to the VCEs over a separate, out-of-band-ish HTTPS channel (not VCMP).
  • Receives telemetry — flow records, link-quality histories, alarms — and renders the dashboards.
  • Performs no real-time routing decisions and forwards no user data.

This is the other thing people routinely get wrong. If the Orchestrator falls over at 03:00, nothing in your data plane stops. VCEs continue to forward, DMPO continues to measure, Gateways continue to relay, and the overlay continues to converge on its own. You lose the ability to push new config and the ability to look at graphs. That’s all.

The architectural reason that’s true: the Orchestrator does not participate in route exchange between branches. The Gateway does. Which brings us to —

5. Controller

In Arista SD-WAN there is no separate Controller appliance. The control-plane function is collocated on the Cloud Gateway (and on the Partner Gateway, for sites homed there). When the literature says “Gateway”, it means the data-plane relay role and the control-plane route-reflector role together. They run in the same process on the same VM, but they are independent enough that you should mentally separate them — especially in Part 2 when we discuss what happens when an Edge has two Gateways and only one of them sees a particular route.

If you came from a Cisco SD-WAN (Viptela) background and you are looking for the vSmart equivalent: it’s a function on the VCG, not a separate device. There is no OMP per se — Arista uses its own overlay routing exchange — but the role analogy holds.

Hub VCE — a special case

Worth flagging because it confuses people: any VCE can be designated as a Hub in the configuration. A Hub VCE is still a VCE, with the same hardware, the same software, and the same VCMP, but it is the terminus for Cloud VPN via Hub mode. In that mode, spokes are configured to send branch-to-branch traffic through a designated branch rather than via the Gateway or directly. Hubs are how customers do data-centre-centric architectures without paying for a Partner Gateway. Hubs are not Gateways. They are not multi-tenant and they do not have BGP-to-MPLS-PE responsibilities. They are just VCEs with a star drawn around them on the topology map.
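The three branch-to-branch modes named so far (Direct, via Gateway, via Hub) differ only in which node, if any, relays the packets. A minimal sketch, assuming invented names for the enum, the function, and the nodes; it encodes only what this post states, namely that a Hub relays data but overlay routes still come from the Gateway.

```python
from dataclasses import dataclass
from enum import Enum


class VpnMode(Enum):
    DIRECT = "direct"              # Edge to Edge, no relay
    VIA_GATEWAY = "via_gateway"    # Cloud VPN via Gateway
    VIA_HUB = "via_hub"            # Cloud VPN via Hub


@dataclass(frozen=True)
class FlowPlan:
    data_path: tuple            # nodes the packets traverse, in order
    routes_learned_from: str    # node that supplies the overlay routes


def plan_flow(src, dst, mode, gateway, hub=None):
    """Return the data path for one branch-to-branch flow."""
    if mode is VpnMode.DIRECT:
        path = (src, dst)
    elif mode is VpnMode.VIA_GATEWAY:
        path = (src, gateway, dst)
    else:
        if hub is None:
            raise ValueError("VIA_HUB needs a Hub Edge")
        path = (src, hub, dst)
    # In every mode the Gateway, not the Hub, is the source of overlay routes.
    return FlowPlan(path, gateway)


hub_plan = plan_flow("bristol", "newcastle", VpnMode.VIA_HUB,
                     gateway="gw-slough", hub="hq-london")
direct_plan = plan_flow("bristol", "newcastle", VpnMode.DIRECT,
                        gateway="gw-slough")
print(hub_plan)
print(direct_plan)
```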

The three planes

With the components defined, the planes fall out cleanly.

Management plane

  • Lives on: Orchestrator.
  • Carries: config push, telemetry collection, alarms, admin login.
  • Talks to: Edges, Gateways, Partner Gateways — all over HTTPS, none over VCMP.
  • Failure mode: lose it and you lose visibility and the ability to change config. Data plane keeps running.

Control plane

  • Lives on: Cloud Gateway and Partner Gateway.
  • Carries: overlay route advertisements between Edges, peer discovery, path-quality summary digests, security policy distribution.
  • Talks to: Edges, over a dedicated control channel inside the VCMP tunnel to the Gateway. Same tunnel as the data plane physically, logically separate.
  • Failure mode: lose every Gateway an Edge is homed to and that Edge stops learning new routes. Established branch-to-branch tunnels keep forwarding, but only for as long as their last-known routes stay valid and DMPO keeps the underlay measurements current. This is the part most operators underestimate when sizing redundancy.

Data plane

  • Lives on: Edges, Cloud Gateways, Partner Gateways, Hub VCEs.
  • Carries: the customer’s actual packets, encapsulated in VCMP.
  • Talks to: every node it has a route to and a working underlay path to, over UDP/2426 by default.
  • Failure mode: path-level remediation via DMPO; if every path to a destination fails, the flow blackholes — same as any router.
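The transports named in the three lists above translate into a short firewall checklist for an Edge. A minimal sketch with stated assumptions: UDP/2426 comes from this post, while TCP/443 is only the conventional HTTPS port and is assumed here, and the flow labels and function name are invented.

```python
# Flows an Edge needs outbound, keyed by a descriptive label.
REQUIRED_FLOWS = {
    "management (Edge to Orchestrator, HTTPS)": ("tcp", 443),   # assumed port
    "control and data (VCMP)": ("udp", 2426),                   # default per the post
}


def missing_flows(allowed):
    """Return the labels of required flows absent from `allowed`,
    where `allowed` is a set of (protocol, port) tuples."""
    return sorted(label for label, flow in REQUIRED_FLOWS.items()
                  if flow not in allowed)


# A branch firewall that permits web traffic but was never opened for VCMP.
branch_firewall = {("tcp", 80), ("tcp", 443), ("udp", 53)}
print(missing_flows(branch_firewall))
```

A site in this state would still check in with the Orchestrator while being unable to bring up a single tunnel, which is a useful thing to rule out early.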

The thing to internalise is that the control and data planes share VCMP tunnels but are independent protocols inside them. That’s why a Gateway-relayed flow goes Edge → Gateway → Edge: the Gateway tells both Edges what to do (control), then it forwards their packets (data). And it’s why a “Direct” branch-to-branch flow can have its data plane bypass the Gateway entirely while its control plane still depends on the Gateway being alive.
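The plane-to-component mapping above is small enough to write down as data, which makes the failure reasoning mechanical. A minimal sketch; the component labels and the function name are illustrative only.

```python
# Which components host each plane, as listed in this section.
PLANES = {
    "management": {"orchestrator"},
    "control": {"cloud_gateway", "partner_gateway"},
    "data": {"edge", "cloud_gateway", "partner_gateway", "hub_edge"},
}


def planes_hosted_by(component):
    """Planes that are touched when `component` fails."""
    return sorted(plane for plane, hosts in PLANES.items()
                  if component in hosts)


for component in ("orchestrator", "cloud_gateway", "hub_edge"):
    print(component, "->", planes_hosted_by(component))
```

The Orchestrator appears in exactly one plane and the Gateways in two, which is the whole argument of this section in three lines of data.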

The UK ISP scenario, introduced

We’ll keep coming back to one concrete example. Let me set it up now and we’ll use the same vocabulary all the way through.

The operator: a UK national ISP. Let’s call them BritNet. They sell MPLS in the UK, partner-deliver MPLS internationally, and sell broadband and dedicated Internet access across the UK. They’ve stood up an Arista SD-WAN service for their enterprise customers.

Gateway footprint: two Cloud Gateways, both in the UK. One in a Slough data centre, one in a Manchester data centre. Both have public IPv4 reachability over the BritNet AS and over Internet transit. Both are pure Cloud Gateways — Internet underlay only.

A representative customer, “GlobalCo”, with the following sites:

Site               Location    Underlay 1                             Underlay 2
HQ                 London      MPLS (BritNet)                         DIA (BritNet)
Bristol office     Bristol     MPLS (BritNet)                         Broadband
Newcastle office   Newcastle   Broadband only                         -
New York office    New York    MPLS (BritNet international partner)   DIA (local US ISP)
Chicago office     Chicago     MPLS only (no Internet drop)           -
Shanghai office    Shanghai    DIA only (no MPLS)                     -

Six sites, three underlay shapes, three continents, two UK-only Gateways. Every interesting question this series wants to answer is hiding in here:

  • How does Newcastle (Internet only) reach HQ (MPLS + Internet)? Which underlay does it pick? Who decides?
  • How does Chicago (MPLS only) reach a UK Cloud Gateway when the Gateway has no MPLS interface?
  • How does Chicago (MPLS only) reach Shanghai (Internet only) when the two sites have no underlay in common and the Gateways can’t bridge them?
  • What changes if BritNet stands up a Partner Gateway instead of (or alongside) the Cloud Gateways?
  • Where do Business Policy, DMPO, and route preference each contribute to those answers?
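The awkward cases can be found mechanically by encoding the site table as sets of underlay types. This is a deliberately crude sketch: it collapses DIA and broadband into one "internet" type, treats BritNet MPLS and partner-delivered MPLS as one "mpls" type, and equates "shares an underlay type" with "can build a tunnel". The Partner Gateway entry is hypothetical, since BritNet has not deployed one in the scenario as stated.

```python
from itertools import combinations

SITES = {
    "HQ": {"mpls", "internet"},
    "Bristol": {"mpls", "internet"},
    "Newcastle": {"internet"},
    "New York": {"mpls", "internet"},
    "Chicago": {"mpls"},
    "Shanghai": {"internet"},
}
CLOUD_GATEWAY = {"internet"}             # Internet underlay only
PARTNER_GATEWAY = {"mpls", "internet"}   # hypothetical: not yet in the scenario


def shares_underlay(a, b):
    """True when two nodes have at least one underlay type in common."""
    return bool(a & b)


def cut_off_from(gateway):
    """Sites with no underlay type in common with `gateway`."""
    return sorted(name for name, underlays in SITES.items()
                  if not shares_underlay(underlays, gateway))


shapes = {frozenset(underlays) for underlays in SITES.values()}
stranded_pairs = sorted(
    tuple(sorted(pair)) for pair in combinations(SITES, 2)
    if not shares_underlay(SITES[pair[0]], SITES[pair[1]])
)

print("underlay shapes:", len(shapes))
print("cannot reach a Cloud Gateway:", cut_off_from(CLOUD_GATEWAY))
print("cannot reach a Partner Gateway:", cut_off_from(PARTNER_GATEWAY))
print("site pairs with no common underlay:", stranded_pairs)
```

Under these simplifications Chicago is the only site that cannot reach a Cloud Gateway, and both stranded pairs involve Chicago, which is why it shows up in two of the questions above.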

Hold those questions. Part 2 answers the first one and lays the foundation for the rest.

What’s next

Part 2 is where this gets technical: how routes get into the overlay, how the Gateway distributes them, how the Edge picks an underlay, and how BGP at the Partner Gateway turns “MPLS-only branch” into a solved problem.

If anything in this post landed wrong, push back. The claim that the Orchestrator is not in the control plane is the one people argue with most, and the rest of the series builds on it.