MP-BGP and VRFs on FortiGate SD-WAN
When an SD-WAN fabric grows past a single tenant or single use-case, flat routing tables stop being safe. Management traffic should never share a RIB with customer production traffic. Guest Wi-Fi has no business being anywhere near the corporate overlay. The cleanest answer on FortiGate is VRFs on the data plane plus MP-BGP (VPNv4) on the control plane — the same pattern you’d see on a service-provider PE, scaled down to a branch box.
This post walks through a reference design with three customer VRFs riding over a single overlay, the FortiOS configuration for each, the traffic flows in detail, and the gotchas that show up the first time you build it.
The reference design
Three customer VRFs, one transport VRF, one overlay:
| VRF | Purpose | Egress |
|---|---|---|
| VRF 0 | Transport / underlay (WAN, IPsec tunnels, BGP loopback) | Internet, MPLS |
| VRF 20 | Out-of-band management (FortiManager, FortiAnalyzer, SNMP, syslog) | Hub → NMS subnet |
| VRF 30 | Customer SD-WAN — LAN users, servers, voice | Hub / data centre |
| VRF 99 | Guest Wi-Fi | DIA via VRF 0 (route-leak + NAT) |
The overlay is built once in VRF 0 (ADVPN-style IPsec tunnels between branch and hub), and a single iBGP session with address-family vpnv4 unicast carries every customer VRF’s prefixes across it. Each VRF has its own route-distinguisher (RD) and route-target (RT) so the hub can keep them in separate RIBs.
MP-BGP on FortiOS — the short version
FortiOS (7.0 and later) implements the MPLS/VPN control plane minus the MPLS data plane: VPNv4 prefixes are exchanged as RD:prefix with extended community RTs, and the data-plane encapsulation is the IPsec overlay rather than an MPLS LSP. Conceptually:
- The iBGP session lives in VRF 0 between loopbacks, sourced over the overlay.
- The config router bgp → config vrf block on each side defines per-VRF RD and import/export RTs.
- Prefixes redistributed (or network-statement’d) into a VRF’s BGP table are tagged with the VRF’s export-RT and advertised over the VPNv4 session.
- The receiving side imports any VPNv4 prefix whose RT matches one of its import-rt values into the matching VRF.
Per-VRF isolation is therefore an RT-policy decision, not a topology decision. You can leak between VRFs by importing each other’s RTs — useful for shared-services VRFs, dangerous if done by accident.
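To make the import rule concrete, here is a toy Python model of RT-based import. It is not FortiOS code: Vpnv4Route and import_routes are hypothetical names, and best-path selection is ignored entirely. It only shows that the VRF a prefix lands in is decided by RT overlap, and how one extra import-rt value leaks a prefix into the wrong VRF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vpnv4Route:
    """One VPNv4 advertisement as seen on the iBGP session (toy model)."""
    rd: str            # route distinguisher, e.g. "65000:30"
    prefix: str        # IPv4 prefix, e.g. "10.30.10.0/24"
    rts: frozenset     # export route-targets carried as extended communities


def import_routes(routes, import_rts):
    """Build per-VRF tables: a route is imported into every VRF whose
    import-rt set shares at least one RT with the route's RT set."""
    ribs = {vrf: set() for vrf in import_rts}
    for route in routes:
        for vrf, wanted in import_rts.items():
            if route.rts & wanted:  # non-empty intersection means "import"
                ribs[vrf].add(route.prefix)
    return ribs


# Prefixes the branch exports, mirroring the reference design.
branch_routes = [
    Vpnv4Route("65000:20", "172.20.10.0/24", frozenset({"65000:20"})),
    Vpnv4Route("65000:30", "10.30.10.0/24", frozenset({"65000:30"})),
]

# Intended hub policy: each VRF imports only its own RT.
isolated = import_routes(branch_routes, {"20": {"65000:20"}, "30": {"65000:30"}})

# Copy-paste mistake: VRF 30 also imports the management RT.
leaked = import_routes(branch_routes,
                       {"20": {"65000:20"}, "30": {"65000:30", "65000:20"}})

print(sorted(isolated["30"]))  # ['10.30.10.0/24']
print(sorted(leaked["30"]))    # ['10.30.10.0/24', '172.20.10.0/24']
```

Nothing about the topology changed between the two calls; only the import set did. That is the whole argument for treating RT imports as security policy.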
Configuration walkthrough
1. Underlay and overlay (VRF 0)
WAN interfaces, loopback, and IPsec tunnels stay in VRF 0. Nothing surprising here:
config system interface
edit "wan1"
set vrf 0
set role wan
next
edit "lo-bgp"
set vrf 0
set type loopback
set ip 10.255.0.11 255.255.255.255
next
edit "advpn-hub"
set vrf 0
set ip 10.200.0.11 255.255.255.255
set remote-ip 10.200.0.1 255.255.255.255
set interface "wan1"
set type tunnel
next
end
2. Define the customer VRFs on interfaces
config system interface
edit "mgmt"
set vrf 20
set ip 172.20.10.1 255.255.255.0
set allowaccess ping ssh https
next
edit "lan-vlan30"
set vrf 30
set ip 10.30.10.1 255.255.255.0
next
edit "guest-vlan99"
set vrf 99
set ip 192.168.99.1 255.255.255.0
next
end
3. iBGP session and per-VRF RD/RT
config router bgp
set as 65000
set router-id 10.255.0.11
set ibgp-multipath enable
config neighbor
edit "10.255.0.1"
set remote-as 65000
set update-source "lo-bgp"
set soft-reconfiguration enable
set capability-graceful-restart enable
set additional-path receive
set advertisement-interval 1
# Critical: activate the VPNv4 address family on this neighbour
set activate-vpnv4 enable
next
end
config vrf
edit "20"
set role pe
set rd "65000:20"
set export-rt "65000:20"
set import-rt "65000:20"
next
edit "30"
set role pe
set rd "65000:30"
set export-rt "65000:30"
set import-rt "65000:30"
next
edit "99"
set role pe
set rd "65000:99"
set export-rt "65000:99"
# No shared-services import: VRF 99 is intentionally isolated
set import-rt "65000:99"
next
end
# Per-VRF networks/redistribution
config network
edit 1
set prefix 172.20.10.0 255.255.255.0
set vrf 20
next
edit 2
set prefix 10.30.10.0 255.255.255.0
set vrf 30
next
end
end
The hub mirrors this with the same RT scheme. Because each VRF has a distinct RD, the same overlapping IP space (e.g. 10.30.10.0/24) could exist in another customer VRF without collision in the BGP table.
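The effect of the RD is easiest to see as a dictionary key. In this toy Python sketch (not FortiOS internals), the second advertisement is hypothetical: another customer VRF with RD 65000:31 on a second branch (loopback 10.255.0.12) that reuses 10.30.10.0/24.

```python
import ipaddress

# (rd, prefix, next_hop). The second entry is a hypothetical other customer
# VRF (RD 65000:31) on another branch reusing the same 10.30.10.0/24.
advertisements = [
    ("65000:30", "10.30.10.0/24", "10.255.0.11"),
    ("65000:31", "10.30.10.0/24", "10.255.0.12"),
]

ipv4_table = {}    # keyed by prefix only, like a single flat RIB
vpnv4_table = {}   # keyed by (RD, prefix), like the VPNv4 BGP table

for rd, prefix, next_hop in advertisements:
    net = ipaddress.ip_network(prefix)
    ipv4_table[net] = next_hop           # second write overwrites the first
    vpnv4_table[(rd, net)] = next_hop    # distinct keys, both survive

print(len(ipv4_table))   # 1
print(len(vpnv4_table))  # 2
```

The RD only keeps the two NLRIs distinct in the BGP table; which VRF each one is imported into is still decided by its RT.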
4. Guest VRF — DIA via route leaking
Guest traffic must not ride the overlay. Instead, it breaks out locally through wan1 (VRF 0). Two pieces:
(a) Default route in VRF 99 pointing into VRF 0:
config router static
edit 10
set dst 0.0.0.0 0.0.0.0
# The gateway is the actual ISP next-hop in VRF 0
set gateway 203.0.113.1
set device "wan1"
# Install the route in VRF 99...
set vrf 99
# ...but resolve the next-hop in VRF 0
set dstvrf 0
next
end
(b) Firewall policy from guest-vlan99 (VRF 99) to wan1 (VRF 0) with NAT enabled:
config firewall policy
edit 100
set name "guest-dia"
set srcintf "guest-vlan99"
set dstintf "wan1"
# guest-net = 192.168.99.0/24
set srcaddr "guest-net"
set dstaddr "all"
set action accept
set service "DNS" "HTTP" "HTTPS" "PING"
set schedule "always"
set nat enable
set utm-status enable
set ssl-ssh-profile "certificate-inspection"
set webfilter-profile "guest-webfilter"
set logtraffic all
next
end
NAT is mandatory here. Without source NAT, packets leave wan1 with an RFC 1918 source address (192.168.99.x), and the internet has no route back to that range, so replies never arrive. SNAT rewrites the source to the wan1 address; replies come back to that public address in VRF 0, and the session state reverses the translation and hands the packet back to the guest in VRF 99.
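A toy model of what the session table does on the guest DIA path may help. This is a sketch, not FortiOS's NAT engine: the wan1 address 203.0.113.10, the sequential port allocation, and the Session/SnatTable names are all hypothetical, and real session entries carry far more state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    vrf_in: int        # VRF of the ingress interface (99 for guest-vlan99)
    orig_src: tuple    # (ip, port) before SNAT
    nat_src: tuple     # (ip, port) after SNAT
    dst: tuple         # (ip, port) of the internet server


class SnatTable:
    """Toy source-NAT session table (not FortiOS's implementation)."""

    def __init__(self, wan_ip, first_port=40000):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.by_reply = {}  # (reply_src, reply_dst) -> Session

    def outbound(self, vrf, src, dst):
        """Translate a new guest flow and remember how to reverse it."""
        nat_src = (self.wan_ip, self.next_port)
        self.next_port += 1
        self.by_reply[(dst, nat_src)] = Session(vrf, src, nat_src, dst)
        return nat_src

    def inbound(self, src, dst):
        """A reply arrives on wan1 (VRF 0). Return (vrf, original host) or None."""
        session = self.by_reply.get((src, dst))
        if session is None:
            return None  # no session: nothing to re-inject into VRF 99
        return session.vrf_in, session.orig_src


table = SnatTable("203.0.113.10")  # hypothetical wan1 address
server = ("198.51.100.80", 443)
translated = table.outbound(99, ("192.168.99.42", 51515), server)
print(translated)                          # ('203.0.113.10', 40000)
print(table.inbound(server, translated))   # (99, ('192.168.99.42', 51515))
```

The reply is matched on the translated address, and the VRF it is handed back to comes from the session, not from any route in VRF 0. That is why no return route into VRF 99 is needed.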
5. Hub side — keeping VRFs apart
On the hub the same config vrf block applies, plus the upstream connections to FortiManager (VRF 20) and the data-centre core (VRF 30). The interesting bit is what you don’t do: no policy permits VRF 30 ↔ VRF 20 unless you have a deliberate management-from-customer requirement, and there’s no inter-VRF static between the customer VRFs and VRF 99.
Traffic flows
Flow A — Management (VRF 20): branch FortiGate to FortiAnalyzer
Branch FGT (VRF 20 src-IP)
→ mgmt interface (172.20.10.1) in VRF 20
→ BGP best-path lookup in VRF 20 RIB
→ next-hop = hub loopback (10.255.0.1) in VRF 0
→ recursive lookup: VRF 0 says "out advpn-hub"
→ IPsec encap on wan1
→ ISP → hub wan1 → IPsec decap
→ hub VRF 0 → lookup in hub VRF 20 (route imported via RT 65000:20)
→ out hub mgmt-zone interface (VRF 20)
→ FortiAnalyzer
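Two routing lookups: the VRF 20 lookup (the FortiAnalyzer at 172.20.99.10 is reachable via BGP next-hop 10.255.0.1) and the underlay lookup (10.255.0.1 is reachable via the IPsec tunnel). This is the classic PE-to-PE forwarding model minus the MPLS label. The hub still has to know which VRF a decapsulated packet belongs to, and demultiplexing on destination IP alone breaks as soon as two VRFs use overlapping prefixes, so the tunnel must carry a VRF identifier alongside the packet (on FortiOS, the vpn-id-ipip encapsulation option on the IPsec phase1-interface). One SA can still serve many VRFs.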
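The two lookups can be modelled in a few lines of Python. This is a sketch, not FortiOS's FIB: the RIB contents are hypothetical (in particular, 172.20.99.0/24 as the FortiAnalyzer subnet is an assumption based on the 172.20.99.10 server address used in this post), and every BGP next-hop is resolved in VRF 0.

```python
import ipaddress

# Hypothetical branch RIBs. Each entry maps a prefix to (kind, value):
#   "interface": value is the egress interface
#   "bgp":       value is a BGP next-hop, resolved again in the underlay (VRF 0)
RIBS = {
    20: {
        "172.20.10.0/24": ("interface", "mgmt"),
        "172.20.99.0/24": ("bgp", "10.255.0.1"),     # assumed FortiAnalyzer subnet
    },
    0: {
        "10.255.0.1/32": ("interface", "advpn-hub"),  # hub loopback via the tunnel
        "0.0.0.0/0": ("interface", "wan1"),
    },
}

UNDERLAY_VRF = 0


def longest_match(vrf, addr):
    """Return the RIB entry with the longest prefix containing addr."""
    ip = ipaddress.ip_address(addr)
    best_len, best_entry = -1, None
    for prefix, entry in RIBS[vrf].items():
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_entry = net.prefixlen, entry
    if best_entry is None:
        raise LookupError(f"no route to {addr} in VRF {vrf}")
    return best_entry


def resolve(vrf, dst):
    """Follow recursive lookups; return the list of (vrf, address, result) hops."""
    kind, value = longest_match(vrf, dst)
    if kind == "interface":
        return [(vrf, dst, value)]
    # BGP next-hop: second lookup, this time in the underlay VRF.
    return [(vrf, dst, f"via {value}")] + resolve(UNDERLAY_VRF, value)


for hop in resolve(20, "172.20.99.10"):
    print(hop)
# (20, '172.20.99.10', 'via 10.255.0.1')
# (0, '10.255.0.1', 'advpn-hub')
```

Note that VRF 20 has no default route in this model, so a lookup for anything outside its two prefixes raises instead of falling through to VRF 0. That mirrors the isolation the design is after.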
Important: config log fortianalyzer setting must pin the source interface and IP into VRF 20. Otherwise FortiOS sources the connection from VRF 0, and the session will fail or, worse, leak telemetry over the wrong path:
config log fortianalyzer setting
set status enable
set server "172.20.99.10"
set source-ip "172.20.10.1"
set interface-select-method specify
set interface "mgmt"
end
Flow B — Customer SD-WAN (VRF 30): user to data-centre application
User (10.30.10.50)
→ branch lan-vlan30 (VRF 30)
→ firewall policy VRF30 → SDWAN zone
→ SD-WAN rule: "App=Salesforce → prefer overlay-1, SLA <80ms"
→ BGP best-path in VRF 30 RIB → next-hop hub loopback
→ recursive: VRF 0 → IPsec tunnel
→ hub decap → VRF 0 → import RT 65000:30 → VRF 30 RIB
→ hub firewall policy VRF30 → DC-core
→ application
SD-WAN steering on FortiOS happens before the BGP lookup: the SD-WAN rule selects a zone (a set of overlay tunnels), and the BGP route is then only usable if its next-hop is reachable through that zone. That means your performance SLAs (config system sdwan → health-check) live in VRF 0, because the tunnels do, but their result drives a VRF 30 forwarding decision. This is the most common point of confusion when troubleshooting.
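The interaction can be sketched as a tiny selection function. This is a toy model, not FortiOS's SD-WAN engine: the Member class, the steer function and the latency figures are hypothetical, and it returns None where real FortiOS would apply mode-specific fallback behaviour that is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    latency_ms: float   # measured by a health-check that runs in VRF 0
    alive: bool = True


def steer(members, sla_latency_ms, members_with_route):
    """Return the first member, in rule order, that is alive, meets the SLA,
    and has a route to the destination in the customer VRF. None means the
    toy rule did not match (real fallback behaviour is not modelled)."""
    for m in members:
        if m.alive and m.latency_ms < sla_latency_ms and m.name in members_with_route:
            return m.name
    return None


# Rule from Flow B: prefer overlay-1, SLA < 80 ms. Latencies are made up.
members = [Member("overlay-1", 95.0), Member("overlay-2", 40.0)]

# VRF 30's BGP route to the DC resolves over both tunnels.
print(steer(members, 80, {"overlay-1", "overlay-2"}))  # overlay-2

# If the VRF 30 route only resolves over overlay-1, the SLA winner is unusable.
print(steer(members, 80, {"overlay-1"}))               # None
```

The SLA input comes from the underlay, the route input comes from the customer VRF, and only a member that satisfies both is chosen.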
Flow C — Guest Wi-Fi (VRF 99) → DIA
Guest device (192.168.99.42)
→ branch guest-vlan99 (VRF 99)
→ firewall policy VRF99 → wan1 (NAT)
→ static route: 0.0.0.0/0 vrf 99 dstvrf 0 nh 203.0.113.1
→ recursive lookup in VRF 0 → wan1
→ SNAT to wan1 public IP, session table records (VRF99 → VRF0)
→ packet to internet
→ reply arrives on wan1
→ session lookup → reverse NAT → packet re-injected into VRF 99
→ back to guest
The guest’s traffic never enters VRF 30 or VRF 20, never traverses the overlay, and never appears in MP-BGP advertisements. From the corporate side the guest network simply doesn’t exist.
Gotchas
These are the ones that cost real time in production.
1. The BGP loopback must be in VRF 0. The iBGP session runs between underlay-reachable addresses. Putting the loopback in VRF 30 to “match the customer” will work for one VRF and break for everything else, because the VPNv4 NLRI for every VRF is exchanged on a single session.
2. VPNv4 is not activated on a neighbour by default. The neighbour needs set activate-vpnv4 enable; without it, the session only exchanges ipv4 unicast from VRF 0. You’ll see your customer prefixes in the local VRFs, but they’ll never reach the hub. Always confirm with get router info bgp summary and check the VPNv4 section.
3. RD uniqueness matters for ECMP. If two branches advertise the same prefix with the same RD, the two advertisements are the same VPNv4 NLRI, so the hub (typically the route reflector in an ADVPN design) runs best-path across them and reflects only one; every other spoke loses multipath. Best practice: encode the branch ID in the RD, e.g. RD = <ASN>:<branch-id*1000 + vrf-id>. RTs stay symmetric across branches; only the RD varies.
4. SD-WAN health-checks live in the underlay VRF. Probes defined under config system sdwan → config health-check run in VRF 0 by default, because the SD-WAN members are tunnels in VRF 0. If you put a health-check probe destination inside a customer VRF by mistake, the probes fail silently and the SLA reports the members as dead.
5. Inter-VRF static routes need a resolvable next-hop in the destination VRF. set dstvrf 0 doesn’t magically forward — it tells the FIB to resolve gateway in VRF 0’s RIB. If VRF 0 doesn’t have a route to that gateway, the static is inactive. get router info routing-table vrf 99 all is your friend.
6. NAT is non-negotiable for the guest DIA path. It’s tempting to think “I’ll just leak the default and let it route.” The internet has no route back to your guest range. SNAT to the WAN interface IP is the only thing that makes the return path work, and the FortiGate session table is what binds the reverse NAT back to VRF 99.
7. Don’t import the wrong RT on the hub. A common copy-paste error: a hub VRF imports both 65000:20 and 65000:30 because someone duplicated a config block. Now management prefixes appear in the customer VRF’s RIB, and one mis-scoped policy later a customer device can reach the FortiManager subnet. Audit imports with get router info bgp vpnv4-unicast all and grep for each VRF.
8. Policy lookups are per-(srcintf, dstintf) and respect VRF. A policy from lan-vlan30 to wan1 will not match traffic from VRF 99 even if the addresses overlap, because srcintf belongs to a different VRF. This is a feature, not a bug — but it means inter-VRF flows always require explicit policy.
9. DNS for guests. If you push corporate DNS (in VRF 30) to the guest DHCP scope, you’ve just created a covert channel from VRF 99 into VRF 30. Push public resolvers (1.1.1.1, 9.9.9.9) or run a forwarder bound to the VRF 99 interface only.
10. FortiManager and FortiAnalyzer source-IP selection. By default FortiOS picks the egress interface IP via the routing table, which lands in VRF 0. Pin source-ip and interface-select-method specify for every management-plane service (FortiAnalyzer, FortiManager, syslog, SNMP, NTP, FortiGuard) or you’ll get partial-VRF management with intermittent failures.
11. Recursive routing loops via dstvrf. If VRF 0 ever points a prefix back into VRF 99 (for example, a misconfigured return route for the guest network), the leaked default in VRF 99 plus the leaked specific in VRF 0 forms a loop. Keep VRF 99 strictly egress-only — no routes leak from VRF 99 back into VRF 0.
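12. Graceful-restart and additional-path on VPNv4. These are negotiated capabilities, so configure them on both ends. If only one side advertises graceful-restart, a restart during a maintenance window tears the session down and withdraws routes instead of preserving forwarding; add-path likewise only takes effect when one side is set to send and the other to receive. Set them explicitly on both sides and check the negotiated capabilities with get router info bgp neighbors 10.255.0.1.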
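Gotchas 3 and 7 are both easy to check offline before a change window. The Python sketch below is hypothetical tooling, not a Fortinet utility: branch_rd implements the RD scheme from gotcha 3 as a type-0 RD (2-byte ASN, 4-byte assigned number), with branch ID 11 assumed to match the .11 loopback used in this post, and audit_imports flags any hub VRF importing an RT outside an allowlist you maintain yourself.

```python
def branch_rd(asn, branch_id, vrf_id):
    """Type-0 RD (2-byte ASN : 4-byte number) encoding branch_id*1000 + vrf_id."""
    if not 0 < asn < 2**16:
        raise ValueError("a type-0 RD needs a 2-byte ASN")
    if not 0 <= vrf_id < 1000:
        raise ValueError("vrf_id must fit in three decimal digits")
    assigned = branch_id * 1000 + vrf_id
    if not 0 <= assigned < 2**32:
        raise ValueError("assigned number does not fit in 4 bytes")
    return f"{asn}:{assigned}"


def audit_imports(import_rts, allowed):
    """Return {vrf: unexpected RTs} for every VRF importing an RT not on its allowlist."""
    findings = {}
    for vrf, rts in import_rts.items():
        extra = set(rts) - allowed.get(vrf, set())
        if extra:
            findings[vrf] = extra
    return findings


print(branch_rd(65000, 11, 30))   # 65000:11030

# Hub import-rt values as extracted from a (hypothetical) config backup.
hub_imports = {
    "20": {"65000:20"},
    "30": {"65000:30", "65000:20"},   # the copy-paste mistake from gotcha 7
    "99": {"65000:99"},
}
allowed = {"20": {"65000:20"}, "30": {"65000:30"}, "99": {"65000:99"}}
print(audit_imports(hub_imports, allowed))   # {'30': {'65000:20'}}
```

Keeping the allowlist in version control turns "audit imports" from a grep into a diff you can review with every change.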
Verification commands worth memorising
# Per-VRF RIB
get router info routing-table vrf 30 all
get router info routing-table vrf 99 all
# Per-VRF BGP RIB
get router info bgp vrf 30 network
get router info bgp vrf 30 neighbors
# VPNv4 table — what's actually being exchanged
get router info bgp vpnv4-unicast all
get router info bgp vpnv4-unicast neighbors 10.255.0.1 advertised-routes
get router info bgp vpnv4-unicast neighbors 10.255.0.1 received-routes
# Session and NAT (debug guest DIA)
diagnose sys session filter src 192.168.99.42
diagnose sys session list
# Inter-VRF static resolution
diagnose ip rtcache list
get router info routing-table details 0.0.0.0
Closing
The pattern here — one overlay, one BGP session, many VRFs separated by RTs, with selective leaking for shared services and DIA — scales from a two-site lab to a multi-tenant managed-service deployment without changing shape. The hard parts aren’t the BGP commands; they’re the source-IP selection on management-plane services, the discipline around RT imports, and the NAT/route-leak boundary at the guest VRF. Get those three right and the rest is mechanical.
If you’re rolling this out for the first time, build VRF 0 + VRF 30 first, prove the overlay end-to-end, then add VRF 20 and VRF 99 incrementally. Adding all three on day one and chasing a broken topology across three RIBs is a bad afternoon.