SDWAN Resilience Part 3: DC to DCE Routing — Static, OSPF, and BGP
Part 2 left us with a working BGP overlay between hubs and spokes. The hubs know how to reach every spoke; the spokes know how to reach the hubs. Neither end yet knows how to reach the actual application stack, which lives in the DCE — a separate AS (65500) that each DC peers with independently, with no DCI between the DCs.
This post is about the routing relationship between each hub FortiGate and the DCE, the three protocols you can choose, and which one survives the failure modes the no-DCI design exposes.
What the hub FortiGate is actually being asked to do
Each hub has two distinct routing roles:
- Downstream: BGP to spokes over the IPsec overlay. AS 65000 ↔ AS 65100s. Covered in full in Part 2.
- Upstream: routing relationship with the DCE on a physical (or VLAN) interface. AS 65000 ↔ AS 65500 if BGP, or some IGP / static if not.
The hub’s job is to advertise spoke prefixes into DCE so the application stack can return traffic, and to advertise DCE prefixes into the spoke overlay so spokes know how to reach the apps.
The relationship is more delicate than a normal redistribution because of two constraints from Part 1:
- No DCI. HUB-1 and HUB-2 do not share a control plane. Each hub builds its own view independently.
- Active/standby. In steady state we want all spoke→DCE traffic via DC1 and all DCE→spoke return via DC1. Both directions must agree.
The protocol you pick on the upstream side has to deliver three things:
- Withdraw DCE prefixes from the spoke overlay when the DCE peering goes down on this hub (so spokes converge to the other hub).
- Express the active/standby preference in a way the DCE will honour for return traffic.
- Converge fast enough that the in-flight session impact stays within the application’s tolerance.
With those three tests in mind, here are the three options.
Option A: Static
The simplest possible upstream. The hub has a static default (or specific aggregates) toward the DCE next-hop, and a static (or set of statics) on the DCE side back toward the spoke summary.
config router static
edit 100
set dst 10.100.0.0 255.255.0.0 # DCE service prefix
set gateway 10.0.0.1 # DCE-side router on the DC interconnect
set device "port2"
next
end
config router prefix-list
edit "dce-from-static"
config rule
edit 1
set prefix 10.100.0.0 255.255.0.0
set ge 16
set le 24
next
end
next
end
config router route-map
edit "redist-dce-to-bgp"
config rule
edit 1
set match-ip-address "dce-from-static"
next
end
next
end
config router bgp
config redistribute "static"
set status enable
set route-map "redist-dce-to-bgp"
end
end
Pros
- Trivially predictable. The route is there or it is not.
- Zero protocol overhead, zero adjacency to chase down.
- Easy to filter — there’s a finite list of statics.
Cons
- No failure detection beyond link state. If the DCE next-hop is unreachable but the interface is still up (transit switch failure between hub and DCE, asymmetric VLAN mis-trunk, etc.), the hub keeps advertising the route and spokes keep sending traffic to a hub that can no longer deliver it. This is the failure mode that breaks active/standby outright.
- DCE-side return path is not dynamic either. The DCE has to be told manually that DC1 is preferred over DC2. Any change requires an out-of-band update on a device the hub team probably doesn’t operate.
- Adding or removing a DCE prefix is a change ticket on every hub.
The fix for the failure-detection gap is to make the static SLA-tied. FortiOS lets you tie a static route to a Performance SLA and pull it from the RIB when the SLA fails. We’ll use that pattern in Part 5 — but if you find yourself reaching for it here, you’ve reinvented enough of a routing protocol that you might as well run one.
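As a rough illustration of the idea, here is a minimal sketch using a link-monitor rather than the Performance SLA described above. It is a related FortiOS mechanism, swapped in here because it is the shortest way to show a probe pulling a static route; the pattern in Part 5 may well differ. The monitor name and the failtime/recoverytime values are assumptions, and the probe target is simply the DCE next-hop from the static route above.

```
config system link-monitor
    edit "dce-nexthop"                     # hypothetical name
        set srcintf "port2"
        set server "10.0.0.1"              # probe the DCE-side router itself
        set gateway-ip 10.0.0.1
        set protocol ping
        set failtime 3                     # assumed: probes lost before declaring failure
        set recoverytime 5                 # assumed: probes answered before restoring
        set update-static-route enable     # pull statics via port2 while the probe fails
    next
end
```

With the static gone from the routing table, the redistribution into BGP has nothing left to advertise and the spokes see a withdrawal, which is exactly the behaviour a routing protocol would have given for free.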
Verdict: viable only if (a) DCE prefixes are stable, (b) every failure on the path between the hub and the DCE also takes the hub interface down, and (c) you accept manually managing the DCE-side return-path preference. In a real dual-DC design, that combination is rare.
Option B: OSPF
Treat the DCE as a routing extension and run OSPF with the DCE-side router on a backbone (or a stub) area. The hub redistributes the DCE-learned prefixes into the spoke-side BGP via a route-map.
config router ospf
set router-id 10.255.0.1
config redistribute "connected"
set status enable
end
config area
edit 0.0.0.0
next
end
config network
edit 1
set prefix 10.0.0.0 255.255.255.252 # DC1 ↔ DCE p2p
set area 0.0.0.0
next
end
config ospf-interface
edit "port2"
set interface "port2"
set hello-interval 1
set dead-interval 4
set network-type point-to-point
set bfd enable
next
end
end
config router prefix-list
edit "dce-from-ospf"
config rule
edit 1
set prefix 10.100.0.0 255.255.0.0
set ge 16
set le 24
next
end
next
end
config router route-map
edit "redist-ospf-to-bgp"
config rule
edit 1
set match-ip-address "dce-from-ospf"
next
end
next
end
config router bgp
config redistribute "ospf"
set status enable
set route-map "redist-ospf-to-bgp"
end
end
hello-interval 1 / dead-interval 4 is not the OSPF default — it’s tuned for fast convergence in line with Part 4. bfd enable on the interface sub-stanza adds BFD for OSPF, which we’ll discuss in Part 4 as well.
Pros
- Dynamic: DCE prefix changes propagate without hub reconfiguration.
- Fast neighbour-down detection with tuned timers and BFD.
- Loop-free inside the OSPF domain by construction (SPF), so the only loop risk left to manage is at the redistribution boundary.
Cons
- OSPF floods LSAs. If the DCE has a busy IGP, you’ve just imported its churn.
- AS-level isolation is muddier — OSPF is a single trust domain, and security/operational separation between hubs and DCE is now via prefix-list, not a different protocol.
- Expressing the active/standby preference requires either OSPF cost manipulation (which the DCE side has to honour back toward the spokes) or pushing the preference into BGP with a route-map on redistribution.
- OSPF doesn’t carry communities. If the DCE wants to tag prefixes for policy (e.g., “this is internet break-out, prefer DC1 even harder”), you can’t do it in OSPF.
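One way the cost-based preference from the list above could be expressed, as a sketch only: each hub redistributes the spoke prefixes it learns over BGP into OSPF as external routes, and HUB-2 uses a worse metric than HUB-1 so the DCE prefers DC1 for return traffic. The metric values are assumptions, and the route-map filter that should accompany any redistribution is left out for brevity.

```
# HUB-1 (active): low external metric for spoke prefixes
config router ospf
    config redistribute "bgp"
        set status enable
        set metric 10          # assumed value
        set metric-type 2
    end
end

# HUB-2 (standby): same block, higher metric
config router ospf
    config redistribute "bgp"
        set status enable
        set metric 100         # assumed value
        set metric-type 2
    end
end
```

This only works if the DCE side leaves the external metrics untouched, which is the dependency the list above warns about.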
Failure-mode test (peering interface up but DCE unreachable): the hub stops receiving OSPF hellos from the DCE router. Once the dead-interval expires or BFD declares the neighbour down, the adjacency drops, the DCE prefixes leave the routing table, redistribution into BGP withdraws them, and spokes converge to HUB-2. Passes the test, with timing dictated by hello/dead/BFD.
Verdict: a solid choice when you and the DCE team trust each other enough to share an IGP. Most enterprises end up here only if they were already running OSPF inside the DCE.
Option C: eBGP
The DCE is a different AS (65500), so eBGP between each hub and the DCE is the protocol-correct answer. Different AS, clear policy boundary, full BGP attribute toolkit available.
config router bgp
set as 65000
set router-id 10.255.0.1
set keepalive-timer 3
set holdtime-timer 9
config neighbor
edit "10.0.0.1"
set remote-as 65500
set bfd enable
set capability-graceful-restart enable
set route-map-in "from-dce"
set route-map-out "to-dce"
set send-community standard
next
end
end
config router route-map
edit "from-dce"
config rule
edit 1
set match-community "dce-services"
set set-local-preference 200
next
end
next
edit "to-dce"
config rule
edit 1
set match-ip-address "spokes-summary"
set set-community "65500:100" # DC1: prefer
next
end
next
end
config router community-list
edit "dce-services"
set type standard
config rule
edit 1
set action permit
set match "65500:200"
next
end
next
end
keepalive 3 / holdtime 9 is the shortest timer pair we treat as safe on FortiOS; Part 4 explains the math. bfd enable again pulls in BFD-for-BGP.
The DCE side is the symmetric mirror, with the active/standby preference encoded by community: HUB-1 tags spoke advertisements with 65500:100 (DC1, prefer), HUB-2 with 65500:200 (DC2, secondary). On the DCE side a route-map matches those communities and sets local-preference accordingly. DC1 wins for return traffic; if HUB-1 withdraws, DC2 takes over because it’s the only remaining path.
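For completeness, the HUB-2 counterpart of the to-dce route-map would look something like the sketch below. It assumes HUB-2 has the same "spokes-summary" prefix-list as HUB-1 (its contents are not shown in this post) and differs only in the community value.

```
config router route-map
    edit "to-dce"
        config rule
            edit 1
                set match-ip-address "spokes-summary"
                set set-community "65500:200"    # DC2: secondary
            next
        end
    next
end
```

The DCE-side policy that maps 65500:100 and 65500:200 to local-preference values lives on the DCE routers and depends on their platform, so it is not sketched here.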
Pros
- Different AS, clean policy boundary. Filtering by AS-path is robust against accidental redistribution.
- Communities give the DCE team a cooperative way to influence preference without coordinating route-maps on each side.
- BGP supports BFD natively (so does FortiOS OSPF, but BGP’s policy hooks are richer).
- Withdraw-on-failure is the protocol’s default behaviour — no extra plumbing to make it work.
- Aligns with Fortinet’s SD-WAN Architecture for Enterprise recommendation: when crossing AS boundaries, run BGP.
Cons
- More configuration. Route-maps, community-lists, prefix-lists, and they have to be agreed with the DCE team.
- Slightly slower default convergence than OSPF if you don’t tune the timers — fixed in Part 4.
- AS-path loop prevention can bite when multi-homing if you’re not careful with set allowas-in. Don’t enable it unless you’ve decided you actually want the loop.
Failure-mode test (peering interface up but DCE unreachable): BFD declares the DCE neighbour down, the eBGP session drops, and the DCE prefixes are withdrawn. Because those prefixes are already in BGP, no redistribution step is involved: the hub’s spoke-side BGP sessions carry the withdrawal straight to the spokes, which converge to HUB-2. Passes the test, and unlike OSPF the same protocol carries the policy decisions, so tuning is in one place.
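To watch that sequence on the hub during a failover test, the following standard FortiOS diagnostics are a reasonable starting point. The spoke address in the last command is a placeholder you would replace with a real overlay peer.

```
# BFD session state toward the DCE router
get router info bfd neighbor

# eBGP session state and received-prefix counts per neighbour
get router info bgp summary

# DCE prefixes currently installed from BGP
get router info routing-table bgp

# What this hub is still advertising to a given spoke
get router info bgp neighbors <spoke-ip> advertised-routes
```

After the failure is injected, the DCE neighbour should leave the Established state and the DCE prefixes should disappear from both the routing table and the advertised-routes output.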
Verdict: in this design, this is the answer.
Pros/cons at a glance
| Property | Static | OSPF | eBGP |
|---|---|---|---|
| Detects DCE-side failure | No (interface only) | Yes (with BFD) | Yes (with BFD) |
| DCE-side preference | Manual | Cost-based | Community-based |
| Carries communities | n/a | No | Yes |
| Survives DCE prefix churn | Manual update | Auto | Auto |
| AS / policy boundary | Implicit | Implicit | Explicit |
| Convergence (tuned) | Interface state only | ~1 s (BFD) | ~1 s (BFD) |
| Operational complexity | Low | Medium | Medium-High |
| Fortinet BP recommendation for AS boundary | No | No | Yes |
The redistribution/glue layer
Whichever upstream protocol you pick, the hub still has to glue what it learns upstream into what it sends downstream (and vice versa). With BGP both ways, this is one protocol with route-maps; with OSPF or static, it is redistribution.
A few rules to keep the glue safe:
- Never redistribute everything. Always use a route-map with a prefix-list match. The default-deny stance prevents a DCE prefix leak from dragging unrelated routes into the spoke overlay.
- Tag at ingress. Add a community at the hub-to-DCE ingress route-map (from-dce) so downstream policy on spokes (or the other hub, if you ever add iBGP) can match on it.
- Strip next-hop on redistribution from non-BGP. OSPF redistribution into BGP without next-hop-self somewhere downstream will land spokes with the OSPF neighbour as next-hop, which spokes can’t resolve.
- Cap prefix-count. set maximum-prefix on every BGP neighbour, both sides. Limits blast radius from a misconfigured prefix-list.
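Pulled together on the hub, the tagging, next-hop, and prefix-cap rules above might look like the sketch below. The ingress community 65000:500, the prefix cap of 100, and the neighbor-group name "spokes" are all hypothetical; substitute whatever Part 2 actually uses for the spoke-side peers.

```
config router route-map
    edit "from-dce"
        config rule
            edit 1
                set match-community "dce-services"
                set set-local-preference 200
                set set-community "65000:500"        # hypothetical ingress tag
                set set-community-additive enable    # keep the communities the DCE attached
            next
        end
    next
end
config router bgp
    config neighbor
        edit "10.0.0.1"
            set maximum-prefix 100                   # hypothetical cap, DCE side
        next
    end
    config neighbor-group
        edit "spokes"                                # hypothetical group name
            set next-hop-self enable
            set maximum-prefix 100                   # hypothetical cap, spoke side
        next
    end
end
```

Size the caps from the real prefix counts plus some headroom; a cap set too tight turns a routine prefix addition into an outage.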
The recommendation
Run eBGP between each hub and the DCE, with BFD, communities for active/standby preference, and prefix-list-bounded route-maps in both directions. Static is a poor fit because it can’t withdraw on a soft failure. OSPF works but doesn’t earn its keep when there’s a clean AS boundary already.
That recommendation is consistent with Fortinet’s published architecture guidance and tracks the convergence and failure-detection requirements set out in Part 1.
Failback behaviour and what to watch
When DC1 recovers after a failover to DC2:
- DCE relearns routes from HUB-1 with the higher local-preference (set by community) and switches return traffic back.
- Spokes see HUB-1 advertise DCE prefixes again and prefer them via the local-preference set on the spoke-side route-map (Part 2).
The two switches are independent and can race. In practice, it doesn’t matter — the underlying transports are both up before either BGP session has fully reconverged, so the only flows in flight are TCP, which retransmits. Where it does matter is for stateful flows that pin source IP — the active/standby choice in Part 1 was specifically to make sure these don’t move until there is a real DC failure.
If you want to dampen flap-induced failback you can set BGP route-flap dampening on the DCE-side neighbours, or add a hold timer on the spokes that stops them switching back to HUB-1 for N minutes after a failure. The simpler and more honest fix is to make the underlying paths reliable.
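If the device doing the dampening is a FortiGate, a minimal sketch of the first option looks like this. As far as I know the setting applies to the BGP process rather than to a single neighbour, and the values shown are conventional starting points, not figures taken from this design.

```
config router bgp
    set dampening enable
    set dampening-reachability-half-life 15    # minutes; assumed value
    set dampening-reuse 750                    # assumed value
    set dampening-suppress 2000                # assumed value
    set dampening-max-suppress-time 60         # minutes; assumed value
end
```

Dampening trades failback speed for stability, so test it against the convergence targets from Part 1 before leaving it on.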
Where Part 4 picks up
We’ve now got hub↔spoke (Part 2) and hub↔DCE (this post) running cleanly in steady state. Both rely heavily on phrases like “with BFD” and “with tuned timers”. Part 4 is the timer-math part: DPD vs BFD on tunnels, BFD-for-BGP, hold-down/keepalive ratios, and what convergence numbers each combination actually delivers.