SDWAN Resilience Part 3: DC to DCE Routing — Static, OSPF, and BGP

Part 2 left us with a working BGP overlay between hubs and spokes. The hubs know how to reach every spoke; the spokes know how to reach the hubs. Neither end yet knows how to reach the actual application stack, which lives in the DCE — a separate AS (65500) that each DC peers with independently, with no DCI between the DCs.

This post is about the routing relationship between each hub FortiGate and the DCE, the three protocols you can choose, and which one survives the failure modes the no-DCI design exposes.

What the hub FortiGate is actually being asked to do

Each hub has two distinct routing roles:

  1. Downstream: BGP to spokes over the IPsec overlay. AS 65000 ↔ AS 65100s. Driven entirely by Part 2.
  2. Upstream: routing relationship with the DCE on a physical (or VLAN) interface. AS 65000 ↔ AS 65500 if BGP, or some IGP / static if not.

The hub’s job is to advertise spoke prefixes into DCE so the application stack can return traffic, and to advertise DCE prefixes into the spoke overlay so spokes know how to reach the apps.

The relationship is more delicate than a normal redistribution because of two constraints from Part 1:

  • No DCI. HUB-1 and HUB-2 do not share a control plane. Each hub builds its own view independently.
  • Active/standby. In steady state we want all spoke→DCE traffic via DC1 and all DCE→spoke return via DC1. Both directions must agree.

The protocol you pick on the upstream side has to deliver three things:

  • Withdraw DCE prefixes from the spoke overlay when the DCE peering goes down on this hub (so spokes converge to the other hub).
  • Express the active/standby preference in a way the DCE will honour for return traffic.
  • Converge fast enough that the in-flight session impact stays within the application’s tolerance.

With those three tests in mind, here are the three options.

Option A: Static

The simplest possible upstream. The hub has a static default (or specific aggregates) toward the DCE next-hop, and a static (or set of statics) on the DCE side back toward the spoke summary.

config router static
    edit 100
        set dst 10.100.0.0 255.255.0.0     # DCE service prefix
        set gateway 10.0.0.1               # DCE-side router on the DC interconnect
        set device "port2"
    next
end

config router prefix-list
    edit "dce-from-static"
        config rule
            edit 1
                set prefix 10.100.0.0 255.255.0.0
                set ge 16
                set le 24
            next
        end
    next
end

config router route-map
    edit "redist-dce-to-bgp"
        config rule
            edit 1
                set match-ip-address "dce-from-static"
            next
        end
    next
end

config router bgp
    config redistribute "static"
        set status enable
        set route-map "redist-dce-to-bgp"
    end
end

Pros

  • Trivially predictable. The route is there or it is not.
  • Zero protocol overhead, zero adjacency to chase down.
  • Easy to filter — there’s a finite list of statics.

Cons

  • No failure detection beyond link state. If the DCE next-hop is unreachable but the interface is still up (transit switch failure between hub and DCE, asymmetric VLAN mis-trunk, etc.), the static stays in the RIB and the hub keeps advertising it. This is the failure mode that breaks active/standby: spokes keep sending traffic to a hub that can no longer deliver it.
  • DCE-side return path is not dynamic either. The DCE has to be told manually that DC1 is preferred over DC2. Any change requires an out-of-band update on a device the hub team probably doesn’t operate.
  • Adding or removing a DCE prefix is a change ticket on every hub.

The fix for the failure-detection gap is to make the static SLA-tied. FortiOS lets you tie a static route to a Performance SLA and pull it from the RIB when the SLA fails. We’ll use that pattern in Part 5 — but if you find yourself reaching for it here, you’ve reinvented enough of a routing protocol that you might as well run one.
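
A minimal sketch of that SLA-tied pattern, using a link monitor rather than a full SD-WAN Performance SLA. The monitor name is hypothetical, the probe target reuses the 10.0.0.1 next-hop from the static above, and the failtime/recoverytime values are illustrative. Option names and defaults vary between FortiOS releases, so check the CLI reference for yours.

```
# Sketch only: "dce-nexthop" is a hypothetical name, thresholds are illustrative
config system link-monitor
    edit "dce-nexthop"
        set srcintf "port2"
        set server "10.0.0.1"
        set gateway-ip 10.0.0.1
        set protocol ping
        # consecutive failed probes before the monitor is declared down
        set failtime 3
        # consecutive successful probes before it is declared up again
        set recoverytime 5
        # pull the associated static route while the monitor is down
        set update-static-route enable
    next
end
```

With the monitor down, the static via port2 should leave the routing table, and the redistributed BGP advertisement goes with it.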

Verdict: viable only if (a) DCE prefixes are stable, (b) the path between the hub and DCE has no failure modes that don’t take the interface with them, and (c) you accept manually managing the DCE-side return-path preference. In a real dual-DC design, that combination is rare.

Option B: OSPF

Treat the DCE as a routing extension and run OSPF with the DCE-side router on a backbone (or a stub) area. The hub redistributes the DCE-learned prefixes into the spoke-side BGP via a route-map.

config router ospf
    set router-id 10.255.0.1
    config redistribute "connected"
        set status enable
    end
    config area
        edit 0.0.0.0
        next
    end
    config network
        edit 1
            set prefix 10.0.0.0 255.255.255.252    # DC1 ↔ DCE p2p
            set area 0.0.0.0
        next
    end
    config ospf-interface
        edit "dce-p2p"                             # entry name is arbitrary
            set interface "port2"
            set hello-interval 1
            set dead-interval 4
            set network-type point-to-point
            set bfd enable
        next
    end
end

config router prefix-list
    edit "dce-from-ospf"
        config rule
            edit 1
                set prefix 10.100.0.0 255.255.0.0
                set ge 16
                set le 24
            next
        end
    next
end

config router route-map
    edit "redist-ospf-to-bgp"
        config rule
            edit 1
                set match-ip-address "dce-from-ospf"
            next
        end
    next
end

config router bgp
    config redistribute "ospf"
        set status enable
        set route-map "redist-ospf-to-bgp"
    end
end

hello-interval 1 / dead-interval 4 is not the OSPF default (10 s hello, 40 s dead); it’s tuned for fast convergence in line with Part 4. bfd enable on the interface sub-stanza adds BFD for OSPF, which we’ll discuss in Part 4 as well.
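
The OSPF stanza only asks OSPF to use BFD. On FortiOS, BFD typically also has to be enabled on the underlying system interface (or globally). A sketch of the interface side, with timer values that are illustrative assumptions rather than recommendations; Part 4 covers the real numbers.

```
# Sketch only: timer values below are assumptions, not tuned recommendations
config system interface
    edit "port2"
        set bfd enable
        # transmit / receive intervals in milliseconds
        set bfd-desired-min-tx 250
        set bfd-required-min-rx 250
        # missed packets before the neighbour is declared down
        set bfd-detect-mult 3
    next
end
```

With these example values, detection time is roughly 250 ms × 3 = 750 ms.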

Pros

  • Dynamic: DCE prefix changes propagate without hub reconfiguration.
  • Fast neighbour-down detection with tuned timers and BFD.
  • Loop-free within the OSPF domain (SPF). The redistribution boundary into BGP is a separate matter and still depends on the route-map filter.

Cons

  • OSPF floods LSAs. If the DCE has a busy IGP, you’ve just imported its churn.
  • AS-level isolation is muddier — OSPF is a single trust domain, and security/operational separation between hubs and DCE is now via prefix-list, not a different protocol.
  • Expressing the active/standby preference requires either OSPF cost manipulation (which the DCE side has to honour back toward the spokes) or pushing the preference into BGP with a route-map on redistribution.
  • OSPF doesn’t carry communities. If the DCE wants to tag prefixes for policy (e.g., “this is internet break-out, prefer DC1 even harder”), you can’t do it in OSPF.
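
To make the cost-manipulation option concrete, here is a sketch of how HUB-1 might inject the spoke summary into OSPF with a low external metric. The route-map name "spokes-to-ospf" and the metric values are hypothetical, and the "spokes-summary" prefix-list is assumed to exist already (this post refers to it but does not define it).

```
# Sketch only: route-map name and metric values are assumptions
config router route-map
    edit "spokes-to-ospf"
        config rule
            edit 1
                set match-ip-address "spokes-summary"
            next
        end
    next
end

config router ospf
    config redistribute "bgp"
        set status enable
        set routemap "spokes-to-ospf"
        # Type 2 external: the redistributed metric is compared as-is
        set metric-type 2
        # HUB-1 (preferred). HUB-2 would use a higher value, e.g. 100
        set metric 10
    end
end
```

The DCE then prefers the lower-metric external route via HUB-1 for return traffic, provided nothing on the DCE side overrides it.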

Failure-mode test (hub interface up but DCE unreachable): OSPF on the hub loses the adjacency once the dead-interval expires or BFD declares the neighbour down, the DCE prefixes drop out of the routing table, redistribution into BGP withdraws them, and spokes converge to HUB-2. Passes the test, with timing dictated by hello/dead/BFD.
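
One way to watch that sequence on the hub during a test, assuming the addressing used in the snippets above:

```
# Adjacency and BFD session state toward the DCE
get router info ospf neighbor
get router info bfd neighbor

# DCE prefixes as learned via OSPF
get router info routing-table ospf

# The same prefix as seen in the BGP table after redistribution
get router info bgp network 10.100.0.0/16
```

During the failure, the OSPF route and the BGP entry for 10.100.0.0/16 should both disappear on HUB-1.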

Verdict: a solid choice when you and the DCE team trust each other enough to share an IGP. Most enterprises end up here only if they were already running OSPF inside the DCE.

Option C: eBGP

The DCE is a different AS (65500), so eBGP between each hub and the DCE is the protocol-correct answer. Different AS, clear policy boundary, full BGP attribute toolkit available.

config router bgp
    set as 65000
    set router-id 10.255.0.1
    set keepalive-timer 3
    set holdtime-timer 9

    config neighbor
        edit "10.0.0.1"
            set remote-as 65500
            set bfd enable
            set capability-graceful-restart enable
            set route-map-in "from-dce"
            set route-map-out "to-dce"
            set send-community standard
        next
    end
end

config router route-map
    edit "from-dce"
        config rule
            edit 1
                set match-community "dce-services"
                set set-local-preference 200
            next
        end
    next
    edit "to-dce"
        config rule
            edit 1
                set match-ip-address "spokes-summary"
                set set-community "65500:100"     # DC1: prefer
            next
        end
    next
end

config router community-list
    edit "dce-services"
        set type standard
        config rule
            edit 1
                set action permit
                set match "65500:200"
            next
        end
    next
end

keepalive 3 / holdtime 9 is the shortest hold time we consider safe on FortiOS; Part 4 explains the math. bfd enable again pulls in BFD-for-BGP.

The DCE side is the symmetric mirror, with the active/standby preference encoded by community: HUB-1 tags spoke advertisements with 65500:100 (DC1, prefer), HUB-2 with 65500:200 (DC2, secondary). On the DCE side a route-map matches those communities and sets local-preference accordingly. DC1 wins for return traffic; if HUB-1 withdraws, DC2 takes over because it’s the only remaining path.
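
For completeness, a sketch of the matching outbound route-map on HUB-2. It mirrors the HUB-1 "to-dce" map, and the only change is the community value. The "spokes-summary" prefix-list is assumed to be defined identically on both hubs.

```
# HUB-2 sketch: same structure as HUB-1, secondary community
config router route-map
    edit "to-dce"
        config rule
            edit 1
                set match-ip-address "spokes-summary"
                set set-community "65500:200"     # DC2: secondary
            next
        end
    next
end
```

Note that 65500:200 is also the value the "dce-services" community-list matches on routes received from the DCE. The two uses run in opposite directions, but distinct values may be easier to read when troubleshooting.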

Pros

  • Different AS, clean policy boundary. Filtering by AS-path is robust against accidental redistribution.
  • Communities give the DCE team a cooperative way to influence preference without coordinating route-maps on each side.
  • BGP supports BFD natively (so does FortiOS OSPF, but BGP’s policy hooks are richer).
  • Withdraw-on-failure is the protocol’s default behaviour — no extra plumbing to make it work.
  • Aligns with Fortinet’s SD-WAN Architecture for Enterprise recommendation: when crossing AS boundaries, run BGP.

Cons

  • More configuration. Route-maps, community-lists, prefix-lists, and they have to be agreed with the DCE team.
  • Slightly slower default convergence than OSPF if you don’t tune the timers — fixed in Part 4.
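  • AS-path loop prevention can bite when multi-homing. Both hubs sit in AS 65000, so a route that HUB-1 sends to the DCE is rejected by HUB-2 if the DCE passes it on, because HUB-2 sees its own AS in the path. That is normally what you want here. set allowas-in overrides the check; don’t enable it unless you’ve decided you actually want the loop.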

Failure-mode test (hub interface up but DCE unreachable): BFD declares the DCE neighbour down, the eBGP session drops, and the DCE prefixes are withdrawn. They are already in BGP, so there is no redistribution step to wait for: the withdrawal goes straight to the spokes over the spoke-side BGP sessions, and spokes converge to HUB-2. Passes the test, and unlike OSPF the same protocol carries the policy decisions, so tuning is in one place.
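
A short set of checks for this test on HUB-1, assuming the neighbour address from the config above:

```
# Session state and prefix counts per neighbour
get router info bgp summary

# BFD session toward the DCE neighbour
get router info bfd neighbor

# What this hub is currently sending to the DCE
get router info bgp neighbors 10.0.0.1 advertised-routes

# Whether the DCE service prefix is still in the BGP table
get router info bgp network 10.100.0.0/16
```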

Verdict: in this design, this is the answer.

Pros/cons at a glance

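Property                                           | Static                    | OSPF           | eBGP
---------------------------------------------------|---------------------------|----------------|----------------
Detects DCE-side failure                           | No (interface only)       | Yes (with BFD) | Yes (with BFD)
DCE-side preference                                | Manual                    | Cost-based     | Community-based
Carries communities                                | n/a                       | No             | Yes
Survives DCE prefix churn                          | Manual update             | Auto           | Auto
AS / policy boundary                               | Implicit                  | Implicit       | Explicit
Convergence (tuned)                                | Interface link state only | ~1 s (BFD)     | ~1 s (BFD)
Operational complexity                             | Low                       | Medium         | Medium-High
Fortinet best-practice recommendation, AS boundary | No                        | No             | Yes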

The redistribution/glue layer

Whichever upstream protocol you pick, the hub still has to glue what it learns upstream into what it sends downstream (and vice versa). With BGP both ways, this is one protocol with route-maps; with OSPF or static, it is redistribution.

A few rules to keep the glue safe:

  • Never redistribute everything. Always use a route-map with a prefix-list match. The default-deny stance prevents a DCE prefix leak from dragging unrelated routes into the spoke overlay.
  • Tag at ingress. Add a community at the hub-to-DCE ingress route-map (from-dce) so downstream policy on spokes (or the other hub, if you ever add iBGP) can match on it.
  • Rewrite next-hop on redistribution from non-BGP. OSPF routes redistributed into BGP without next-hop-self somewhere downstream can land on spokes with the OSPF neighbour as next-hop, which spokes can’t resolve.
  • Cap prefix-count. set maximum-prefix on every BGP neighbour, both sides. Limits blast radius from a misconfigured prefix-list.
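
Two of these rules translate directly into hub config. The sketch below caps the prefix count on the DCE neighbour and tags DCE routes at ingress. The limit of 200, the 80% warning threshold, and the 65000:500 tag are all hypothetical values to be agreed with the DCE team.

```
# Sketch only: limits and the ingress tag value are assumptions
config router bgp
    config neighbor
        edit "10.0.0.1"
            set maximum-prefix 200
            # warn when 80% of the limit is reached
            set maximum-prefix-threshold 80
        next
    end
end

config router route-map
    edit "from-dce"
        config rule
            edit 1
                set match-community "dce-services"
                set set-local-preference 200
                # ingress tag for downstream policy (hypothetical value)
                set set-community "65000:500"
                # keep the communities the DCE already attached
                set set-community-additive enable
            next
        end
    next
end
```

Without the additive setting, the ingress tag would replace the DCE’s own communities rather than add to them.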

The recommendation

Run eBGP between each hub and the DCE, with BFD, communities for active/standby preference, and prefix-list-bounded route-maps in both directions. Static is a poor fit because it can’t withdraw on a soft failure. OSPF works but doesn’t earn its keep when there’s a clean AS boundary already.

That recommendation is consistent with Fortinet’s published architecture guidance and tracks the convergence and failure-detection requirements set out in Part 1.

Failback behaviour and what to watch

When DC1 recovers after a failover to DC2:

  • DCE relearns routes from HUB-1 with the higher local-preference (set by community) and switches return traffic back.
  • Spokes see HUB-1 advertise DCE prefixes again and prefer them via the local-preference set on the spoke-side route-map (Part 2).

The two switches are independent and can race. In practice, it doesn’t matter — the underlying transports are both up before either BGP session has fully reconverged, so the only flows in flight are TCP, which retransmits. Where it does matter is for stateful flows that pin source IP — the active/standby choice in Part 1 was specifically to make sure these don’t move until there is a real DC failure.

If you want to dampen flap-induced failback you can set BGP route-flap dampening on the DCE-side neighbours, or add a hold timer on the spokes that stops them switching back to HUB-1 for N minutes after a failure. The simpler and more honest fix is to make the underlying paths reliable.
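
If you do go the dampening route on a FortiGate, a sketch looks like the following. On FortiOS, dampening is set at the BGP process level rather than per neighbour. The values shown are illustrative and should be checked against the defaults in your release.

```
# Sketch only: values are illustrative, check your release's defaults
config router bgp
    set dampening enable
    # minutes for a reachable route's penalty to halve
    set dampening-reachability-half-life 15
    # penalty below which a suppressed route is reused
    set dampening-reuse 750
    # penalty above which a flapping route is suppressed
    set dampening-suppress 2000
    # maximum minutes a route can stay suppressed
    set dampening-max-suppress-time 60
end
```

The trade-off is that a suppressed prefix stays withdrawn even after the path is healthy, which lengthens failback.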

Where Part 4 picks up

We’ve now got hub↔spoke (Part 2) and hub↔DCE (this post) running cleanly in steady state. Both rely heavily on phrases like “with BFD” and “with tuned timers”. Part 4 is the timer-math part: DPD vs BFD on tunnels, BFD-for-BGP, hold-down/keepalive ratios, and what convergence numbers each combination actually delivers.