NSE4 Part 10: High Availability
The final post in the NSE4 series. Lesson 16 — High Availability — is most often tested by handing you diagnose sys ha output and asking what state the cluster is in, so this post emphasises the behaviour you read in diagnostics rather than the GUI clicks.
FGCP — what it is
FGCP (FortiGate Clustering Protocol) is Fortinet’s HA implementation. Two or more FortiGates of the same model and firmware form a cluster that presents itself to the network as a single device — same MAC, same IP, same config. State is synchronised over dedicated heartbeat links.
Two modes:
| Mode | Behaviour |
|---|---|
| Active-Passive (A-P) | One unit forwards; others sit hot-standby |
| Active-Active (A-A) | All units forward; sessions distributed via load balancing |
Most production clusters are A-P. Active-active sounds like extra throughput on paper, but the load balancing still happens through a single primary, so it rarely scales the way the name suggests.
Cluster requirements
- Identical hardware model. No mixing FG-100F with FG-101F, even if FortiOS versions match.
- Identical firmware. Mismatch breaks sync.
- Identical licence type. All units licensed for the same features.
- Dedicated heartbeat interfaces. At least one, ideally two for redundancy.
Configuration
config system ha
    set group-id 1
    set group-name "edge-ha"
    set mode a-p
    set hbdev "ha1" 50 "ha2" 100
    set session-pickup enable
    set session-pickup-connectionless enable
    set ha-mgmt-status enable
    config ha-mgmt-interfaces
        edit 1
            set interface "mgmt"
            set gateway 10.10.0.1
        next
    end
    set override disable
    set priority 200
    set monitor "wan1" "wan2"
end
What the lines do:
- hbdev: heartbeat interface plus its priority. Dual heartbeat is standard; the link with the highest priority value carries the heartbeat, and if it fails the next one takes over without dropping the cluster.
- session-pickup: synchronise the session table so flows survive failover. Off by default for performance reasons; almost always turned on in production.
- session-pickup-connectionless: sync connectionless flows (UDP, ICMP) too, not just TCP.
- ha-mgmt-status + ha-mgmt-interfaces: give each unit its own out-of-band management IP so you can ssh to the secondary directly.
- override: if disabled, the current primary stays primary even after a higher-priority unit recovers (no flap). If enabled, highest priority always wins, which can cause unwanted re-election cycles.
- priority: higher wins primary election (default 128, range 0–255). Tied priorities go to the unit with the highest serial number.
- monitor: link-failure detection on data interfaces. If a monitored interface goes down on the primary, the cluster re-elects, and the unit with fewer failed monitored interfaces wins regardless of configured priority.
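To make the hbdev priority rule concrete, here is a minimal Python sketch of how the active heartbeat link could be chosen. It is a mental model only, not FortiOS code: the function name `active_heartbeat` and the `link_up` dictionary are hypothetical, and the interface names and priorities are taken from the example configuration above.

```python
from typing import Dict, Optional


def active_heartbeat(hbdev: Dict[str, int], link_up: Dict[str, bool]) -> Optional[str]:
    """Return the usable heartbeat interface with the highest priority value.

    hbdev maps interface name -> heartbeat priority (as in `set hbdev`).
    link_up maps interface name -> current link state (hypothetical input).
    """
    usable = [name for name in hbdev if link_up.get(name, False)]
    if not usable:
        return None  # no heartbeat path left: split-brain risk
    return max(usable, key=lambda name: hbdev[name])


hbdev = {"ha1": 50, "ha2": 100}
print(active_heartbeat(hbdev, {"ha1": True, "ha2": True}))   # ha2
print(active_heartbeat(hbdev, {"ha1": True, "ha2": False}))  # ha1
```

With the values from the example config, ha2 carries the heartbeat while it is up, and ha1 takes over when ha2 fails.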
Election
A primary is selected on cluster formation and after any failover trigger. The selection criteria, in order:
- Most monitored interfaces up: the unit with fewer link failures wins.
- HA uptime: the unit that’s been up longest wins (subject to set ha-uptime-diff-margin; by default, uptime differences under 300 seconds are ignored).
- Priority: higher wins.
- Serial number: higher wins (tiebreaker only).
If override is enabled, priority overtakes uptime in the order. The exam will ask which unit becomes primary in a given scenario; the trick is to apply this list in order, not skip steps.
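The ordering above can be sketched as a small Python function for two units. This is a study aid under stated assumptions, not Fortinet's implementation: the `Unit` class and `elect_primary` are hypothetical names, the 300-second margin mirrors the default of `ha-uptime-diff-margin`, and the final tiebreak follows Fortinet's documented behaviour, where the higher serial number wins. Serial numbers are compared as strings, which assumes both have the same length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    serial: str
    failed_monitored: int  # monitored interfaces currently down
    ha_uptime: int         # seconds
    priority: int          # 0-255


def elect_primary(a: Unit, b: Unit, override: bool = False,
                  uptime_margin: int = 300) -> Unit:
    # Step 1: fewer failed monitored interfaces always wins.
    if a.failed_monitored != b.failed_monitored:
        return a if a.failed_monitored < b.failed_monitored else b

    def by_uptime():
        # Uptime only counts if the difference reaches the margin.
        if abs(a.ha_uptime - b.ha_uptime) >= uptime_margin:
            return a if a.ha_uptime > b.ha_uptime else b
        return None

    def by_priority():
        if a.priority != b.priority:
            return a if a.priority > b.priority else b
        return None

    # Override swaps the order of the uptime and priority checks.
    steps = (by_priority, by_uptime) if override else (by_uptime, by_priority)
    for step in steps:
        winner = step()
        if winner is not None:
            return winner

    # Final tiebreak: higher serial number wins.
    return a if a.serial > b.serial else b


old = Unit("FG100F0000000001", 0, 5000, 100)
new = Unit("FG100F0000000002", 0, 100, 200)
print(elect_primary(old, new).serial)                 # FG100F0000000001
print(elect_primary(old, new, override=True).serial)  # FG100F0000000002
```

The two calls show the effect of override: without it, the longer-running unit keeps the primary role despite its lower priority; with it, priority is checked first and the higher-priority unit wins.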
Synchronised state
The primary continuously sends to the secondary:
- Configuration: every CLI/GUI change.
- Routing tables: RIB, FIB.
- Session table: if session-pickup is enabled.
- IPsec SAs: Phase 1 and Phase 2 keys.
- SSL VPN sessions: only when session-pickup and SSL VPN settings allow it.
- DHCP leases, anti-virus quarantine, user authentication state, etc.
Items not synchronised by default include some application-level caches and runtime stats. Don’t be surprised if FortiView counts reset on the new primary after failover — that’s expected.
Virtual MAC and failover
The cluster uses virtual MAC addresses on each data interface: the same MAC across primary and secondary. When the secondary takes over, it claims those virtual MACs and sends gratuitous ARP packets so that upstream switches relearn which port the MAC now sits behind. Neighbouring hosts and routers keep their ARP entries unchanged because the IP-to-MAC mapping never changes; only the switch forwarding tables need to update.
Switches between FortiGates and the rest of the network must allow MAC moves quickly — port-security configs that pin a MAC to a port can break failover.
Diagnostics
The first three commands solve most cases:
get system ha status
diagnose sys ha status
diagnose sys ha checksum cluster
get system ha status gives the human-friendly summary — primary/secondary, uptime, monitored interface state. diagnose sys ha status is the verbose form with the FGCP internals. checksum cluster shows whether the running configs match across cluster members; mismatch indicates failed sync.
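When reading checksum output, the check you are doing by eye is a per-scope comparison between members. A minimal Python sketch of that comparison follows. It assumes the checksums have already been copied into a dictionary by hand; the function name `find_out_of_sync`, the member labels, and the checksum values are all hypothetical, and no attempt is made to parse real CLI output.

```python
from typing import Dict, List


def find_out_of_sync(checksums: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Compare every member against the first one (treated as the reference).

    checksums maps member label -> {scope: checksum}.
    Returns member label -> sorted list of scopes that differ.
    """
    members = list(checksums)
    if not members:
        return {}
    reference = checksums[members[0]]
    result: Dict[str, List[str]] = {}
    for member in members[1:]:
        other = checksums[member]
        differing = sorted(
            scope for scope in reference.keys() | other.keys()
            if reference.get(scope) != other.get(scope)
        )
        if differing:
            result[member] = differing
    return result


# Hypothetical, shortened checksum values for illustration only.
cluster = {
    "primary":   {"global": "aa11", "root": "bb22", "all": "cc33"},
    "secondary": {"global": "aa11", "root": "ff99", "all": "dd44"},
}
print(find_out_of_sync(cluster))  # {'secondary': ['all', 'root']}
```

An empty result means every scope matches, which is what a healthy cluster should show; any listed scope tells you where to start looking for the unsynchronised change.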
diagnose sys ha cluster-csum
execute ha manage <id>
execute ha manage jumps your CLI session from one cluster member to another over the heartbeat — extremely useful when troubleshooting a misbehaving secondary.
For a forced failover during planned maintenance:
diagnose sys ha reset-uptime
This zeroes the current primary’s HA uptime, causing re-election. The remaining unit (with longer uptime) wins, provided override is disabled so that uptime is evaluated before priority. Cleaner than physically rebooting.
Common exam scenarios
- “Two units, both with priority 128, identical uptime: which is primary?” Higher serial number wins.
- “Primary loses one monitored interface; secondary loses none — failover?” Yes, monitored-up count outranks uptime and priority.
- “Override is enabled, lower-priority unit is primary, higher-priority unit comes online — what happens?” Re-election; higher-priority unit takes over.
- “Sessions drop on failover even though session-pickup is enabled.” Either session-pickup-connectionless is off (UDP sessions lost) or the session is too new: with session-pickup-delay enabled, only sessions that have been open for more than 30 seconds are synchronised.
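The last scenario can be reasoned through with a simplified Python model of which sessions survive a failover. It is a sketch under assumptions: `HaSessionSettings` and `survives_failover` are hypothetical names, the 30-second threshold reflects the behaviour of the `session-pickup-delay` setting, and other exclusions (for example, sessions handled by proxy-based inspection) are deliberately not modelled.

```python
from dataclasses import dataclass

PICKUP_DELAY_SECONDS = 30  # threshold applied when session-pickup-delay is on


@dataclass
class HaSessionSettings:
    session_pickup: bool = False
    session_pickup_connectionless: bool = False
    session_pickup_delay: bool = False


def survives_failover(protocol: str, age_seconds: float,
                      settings: HaSessionSettings) -> bool:
    """Rough model: is this session expected on the new primary?"""
    if not settings.session_pickup:
        return False  # nothing is synchronised at all
    if protocol.lower() != "tcp" and not settings.session_pickup_connectionless:
        return False  # UDP/ICMP need the connectionless option
    if settings.session_pickup_delay and age_seconds < PICKUP_DELAY_SECONDS:
        return False  # session too new to have been synchronised
    return True


settings = HaSessionSettings(session_pickup=True, session_pickup_delay=True)
print(survives_failover("tcp", 120, settings))  # True
print(survives_failover("tcp", 5, settings))    # False
print(survives_failover("udp", 120, settings))  # False
```

Walking an exam question through these three checks in order usually identifies which setting explains the dropped session.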
Series wrap-up
That closes out the official NSE4 syllabus. Across this series you’ve now seen the protocols, the GUI, the CLI and the diagnostics for every lesson on the exam. The fastest way to fix what you’ve read into memory is the loop I’ve recommended throughout: open the lab, click the GUI step, run the CLI equivalent, capture the diagnostic output, and walk it through diagnose debug flow until you’re predicting what each line will say before it appears.
Good luck on the exam.