NSE5 Part 3: High Availability

Part 3 of the NSE5 series. FortiManager HA is similar in spirit to FortiGate FGCP but very different in mechanics — there is no virtual MAC, no shared IP, and no transparent failover. This post walks through how the cluster actually behaves and the diagnostics you’ll be asked to read on the exam.

What FortiManager HA is and isn’t

A FortiManager HA cluster is up to five units that synchronise their database with one primary that all writes flow through. Secondary units are read-only mirrors. There is no load balancing — you don’t aim a managed FortiGate at “the cluster”, you aim it at the primary’s IP. Failover is manual unless you’ve explicitly configured it otherwise.

That’s the most common misconception on the exam: people assume FortiManager HA behaves like FortiGate HA. It doesn’t.

FortiGate HA (FGCP) | FortiManager HA
Active–passive or active–active | Primary–secondary only
Shared virtual MAC | No virtual MAC; each unit has its own IP
Automatic failover with sub-second convergence | Manual failover by default
Up to 4 cluster members | Up to 5 cluster members
Heartbeat over dedicated link | Sync over any IP-reachable interface

Cluster requirements

  • Same hardware model (or VM size).
  • Same FortiManager firmware to the build number.
  • Reachable peer IP between members on the configured port (TCP/5199 by default).
  • Same time — NTP must be synchronised. Out-of-sync clocks cause silent log replay failures.

There is no equivalent of FGCP’s “session pickup” — the device-manager state, ADOM database, and revision history are what’s synchronised, not network sessions.
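
The requirements above lend themselves to a pre-flight check before you try to form the cluster. The following is a minimal Python sketch, not a Fortinet tool: the `Member` record, the `preflight` function, the two-unit minimum, and the 5-second skew tolerance are all assumptions made for illustration, and you would have to collect the values from each unit yourself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Member:
    """Hypothetical inventory record; not a FortiManager API object."""
    name: str
    model: str           # hardware model or VM size
    firmware_build: str  # must match down to the build number
    clock: datetime      # the unit's current time as you observed it


def preflight(members, max_skew=timedelta(seconds=5)):
    """Return a list of problems; an empty list means every check passed.

    max_skew is an assumed tolerance, not a documented FortiManager limit.
    """
    problems = []
    if not 2 <= len(members) <= 5:
        problems.append(f"cluster size {len(members)} is outside 2..5")
    if not members:
        return problems
    reference = members[0]
    for m in members[1:]:
        if m.model != reference.model:
            problems.append(f"{m.name}: model {m.model} != {reference.model}")
        if m.firmware_build != reference.firmware_build:
            problems.append(
                f"{m.name}: build {m.firmware_build} != {reference.firmware_build}"
            )
        if abs(m.clock - reference.clock) > max_skew:
            problems.append(f"{m.name}: clock differs from {reference.name} by more than {max_skew}")
    return problems
```

Every member is compared against the first one in the list, which is enough to catch a mismatch but will not tell you which unit is the odd one out in a larger cluster.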

Configuration

config system ha
    set mode primary
    set group-id 1
    set group-name "fmg-ha"
    set password ********
    set hb-interface "port1"
    set hb-interval 5
    set file-quota 4096
    config peer
        edit 1
            set ip 10.10.20.11
            set serial-number FMG-VM0000000001
        next
    end
end

On the secondary:

config system ha
    set mode secondary
    set group-id 1
    set group-name "fmg-ha"
    set password ********
    set hb-interface "port1"
    config peer
        edit 1
            set ip 10.10.20.10
            set serial-number FMG-VM0000000000
        next
    end
end

What each line does:

  • mode is one of primary, secondary, or standalone. There is no automatic election.
  • group-id / group-name / password — must match across the cluster. Mismatched group-id is the most common reason a freshly built cluster won’t join.
  • hb-interval — heartbeat frequency in seconds. Default is 5; the exam expects you to know that.
  • file-quota — disk space (MB) reserved for sync data on the primary. If the secondary falls far behind and uses up the quota, sync stops. Increase to 8192 on a busy ADOM.
  • config peer — explicit peer list. Must include the serial number of the other unit, not just an IP. This is unusual and is a frequent exam gotcha.
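
Because every unit must list the other units by IP and serial number, it can help to generate the peer blocks from one inventory rather than typing them by hand. This is a rough Python sketch under that assumption; the `Unit` record and both functions are hypothetical names, the output uses only the settings shown in this post, and the password is deliberately left out so it is never written to a file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical inventory record for one cluster member."""
    ip: str
    serial: str
    mode: str  # "primary" or "secondary"


def has_single_primary(cluster):
    """A cluster should have exactly one unit in primary mode."""
    return sum(1 for u in cluster if u.mode == "primary") == 1


def render_ha_config(unit, cluster, group_id=1, group_name="fmg-ha"):
    """Build the 'config system ha' text for one unit.

    Every other unit in the cluster is listed as a peer with its IP and
    serial number. The password line is omitted on purpose.
    """
    lines = [
        "config system ha",
        f"    set mode {unit.mode}",
        f"    set group-id {group_id}",
        f'    set group-name "{group_name}"',
        "    config peer",
    ]
    peers = [u for u in cluster if u.serial != unit.serial]
    for index, peer in enumerate(peers, start=1):
        lines += [
            f"        edit {index}",
            f"            set ip {peer.ip}",
            f"            set serial-number {peer.serial}",
            "        next",
        ]
    lines += ["    end", "end"]
    return "\n".join(lines)
```

Rendering from a single list keeps group-id and group-name identical on every unit, which removes the most common join failure mentioned above.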

What syncs

The primary continuously sends the secondary:

  • Device database (registered FortiGates, model devices, serial numbers).
  • Policy packages and objects per ADOM.
  • Provisioning templates and CLI templates.
  • Scripts.
  • ADOM revisions and revision history.
  • Admin users, profiles, and SSO configuration.
  • Most config system settings except the local-only ones below.

What does not sync:

  • HA configuration itself (each unit has its own).
  • Hostname.
  • Local interface IPs (each unit needs its own).
  • Logs and reports (live FortiAnalyzer-style data, where applicable).
  • Backups stored on the device.

The exam will ask “if I add a managed device on the secondary, does it appear on the primary?” — answer: no, because writes only succeed on the primary. The secondary’s GUI is read-only by design.

Monitor IPs

A monitor IP lets the cluster detect a network partition. Configure one on each unit:

config system ha
    config monitored-ips
        edit 1
            # 10.10.20.1 is the upstream gateway
            set ip 10.10.20.1
            set interface "port1"
        next
    end
end

When a unit can’t ping its monitor IP, it considers itself partitioned. By default the secondary will not auto-promote — it stays read-only — but the primary will log that it has lost peering. Auto-promotion requires explicitly enabling failover-on-IP-loss.

Manual failover

The most common case. On the current primary:

execute ha-manage demote

On the unit you want to become primary:

execute ha-manage promote

There is no “graceful” semantic: promote/demote is immediate. If both units end up thinking they’re primary (split-brain), whichever unit later rejoins as the secondary has its database overwritten, so be careful about the order in which you run the commands.

For a planned failover, the safe sequence is:

  1. Confirm sync status is healthy (see diagnostics below).
  2. Demote the current primary.
  3. Promote the chosen secondary.
  4. Repoint managed FortiGates’ FGFM target if the IP has changed (it usually has).
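
The sequence above can be written down as a small planning function that refuses to proceed when sync is unhealthy. This is only a sketch in Python: `planned_failover_steps` is a hypothetical helper, the command strings are copied from this post as written, and deciding whether sync is healthy is left to you and the diagnostics.

```python
def planned_failover_steps(sync_healthy, old_primary_ip, new_primary_ip):
    """Return the ordered (where, command) pairs for a planned failover.

    Raises RuntimeError if sync is not healthy, because promoting a
    secondary that is behind would lose recent changes.
    """
    if not sync_healthy:
        raise RuntimeError("sync is not healthy; resolve that before failing over")
    steps = [
        ("current primary", "execute ha-manage demote"),
        ("chosen secondary", "execute ha-manage promote"),
    ]
    # Managed devices only need repointing when the primary's IP changes.
    if new_primary_ip != old_primary_ip:
        steps.append(("each managed FortiGate", f'set fmg "{new_primary_ip}"'))
    return steps


for where, command in planned_failover_steps(True, "10.10.20.10", "10.10.20.11"):
    print(f"{where}: {command}")
```

Keeping demote ahead of promote in a fixed list is the point of the sketch: it makes it hard to run the two in the wrong order.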

Repointing managed devices

Because there is no shared IP, every managed FortiGate has the primary’s IP in its central-management config. After a failover that changes the primary’s IP, every device must be told the new IP. From each FortiGate:

config system central-management
    set type fortimanager
    set fmg "10.10.20.11"
end
execute central-mgmt register-device <FMG-serial> ********

Or, if you’ve planned ahead, use a DNS name and let the FortiGates resolve it. The DNS approach is the production-friendly answer the exam looks for.
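
To see why the DNS approach scales better, consider which devices need touching after a failover. The Python sketch below assumes you keep a mapping of FortiGate name to its configured fmg value (the mapping, the device names, and `fmg.example.net` are made up for illustration). Devices that point at a hostname are skipped, on the assumption that the DNS record itself is updated separately.

```python
import ipaddress


def is_ip_literal(target):
    """True if the fmg value is an IP address rather than a DNS name."""
    try:
        ipaddress.ip_address(target)
    except ValueError:
        return False
    return True


def devices_to_repoint(fmg_targets, new_primary_ip):
    """Return the names of FortiGates whose fmg value must be edited.

    fmg_targets maps a device name to the value of its 'set fmg' setting.
    Only devices pinned to an IP other than the new primary are returned.
    """
    return sorted(
        name
        for name, target in fmg_targets.items()
        if is_ip_literal(target) and target != new_primary_ip
    )


targets = {
    "fgt-branch-1": "10.10.20.10",      # pinned to the old primary
    "fgt-branch-2": "fmg.example.net",  # follows DNS
}
print(devices_to_repoint(targets, "10.10.20.11"))
```

With every device on a DNS name the returned list is empty, and the failover becomes a single record change instead of a per-device edit.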

Diagnostics

get system ha status
diagnose ha stats
diagnose ha sync-stat

get system ha status shows the cluster summary — who’s primary, last sync time, peer reachability. diagnose ha sync-stat shows the per-table sync state and is the command to run when “the cluster says it’s healthy but the secondary is missing my recent change”.

For the heartbeat:

diagnose ha hb-info
diagnose debug application haperiod -1
diagnose debug enable

hb-info shows which interface is being used and the last heartbeat seen. The debug switch (-1 = all flags) is verbose; remember to disable when done:

diagnose debug disable

Split-brain recovery

Split-brain on FortiManager is rare but not impossible — usually after a network partition where someone manually promoted the secondary. To recover:

  1. Decide which unit has the canonical database (usually the one that was primary before the split).
  2. On the other unit (the one whose data will be lost): demote it, then re-add it as a fresh secondary. Its database is wiped on rejoin and resynced from the surviving primary.

# Run these only on the unit being rebuilt.
execute ha-manage demote
execute factoryreset
config system ha
    set mode secondary
    ...
end

execute factoryreset is reserved for the rebuild case — it nukes everything. Don’t run it on a healthy unit.

Common exam scenarios

  • “Two-unit cluster, primary fails, secondary still read-only.” Expected — failover is manual unless explicitly configured otherwise.
  • “Cluster shows healthy, but a recent policy package change isn’t on the secondary.” Sync lag — diagnose ha sync-stat will show the table that’s behind.
  • “Secondary cannot rejoin after firmware upgrade.” Firmware mismatch — both units must be on the same build before HA forms.
  • “Peer added but cluster still split.” Wrong serial number in config peer, or a mismatched password in config system ha.

Part 4 takes us into ADOMs — administrative domains, the multi-tenancy primitive that drives almost every other configuration choice on the device.