The Ultimate FortiOS CLI Reference for the NSE 4 Exam – Part 3: VPN & HA
VPN and HA are two of the highest-weighted topics on the NSE 4 exam, and both share the same diagnostic challenge: the system state you care about is maintained in daemon memory, not visible in the GUI, and the CLI output requires you to cross-reference multiple commands to reach a conclusion. This module gives you the complete toolkit and teaches you how the fields relate to each other.
Module 5: VPN Infrastructure Troubleshooting
diagnose vpn ike gateway list
Command Syntax & Architectural Impact
diagnose vpn ike gateway list
diagnose vpn ike gateway list name <phase1-name>
This command queries the IKE daemon (iked) for its in-memory record of every Phase 1 negotiation state. IKE (Internet Key Exchange) is the control-plane protocol for IPsec: Phase 1 establishes a secure authenticated channel (the IKE SA) between the two peers; Phase 2 runs inside that channel to negotiate the actual encryption parameters for data-plane traffic (the IPsec SA). diagnose vpn ike gateway list shows Phase 1 exclusively.
Each entry in the output corresponds to a configured Phase 1 object (config vpn ipsec phase1-interface). The entry persists in iked’s memory regardless of whether Phase 1 is currently up — this is how you can distinguish between “Phase 1 has never succeeded”, “Phase 1 was up and dropped”, and “Phase 1 is currently established.”
The cookies (initiator and responder) are the IKE SA identifiers: a pair of random 64-bit values exchanged during IKEv1 Main Mode or the IKEv2 initial exchange (IKEv2 calls them IKE SPIs). They play a role similar to the port pair of a TCP connection: they uniquely identify a specific IKE SA instance so that retransmits and rekeying messages can be correlated correctly.
Real-World Use Case Scenario
A site-to-site IPsec VPN to a remote branch has been reported as “down” by the branch manager. You need to determine whether Phase 1 ever negotiated successfully, whether it is currently in negotiation (mid-handshake), or whether iked has not even attempted it (suggesting a trigger or configuration problem). If Phase 1 is up but traffic is not flowing, the problem is in Phase 2 or routing. If Phase 1 is not up, the usual causes are authentication (PSK mismatch, certificate issue), a Phase 1 proposal mismatch, or reachability (UDP 500/4500 blocked on path).
Live Output Breakdown
FortiGate-100F # diagnose vpn ike gateway list
name: BRANCH-LONDON
version: 2
interface: port1 7
addr: 203.0.113.1:500 -> 198.51.100.5:500
tun_id: 203.0.113.1/::203.0.113.1
network-id: 0
created: 4d21h ago
peer_notif: 0
dpd-expire: 0
auto-up: 1
natt: type= NAT-T dst= remote-port=4500 src-port=4500
IKE SA: created 1/1 established 1/1 time 0/130/240 ms
id=2 lifetime=86400 rekey=82000 reauth=0
ESP proposal: AES_CBC-128/SHA1/MODP_2048
initiator: ce8e7d3a1f4b2091:b829e57d3c14a9f0
cur: initiator ce8e7d3a1f4b2091:b829e57d3c14a9f0
life: type=seconds bytes=0 active=15482 negotiating=0
responder: 198.51.100.5 203.0.113.1
created: 4d21h ago expires: 18h
DPD seq no: 53, 53
DPD state: sendack
Key Exam Indicators
| Field | What to look for |
|---|---|
| IKE SA: created 1/1 established 1/1 | Format is created N/M established P/Q. created is total SA attempts; established is successful completions. 1/1 established = Phase 1 is currently up. 1/0 established = Phase 1 was attempted but failed (authentication or proposal mismatch). 0/0 = no attempt has been made. |
| addr: 203.0.113.1:500 -> 198.51.100.5:500 | If port is 4500 instead of 500, NAT-T is active (one or both peers are behind a NAT device). IKEv2 uses UDP 4500 for all traffic after the initial IKE_SA_INIT when NAT is detected. |
| life: active=15482 negotiating=0 | active is seconds this SA has been established. negotiating is seconds currently in a renegotiation. Non-zero negotiating means rekey is in progress. If negotiating is stuck at a high value, the rekey is failing silently. |
| DPD seq no: 53, 53 | DPD (Dead Peer Detection) sent/received counts. Both numbers should be equal if the peer is responding. If sent is much higher than received, the remote peer is not responding to DPD probes and the tunnel may be a “zombie” (Phase 1 state held locally but the remote has lost the SA). |
| version: 2 | IKEv2. version: 1 = IKEv1. The negotiation process, message exchange count, and re-auth behaviour differ. IKEv2 uses fewer round trips (4 messages for initial exchange vs. 6 for IKEv1 main mode) and has built-in NAT-T and MOBIKE support. |
| Initiator cookie ce8e7d3a1f4b2091 | This cookie pair identifies the specific IKE SA instance. If Phase 1 renegotiates, the cookies change. Correlate this with Wireshark/sniffer captures of IKE UDP traffic if you need to match on-wire packets to the FortiGate’s internal state. |
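The indicators in this table can be checked mechanically once the output has been captured to text. The following is a minimal Python sketch, assuming the field layout shown in the sample output above; the function names are hypothetical, and the state labels simply encode the table's reading of the established P/Q pair rather than any official FortiOS definition.

```python
import re
from typing import Optional

# Three lines copied from the sample gateway entry above.
SAMPLE = """\
addr: 203.0.113.1:500 -> 198.51.100.5:500
IKE SA: created 1/1 established 1/1 time 0/130/240 ms
DPD seq no: 53, 53
"""

def phase1_state(entry: str) -> Optional[str]:
    """Classify Phase 1 from 'established P/Q' as the table reads it."""
    m = re.search(r"IKE SA:.*?established\s+(\d+)/(\d+)", entry)
    if m is None:
        return None
    p, q = int(m.group(1)), int(m.group(2))
    if q > 0:
        return "up"            # e.g. established 1/1
    return "failed" if p > 0 else "no-attempt"  # 1/0 vs 0/0

def natt_active(entry: str) -> Optional[bool]:
    """True when either side of the addr line uses UDP 4500."""
    m = re.search(r"addr:\s*\S+:(\d+)\s*->\s*\S+:(\d+)", entry)
    if m is None:
        return None
    return "4500" in m.groups()

def dpd_gap(entry: str) -> Optional[int]:
    """Difference between the two DPD sequence numbers (0 = healthy)."""
    m = re.search(r"DPD seq no:\s*(\d+),\s*(\d+)", entry)
    if m is None:
        return None
    return int(m.group(1)) - int(m.group(2))

print(phase1_state(SAMPLE), natt_active(SAMPLE), dpd_gap(SAMPLE))
```

A growing value from dpd_gap across successive captures is the signal to look for; a single non-zero sample may just be a probe in flight.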
diagnose vpn tunnel list
Command Syntax & Architectural Impact
diagnose vpn tunnel list
diagnose vpn tunnel list name <phase1-name>
This command queries iked for Phase 2 IPsec Security Association (SA) state. Where Phase 1 is the control channel, Phase 2 SAs are the actual data-plane encryption contexts — one SA for each direction of traffic (an inbound SA and an outbound SA), each identified by a unique SPI (Security Parameter Index).
The SPI is a 32-bit value carried in every ESP or AH packet header. When the receiving FortiGate sees an inbound ESP packet, it looks up the destination IP + SPI combination to find the correct SA and derive the decryption key. SPI mismatches — where the local unit expects one SPI but the remote is sending another — cause all inbound traffic to fail decryption silently: the packets arrive, the FortiGate cannot find a matching SA, and they are dropped without any policy-level log entry.
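The lookup described above can be pictured as a dictionary keyed by destination IP and SPI. This is only an illustrative Python sketch of the idea, not FortiOS internals: the table contents, the label strings, and the choice of 0x3d9f1c04 as the local inbound SPI are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical inbound SA table: (destination IP, SPI) -> SA label.
SaTable = Dict[Tuple[str, int], str]

def lookup_inbound_sa(table: SaTable, dst_ip: str, spi: int) -> Optional[str]:
    """Return the matching SA, or None when the packet cannot be decrypted."""
    return table.get((dst_ip, spi))

inbound: SaTable = {
    ("203.0.113.1", 0x3D9F1C04): "BRANCH-LONDON inbound SA",
}

# The peer uses the SPI this unit expects: the SA is found.
print(lookup_inbound_sa(inbound, "203.0.113.1", 0x3D9F1C04))

# The peer uses a stale or mismatched SPI: no SA, so the packet is dropped
# before any firewall policy is evaluated.
print(lookup_inbound_sa(inbound, "203.0.113.1", 0x8F2A3B01))
```

Because the miss happens before policy evaluation, the only visible symptom on the receiving unit is a decrypt counter that stays flat.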
The packet counters (enc pkts, dec pkts, enc bytes, dec bytes) are per-SA, per-direction, and reset on each Phase 2 renegotiation. Cross-referencing enc/dec counter asymmetry between the two peers is the primary method for diagnosing one-way VPN traffic.
Real-World Use Case Scenario
Phase 1 is established (confirmed by diagnose vpn ike gateway list). Users at the branch can ping the HQ FortiGate but cannot reach any servers in the HQ LAN. You suspect Phase 2 is either not established or is up in one direction only. You run diagnose vpn tunnel list on the HQ unit and the branch unit simultaneously (via separate console sessions) and compare the enc/dec packet counters for the relevant Phase 2 SA.
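The counter comparison in this scenario reduces to a small truth table. Below is a minimal Python sketch, assuming you have already read enc pkts and dec pkts for the same Phase 2 SA on both units; the class, function name, result strings, and the branch-side numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SaCounters:
    enc_pkts: int  # packets encrypted and sent by this unit
    dec_pkts: int  # packets received and decrypted by this unit

def diagnose_direction(hq: SaCounters, branch: SaCounters) -> str:
    """Compare the two peers' counters to find which direction works."""
    hq_to_branch = hq.enc_pkts > 0 and branch.dec_pkts > 0
    branch_to_hq = branch.enc_pkts > 0 and hq.dec_pkts > 0
    if hq_to_branch and branch_to_hq:
        return "both directions passing"
    if hq_to_branch:
        return "HQ -> branch only"
    if branch_to_hq:
        return "branch -> HQ only"
    return "nothing decrypted in either direction"

# HQ values from the sample output; branch values are made up.
print(diagnose_direction(SaCounters(4218, 0), SaCounters(0, 4218)))
```

In practice, sample the counters twice a few seconds apart and compare the deltas, since the counters reset on every Phase 2 renegotiation.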
Live Output Breakdown
FortiGate-100F # diagnose vpn tunnel list
name=BRANCH-LONDON ver=2 serial=1 203.0.113.1:0->198.51.100.5:0 tun_id=198.51.100.5 tun_id6=::198.51.100.5
bound_if=7 lgwy=203.0.113.1:0 tun_if=ssl.root rgwy=198.51.100.5:0
proxyid=BRANCH-LONDON proto=0 sa=1 ref=4 serial=1
src: 0:10.10.0.0-10.10.0.255:0
dst: 0:172.16.0.0-172.16.0.255:0
SA: ref=6 options=18200 type=00 soft=0 mtu=1438 expire=28640
softexpire: 28340 dst-addr=198.51.100.5 src-addr=203.0.113.1
life: type=seconds-kilobytes bytes=0 active=15542 negotiating=0
SPI: 00000000 0x8f2a3b01 (2401909505)
SPI: 00000000 0x3d9f1c04 (1033837572)
enc pkts=4218 enc bytes=3841920
dec pkts=0 dec bytes=0
rekey: lifetime=3600 negotiating=0
npu_flag=12 npu_rgwy=198.51.100.5 npu_lgwy=203.0.113.1 npu_selid=0
run_tally: 0
Key Exam Indicators
| Field | What to look for |
|---|---|
| sa=1 | Number of Phase 2 SAs currently active for this proxy-id. sa=0 means Phase 2 has not negotiated; investigate a Phase 2 mismatch (cipher, hash, PFS group, or proxy-id). |
| enc pkts=4218 dec pkts=0 | This is the smoking gun for one-way VPN traffic. Packets are being encrypted and sent (enc pkts incrementing) but nothing is being decrypted (dec pkts=0). The remote end is either not sending traffic, sending to the wrong SPI, or the inbound SA has a different SPI than the remote’s outbound SA, which is the classic SPI mismatch scenario. |
| SPI: 0x8f2a3b01 and SPI: 0x3d9f1c04 | Two SPI values appear: one for the outbound SA (used when encrypting), one for the inbound SA (used when decrypting). The inbound SPI of the local unit must match the outbound SPI of the remote unit, and vice versa. If they do not match after a failed renegotiation, the SAs are out of sync. |
| mtu=1438 | The Phase 2 SA’s effective MTU after IPsec overhead is subtracted. This is the value FortiOS uses when deciding whether to fragment or send ICMP Fragmentation Needed. If the value is too high for the real path (e.g. 1500 instead of 1438), large packets can be silently dropped after encryption. |
| src: 0:10.10.0.0-10.10.0.255:0 and dst: 0:172.16.0.0-172.16.0.255:0 | The proxy-id (or traffic selector in IKEv2). Only traffic matching this source/destination range is routed through this tunnel. If a client’s IP falls outside this range, its packets will be routed via the normal routing table instead of the tunnel, and potentially forwarded in clear text. |
| npu_flag=12 | Non-zero npu_flag means this SA has been offloaded to the NP hardware for encrypt/decrypt. npu_flag=0 means software encryption, which is expected on VM appliances and on platforms whose kernel has disabled NP offload for this SA due to an incompatible cipher. |
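To check whether a given flow falls inside the proxy-id shown in the src: and dst: fields, the ranges can be parsed and compared directly. This is a minimal Python sketch using the standard ipaddress module. It assumes the field layout from the sample output, treats the leading and trailing 0 as protocol/port placeholders to be ignored (an assumption), and handles IPv4 only; the function names are hypothetical.

```python
import ipaddress
from typing import Tuple

def parse_selector(field: str) -> Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]:
    """Turn '0:10.10.0.0-10.10.0.255:0' into (first, last) addresses."""
    _prefix, addr_range, _suffix = field.split(":")
    first, last = addr_range.split("-")
    return ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)

def in_selector(ip: str, field: str) -> bool:
    """True when ip lies inside the selector's address range."""
    first, last = parse_selector(field)
    return first <= ipaddress.IPv4Address(ip) <= last

def flow_matches(src_ip: str, dst_ip: str, src_field: str, dst_field: str) -> bool:
    """A flow matches the proxy-id only if both ends are in range."""
    return in_selector(src_ip, src_field) and in_selector(dst_ip, dst_field)

SRC = "0:10.10.0.0-10.10.0.255:0"
DST = "0:172.16.0.0-172.16.0.255:0"

print(flow_matches("10.10.0.42", "172.16.0.10", SRC, DST))  # inside both ranges
print(flow_matches("10.10.1.5", "172.16.0.10", SRC, DST))   # source out of range
```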
diagnose debug application sslvpn -1
Command Syntax & Architectural Impact
diagnose debug application sslvpn -1
diagnose debug enable
This command sets the debug verbosity for the sslvpnd daemon to its maximum level (-1). sslvpnd is a standalone process that handles the entire SSL-VPN lifecycle: TLS handshake, certificate validation, user authentication (against LDAP/RADIUS/local), group lookup, portal assignment, IP pool allocation (for tunnel mode), and web bookmarks (for web mode). It operates independently of the main firewall policy engine — SSL-VPN users are processed through sslvpnd before their traffic enters the normal policy pipeline.
The debug output is streamed directly to the console in real time. Every authentication attempt produces a structured log chain: TLS negotiation, authentication protocol selection (RADIUS/LDAP/local), bind or lookup result, group membership evaluation, portal assignment, and final accept/reject decision. Each step is annotated with a result code.
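The -1 value enables every debug message the daemon can emit: FortiOS treats the debug level as a bitmask, and -1 sets all bits. A smaller positive level enables only a subset of message categories and produces less noise, which is preferable on busy production units; the meaning of individual bits is daemon-specific, so check the documentation for your FortiOS version. For initial troubleshooting, -1 is the correct starting point.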
Real-World Use Case Scenario
Remote users are being rejected by the SSL-VPN portal with a generic “authentication failed” message. The RADIUS server administrator insists their server is healthy and accepting requests from other services. You need to determine: (1) is sslvpnd successfully contacting the RADIUS server, (2) is the RADIUS server returning Accept or Reject, (3) if Accept, is group membership evaluation then failing (e.g. the user is not in the FortiGate’s configured SSL-VPN user group), and (4) which portal is being assigned (or failing to be assigned) after successful authentication?
Live Output Breakdown
FortiGate-100F # diagnose debug application sslvpn -1
FortiGate-100F # diagnose debug enable
[sslvpnd 1234 - SSL] Incoming connection from 203.0.113.200:54211
[sslvpnd 1234 - SSL] SSL negotiation done, peer cert: NONE
[sslvpnd 1234 - AUTH] user 'jsmith' attempting authentication via RADIUS
[sslvpnd 1234 - RADIUS] sending Access-Request to 10.10.20.10:1812 id=42
[sslvpnd 1234 - RADIUS] received Access-Accept from 10.10.20.10:1812 id=42
[sslvpnd 1234 - AUTH] user 'jsmith' authenticated successfully
[sslvpnd 1234 - GRPCHK] checking group membership for user 'jsmith'
[sslvpnd 1234 - GRPCHK] user 'jsmith' NOT found in group 'SSL-VPN-USERS'
[sslvpnd 1234 - POLICY] no portal assignment matched for user 'jsmith'
[sslvpnd 1234 - POLICY] sending deny: no portal policy matched
-- Authentication and group check pass scenario --
[sslvpnd 5678 - SSL] Incoming connection from 198.51.100.9:61200
[sslvpnd 5678 - AUTH] user 'mgarner' authenticated successfully
[sslvpnd 5678 - GRPCHK] user 'mgarner' found in group 'SSL-VPN-USERS'
[sslvpnd 5678 - POLICY] portal 'full-access' assigned to user 'mgarner'
[sslvpnd 5678 - TUNNEL] allocating tunnel IP from pool 'sslvpn-pool': 10.200.0.5
[sslvpnd 5678 - TUNNEL] tunnel established, pushing routes: 10.10.0.0/16
Key Exam Indicators
| Line | What to look for |
|---|---|
| Access-Accept from RADIUS then NOT found in group | RADIUS authentication succeeded but FortiGate group membership check failed. The RADIUS server is healthy. The problem is the SSL-VPN user group configuration on the FortiGate: either the user is not a member of the group referenced in the SSL-VPN portal policy, or RADIUS group attributes (VSA / Fortinet-Group-Name) are not being sent. |
| Access-Reject from RADIUS | RADIUS authentication failed server-side. The credentials are wrong, the RADIUS shared secret is mismatched, or the NAS IP is not permitted on the RADIUS server. Cross-reference with RADIUS server logs. |
| no portal policy matched | The user authenticated but no portal mapping matched the combination of user/group/realm. Portals are defined in config vpn ssl web portal; the mapping of groups to portals lives in config vpn ssl settings → config authentication-rule, which must explicitly map this group to a portal. |
| allocating tunnel IP from pool | Tunnel mode is active and IP assignment succeeded. If the pool is exhausted, this line reads “no IP available in pool” and the user receives a VPN-connected status with no tunnel IP, so traffic never flows. |
| SSL negotiation done, peer cert: NONE | The client is not presenting a client certificate. If the portal requires certificate authentication, this results in a deny. If the portal allows password-only auth, this is fine. |
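When the debug stream is long, it helps to reduce each login attempt to one verdict. The sketch below is a minimal Python example, assuming the log phrases exactly as they appear in the sample output in this article (real sslvpnd output varies by FortiOS version); the function name and verdict labels are hypothetical.

```python
from typing import List

def classify_attempt(lines: List[str]) -> str:
    """Map one attempt's debug lines to the first failing stage."""
    text = "\n".join(lines)
    if "Access-Reject" in text:
        return "radius-reject"        # server-side failure
    if "NOT found in group" in text:
        return "group-membership"     # FortiGate group config problem
    if "no portal policy matched" in text:
        return "portal-mapping"       # group not mapped to a portal
    if "tunnel established" in text:
        return "connected"
    return "unknown"

failed = [
    "[sslvpnd 1234 - RADIUS] received Access-Accept from 10.10.20.10:1812 id=42",
    "[sslvpnd 1234 - GRPCHK] user 'jsmith' NOT found in group 'SSL-VPN-USERS'",
    "[sslvpnd 1234 - POLICY] sending deny: no portal policy matched",
]
passed = [
    "[sslvpnd 5678 - GRPCHK] user 'mgarner' found in group 'SSL-VPN-USERS'",
    "[sslvpnd 5678 - TUNNEL] tunnel established, pushing routes: 10.10.0.0/16",
]

print(classify_attempt(failed))
print(classify_attempt(passed))
```

The checks run in pipeline order, so the verdict names the earliest stage that failed rather than the last message printed.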
Stopping SSL-VPN debug:
FortiGate-100F # diagnose debug application sslvpn 0
FortiGate-100F # diagnose debug disable
Note: setting sslvpn verbosity to 0 (rather than just disable) stops that daemon’s output specifically while preserving any other active debug streams.
Module 6: High Availability Cluster Mechanics
get system ha status
Command Syntax & Architectural Impact
get system ha status
This command reports the cluster state maintained by the FortiGate HA daemons: hatalk, which handles heartbeat exchange, heartbeat link monitoring, and master election, and hasync, which handles configuration synchronisation. Session table synchronisation (for stateful failover) and split-brain prevention depend on the same heartbeat links. get system ha status provides a snapshot of everything the cluster knows about itself at the time of the command.
The master election algorithm is the critical concept the exam tests. When the cluster first forms (or after a failover), the winning unit is selected by evaluating the following criteria in order, with each acting only as a tiebreaker if the previous criterion produces a tie. With override disabled (the default), the order is:
- Connected monitored ports: the unit with more monitored interfaces (set monitor under config system ha) still up wins.
- HA uptime: the longer-running unit wins, provided the difference exceeds the ha-uptime-diff-margin (300 seconds by default). The margin prevents flapping on simultaneous boot.
- HA priority: configurable integer (0-255, default 128); higher value wins.
- Serial number: higher serial number wins (deterministic tiebreaker).
With override enabled (config system ha → set override enable), HA priority is evaluated before HA uptime; the other criteria keep their positions. Override is a per-unit on/off setting, not a serial-number assignment, and it does not outrank connected monitored ports.
Understanding this election order is essential: a common misconfiguration is giving the intended primary a higher priority and assuming it will always be primary. Without override, uptime is evaluated before priority, so the unit with the higher uptime after a maintenance window will be primary, which may be the wrong unit.
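The election can be expressed as a chain of tiebreakers. The following minimal Python sketch assumes the order documented for FortiOS (connected monitored ports first, then HA uptime and priority, with priority evaluated first only when override is enabled, then serial number) and a 300-second uptime margin. The Unit class and elect_primary function are hypothetical, override is simplified to a single cluster-wide flag, and the uptime values are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Unit:
    name: str
    serial: str
    monitored_ports_up: int
    ha_uptime_s: int
    priority: int = 128  # assumed default priority

def elect_primary(a: Unit, b: Unit, override: bool = False,
                  uptime_margin_s: int = 300) -> Unit:
    """Pick the primary by applying each criterion only on a tie."""
    # 1. More connected monitored ports wins.
    if a.monitored_ports_up != b.monitored_ports_up:
        return a if a.monitored_ports_up > b.monitored_ports_up else b

    def by_uptime() -> Optional[Unit]:
        # Uptime counts only when the gap exceeds the margin.
        if abs(a.ha_uptime_s - b.ha_uptime_s) > uptime_margin_s:
            return a if a.ha_uptime_s > b.ha_uptime_s else b
        return None

    def by_priority() -> Optional[Unit]:
        if a.priority != b.priority:
            return a if a.priority > b.priority else b
        return None

    # 2-3. Override swaps the order of uptime and priority.
    steps: List[Callable[[], Optional[Unit]]] = (
        [by_priority, by_uptime] if override else [by_uptime, by_priority]
    )
    for step in steps:
        winner = step()
        if winner is not None:
            return winner

    # 4. Final deterministic tiebreaker: higher serial number.
    return a if a.serial > b.serial else b

core01 = Unit("FW-CORE-01", "FG6H0E5818900001", 4, ha_uptime_s=1000, priority=200)
core02 = Unit("FW-CORE-02", "FG6H0E5818900002", 4, ha_uptime_s=1800, priority=100)

print(elect_primary(core01, core02, override=False).name)  # uptime decides
print(elect_primary(core01, core02, override=True).name)   # priority decides
```

With override disabled, FW-CORE-02 wins on uptime despite its lower priority; enabling override lets the priority of FW-CORE-01 take effect.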
Real-World Use Case Scenario
After a planned maintenance window, the secondary FortiGate was rebooted first and the primary second. When the primary came back up, the secondary (which had been up longer at that moment) won the election and became master. The primary is now incorrectly in slave mode. You need to verify: (1) which unit is currently master, (2) what election criteria caused this result, and (3) whether override is enabled and set correctly to force the intended primary back to master on next election.
Live Output Breakdown
FortiGate-600F # get system ha status
HA Health Status: OK
Model: FortiGate-600F
Mode: HA A-P
Group: 1
Debug: 0
Cluster Uptime: 4 days, 21:03:44
Master Selected using: Connected Monitored Ports, HA Group ID
ses sync: done
ses_pickup: enable
HA uptime: in sync
Master:
FW-CORE-02, serialno FG6H0E5818900002, managed_id=0
Connected Monitored Ports: 4
Last rebooted: Wed Apr 03 07:48:02 2024
HA Group ID: 0
Last FGFM heartbeat: 00:00:01
HA Primary heartbeat up: YES
HA Secondary heartbeat up: YES
Slave:
FW-CORE-01, serialno FG6H0E5818900001, managed_id=1
Connected Monitored Ports: 4
Last rebooted: Wed Apr 03 08:01:22 2024
HA Group ID: 0
Configuration Status:
FW-CORE-02(updated 1): in-sync
FW-CORE-01(updated 1): in-sync
Key Exam Indicators
| Field | What to look for |
|---|---|
| Master Selected using: Connected Monitored Ports, HA Group ID | This tells you which election criteria determined the current master. In this case, both units had equal monitored ports so Group ID was the tiebreaker. If you see Uptime listed here, uptime was the deciding factor, indicating the override setting is probably not enabled. |
| Master: FW-CORE-02 vs. intended primary | If FW-CORE-01 is the intended primary but FW-CORE-02 is currently master, the fix is either: (a) set override enable + set priority 200 on FW-CORE-01 and set priority 100 on FW-CORE-02, then trigger a negotiation, or (b) run diagnose sys ha reset-uptime on the current master (FW-CORE-02) so that it loses the uptime comparison and FW-CORE-01 takes over. Rebooting the slave does not help: it only lowers that unit's uptime further. |
| ses sync: done | Session synchronisation is complete. If this shows syncing for an extended period, the heartbeat links may be saturated or the secondary is processing sessions faster than they can be synced. |
| ses_pickup: enable | After a failover, the new primary will attempt to continue existing TCP sessions using the synchronised session table rather than resetting all connections. If ses_pickup is disable, all sessions reset on failover, which is important to know for exam scenarios about failover behaviour. |
| Last FGFM heartbeat: 00:00:01 | Time since the last heartbeat was received. Should always be under 1-2 seconds. If this grows, the HA heartbeat link (usually dedicated interfaces or port sharing) has a problem. |
| HA Primary heartbeat up: YES / HA Secondary heartbeat up: YES | Both heartbeat paths are up. If either is NO, the cluster is operating on a degraded heartbeat topology, which is a split-brain risk if the remaining heartbeat link also fails. |
| Configuration Status: in-sync | Configuration is synchronised. out-of-sync here means a change was made on the master that has not yet replicated to the slave, or a manual change was made directly on the slave (which breaks the golden rule: all config changes must go through the master). |
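The two Last rebooted timestamps in the output above are enough to check whether uptime could have decided the election. This is a small Python sketch under two stated assumptions: that the reboot time is a usable proxy for HA uptime, and that the uptime margin is the 300-second FortiOS default (ha-uptime-diff-margin). The function name is hypothetical.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"  # matches 'Wed Apr 03 07:48:02 2024'

def uptime_gap_seconds(earlier_boot: str, later_boot: str) -> float:
    """Seconds between two 'Last rebooted' timestamps."""
    t0 = datetime.strptime(earlier_boot, FMT)
    t1 = datetime.strptime(later_boot, FMT)
    return (t1 - t0).total_seconds()

MARGIN_S = 300  # assumed default uptime margin

gap = uptime_gap_seconds("Wed Apr 03 07:48:02 2024",   # FW-CORE-02 (master)
                         "Wed Apr 03 08:01:22 2024")   # FW-CORE-01 (slave)
print(gap, gap > MARGIN_S)
```

The gap is 13 minutes 20 seconds, which exceeds the margin, so FW-CORE-02 winning on uptime is consistent with the reboot order described in the scenario.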
diagnose sys ha checksum cluster
Command Syntax & Architectural Impact
diagnose sys ha checksum cluster
diagnose sys ha checksum show
diagnose sys ha checksum recalculate
FortiOS HA synchronisation works by maintaining an MD5 checksum of each configuration zone (called a “debug zone”) on both cluster members. After every configuration change on the master, the HA synchronisation daemon (hasync) pushes the modified zone to the slave and both units recompute the checksum for that zone. If the checksums match, the zone is in sync. If they diverge, the master schedules a resync for that zone.
diagnose sys ha checksum cluster shows the per-zone checksum for the local unit alongside the checksum received from the peer unit in the most recent heartbeat exchange. A mismatch in any zone identifies exactly which section of configuration is out of sync — without this command, “out-of-sync” is all you know; with it, you know which zone to investigate.
The zones correspond to FortiOS configuration objects: firewall.policy, vpn.ipsec.phase1-interface, system.interface, router.static, etc. Each zone maps to a branch of the config tree. Knowing the zone name lets you target your comparison.
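Comparing zones by eye is error-prone once there are dozens of them. The following minimal Python sketch assumes each unit's section has been captured as text in the zone: checksum layout shown in this article; the function names are hypothetical, and the sample checksums are shortened versions of the article's example values.

```python
from typing import Dict, List

def parse_zones(block: str) -> Dict[str, str]:
    """Collect 'zone: checksum' pairs, ignoring header lines."""
    zones: Dict[str, str] = {}
    for line in block.splitlines():
        name, sep, value = line.strip().partition(":")
        tokens = value.split()
        if sep and tokens:
            # Keep only the first token so trailing notes are dropped.
            zones[name.strip()] = tokens[0].lower()
    return zones

def mismatched_zones(master: Dict[str, str], slave: Dict[str, str]) -> List[str]:
    """Zones whose checksum differs or exists on one unit only."""
    names = master.keys() | slave.keys()
    return sorted(z for z in names if master.get(z) != slave.get(z))

MASTER = """\
chassis-id=0 slot-id=0 box-id-code=2
global: 4d2a1f8e3c9b0517
firewall.policy: a3f2b1c4d5e60718
vpn.ipsec.phase2-interface: 1a2b3c4d5e6f7080
"""
SLAVE = """\
chassis-id=0 slot-id=1 box-id-code=2
global: 4d2a1f8e3c9b0517
firewall.policy: a3f2b1c4d5e60718
vpn.ipsec.phase2-interface: FFFFFFFF00000001 <-- MISMATCH
"""

print(mismatched_zones(parse_zones(MASTER), parse_zones(SLAVE)))
```

The result is the list of zone names to compare between units, which maps directly onto the config branches to inspect.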
Real-World Use Case Scenario
get system ha status shows out-of-sync for the slave. You have just completed a maintenance window that involved adding 15 new firewall policies, modifying 3 VPN tunnels, and updating 2 static routes. The GUI is showing an HA sync warning. You need to determine: (1) which specific configuration zone(s) diverged, (2) whether this is a normal post-change propagation delay or a genuine sync failure, and (3) whether you need to force a resync or whether the cluster will resolve it automatically.
Live Output Breakdown
FortiGate-600F # diagnose sys ha checksum cluster
==[root]
chassis-id=0 slot-id=0 box-id-code=2
is_manage_master=1
global: 4d2a1f8e3c9b0517
root
firewall.policy: a3f2b1c4d5e60718
firewall.address: 9c8b7a6d5e4f3021
system.interface: 12345678abcdef01
router.static: deadbeef01234567
vpn.ipsec.phase1-interface: 8f7e6d5c4b3a2190
vpn.ipsec.phase2-interface: 1a2b3c4d5e6f7080
user.local: aaaa1111bbbb2222
user.group: cccc3333dddd4444
==[slave slot-id=1]
chassis-id=0 slot-id=1 box-id-code=2
global: 4d2a1f8e3c9b0517
root
firewall.policy: a3f2b1c4d5e60718
firewall.address: 9c8b7a6d5e4f3021
system.interface: 12345678abcdef01
router.static: deadbeef01234567
vpn.ipsec.phase1-interface: 8f7e6d5c4b3a2190
vpn.ipsec.phase2-interface: FFFFFFFF00000001 <-- MISMATCH
user.local: aaaa1111bbbb2222
user.group: cccc3333dddd4444
Key Exam Indicators
| Field | What to look for |
|---|---|
| Matching checksums across all zones | All zone hashes identical between master and slave = fully synchronised. This is the desired state. get system ha status would show in-sync. |
| vpn.ipsec.phase2-interface: FFFFFFFF00000001 on slave vs 1a2b3c4d5e6f7080 on master | The hashes differ for the Phase 2 configuration zone only. This tells you exactly which configuration object to compare between units. The fix is typically diagnose sys ha checksum recalculate on the slave (to rule out a stale cached hash), followed by manual comparison of the Phase 2 config on both units. |
| global: hash matching | The global zone covers config system global settings (hostname, timezone, admin settings). If this mismatches, fundamental system settings diverged, usually the result of someone accessing the slave CLI directly and making a change. |
| is_manage_master=1 | This output is from the master unit (is_manage_master=1). On the slave, this shows 0. Always confirm which unit you are running the command on before interpreting the “master” vs “slave” sections of the output. |
| All checksums 00000000 on slave zones | If the slave shows all-zero checksums, the slave has not yet received a full configuration sync: either the slave just joined the cluster, or the heartbeat link is broken and configuration sync has not completed. Once the heartbeat link is confirmed healthy, execute ha synchronize start on the slave may be required to force a full resync. |
| diagnose sys ha checksum recalculate | Forces the unit to recompute checksums from the actual configuration rather than using the cached in-memory values. Run this when a mismatch appears but you believe the configurations are actually identical; a daemon state inconsistency may be producing a false mismatch. |
Closing Notes: Putting It All Together
The six modules in this three-part guide form an ordered diagnostic ladder. Start at the top (system health) and work down:
- Is the unit healthy? → get system status (firmware, HA role, license), get system performance status (CPU, memory, session rate)
- Is the interface up at L1/L2? → get system interface physical, diagnose hardware deviceinfo nic
- Is routing correct? → get router info routing-table all (FIB), routing-table database (RIB), get system arp
- Are sessions being established? → diagnose sys session filter + list
- Are packets flowing? → diagnose sniffer packet (is traffic arriving?)
- Where is it being dropped? → diagnose debug flow (which kernel function drops it?)
- For VPN specifically? → diagnose vpn ike gateway list (Phase 1), diagnose vpn tunnel list (Phase 2 + SPI counters)
- For HA desync? → get system ha status (role + heartbeat), diagnose sys ha checksum cluster (which zone)
Working through this ladder systematically eliminates entire categories of failure at each step — the exam constructs scenarios that require exactly this kind of structured elimination, and knowing which command answers which question is the skill that separates candidates who pass from those who don’t.
Part of the NSE4 Study Series. For IPsec VPN configuration theory, see Part 8: IPsec VPN. For SSL-VPN configuration, see Part 7: SSL VPN. For HA architecture, see Part 10: High Availability.