A Day in the Life of a Packet on a 50G FortiGate, Part 5: Egress, NPU Offload, and the Full Troubleshooting Cookbook

Recap. By the end of Part 4 the packet had been authorised by a forward policy, dispatched to UTM (or not), tagged for SNAT, and the session entry was fully populated with policy ID, NAT decisions, ingress/egress interfaces, and UTM profile pointers. What’s left is the physical act of leaving the box, plus the moment FortiOS decides whether the rest of this session can run in silicon.

Conceptually this is the shortest of the five parts, but it is the densest on diagnostics, and it ends with a single-page command reference for the whole series.

Stage 1: traffic shaping (egress)

If the matched policy or any traffic-shaping policy applies a shaper to this session, the packet is enqueued in a software shaper class on the egress interface. FortiOS shapers come in three forms:

  • Per-policy shaper — applied via traffic-shaper and traffic-shaper-reverse on a forward policy.
  • Per-IP shaper — applied via per-ip-shaper. Bandwidth is metered per source IP.
  • Shaping policy — a separate top-level table (config firewall shaping-policy) that re-classifies traffic post-policy and applies shapers without coupling them to a specific firewall policy.

Shapers can do bandwidth limiting, guaranteed bandwidth, and DSCP marking. FortiOS supports queue-class semantics (high/medium/low) for prioritisation.

diagnose firewall shaper traffic-shaper list
diagnose firewall shaper per-ip-shaper list
diagnose firewall shaping-policy list
diagnose firewall shaping-policy stats

NP7 supports interface-level shaping offload but not per-policy software shaper offload. A session with a per-policy shaper attached cannot be fully NP7-offloaded; the session list reports no_ofld_reason: shaping. Interface bandwidth limits (set under config system interface) are offloaded.

If you’re shaping at scale, prefer interface-level limits or use NP7-offloaded QoS via DSCP, where the NP7 marks egress packets and downstream policers do the actual work.
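To make "bandwidth limiting" concrete, here is a minimal token-bucket sketch in Python. It is a generic illustration of how a software shaper can meter traffic, not FortiOS's actual implementation; the class name, parameters, and timestamps are all assumptions made for the example.

```python
class TokenBucket:
    """Generic token-bucket meter (illustrative; not FortiOS internals)."""

    def __init__(self, rate_bps: int, burst_bytes: int) -> None:
        self.rate_bytes_per_s = rate_bps / 8.0   # refill rate in bytes/second
        self.capacity = float(burst_bytes)       # maximum burst size
        self.tokens = float(burst_bytes)         # start with a full bucket
        self.last_ts = 0.0                       # time of the last refill

    def allow(self, pkt_len: int, now: float) -> bool:
        """Return True if a packet of pkt_len bytes conforms at time `now`."""
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_ts = now
        if pkt_len <= self.tokens:
            self.tokens -= pkt_len
            return True
        return False  # non-conforming: a shaper would queue or drop it


# 8 Mbit/s limit (1,000,000 bytes/s) with a 3000-byte burst allowance.
bucket = TokenBucket(rate_bps=8_000_000, burst_bytes=3000)
decisions = [bucket.allow(1500, now=t) for t in (0.0, 0.0, 0.0, 0.002)]
print(decisions)  # [True, True, False, True]
```

The third packet is refused because the burst allowance is spent; two milliseconds later enough tokens have accumulated for another full-size packet. Per-packet bookkeeping of this kind has to live somewhere, which is one way to see why a software shaper keeps a session in the kernel.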

Stage 2: NPU offload re-evaluation

This is one of the most consequential moments of the packet’s life and is invisible in the GUI.

After the kernel has fully built a session entry, it asks: “is this session a candidate for hardware acceleration?” The answer depends on:

  • Both interfaces are NPU-attached (not management ports, not software switches that aren’t NPU-attached, not loopbacks unless designed for offload).
  • No proxy-based UTM (av-profile in proxy mode, web filter in proxy mode, DLP, email filter).
  • No incompatible helper attached (FTP, SIP, etc., during their setup phases — once the helper is done some flows can be offloaded).
  • No traffic-shaping policy with a software shaper.
  • No fragmented packets on the path that aren’t reassembled.
  • The protocol is supported (TCP, UDP, ICMP, ESP, GRE, SCTP on supported platforms).
  • For IPsec: the SA is offloaded.
  • The session has reached proto_state=01 (TCP established) or equivalent.
  • VDOM-level auto-asic-offload is enabled (default).
  • Policy-level auto-asic-offload is enabled (default).

If it qualifies, the kernel pushes the session entry into the NP7 session cache. From this point on, every subsequent packet of the session is processed entirely by the NP7; the kernel sees none of them.
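As a mental model, the checklist above can be sketched as a single predicate. This is only an illustration under my own assumptions: the Session fields and the reason strings are hypothetical names invented for the example, and they are not FortiOS data structures or real no_ofld_reason values.

```python
from dataclasses import dataclass
from typing import List

OFFLOADABLE_PROTOCOLS = {"tcp", "udp", "icmp", "esp", "gre", "sctp"}


@dataclass
class Session:
    """Hypothetical, simplified view of a session entry."""
    protocol: str
    ingress_npu_attached: bool = True
    egress_npu_attached: bool = True
    proxy_utm: bool = False
    helper_active: bool = False
    software_shaper: bool = False
    unreassembled_fragments: bool = False
    established: bool = True
    vdom_offload_enabled: bool = True
    policy_offload_enabled: bool = True
    ipsec: bool = False
    ipsec_sa_offloaded: bool = True


def offload_blockers(s: Session) -> List[str]:
    """Return the reasons (if any) this session would stay on the slow path."""
    checks = [
        (s.protocol.lower() in OFFLOADABLE_PROTOCOLS, "unsupported protocol"),
        (s.ingress_npu_attached and s.egress_npu_attached,
         "interface not NPU-attached"),
        (not s.proxy_utm, "proxy-based UTM"),
        (not s.helper_active, "session helper still active"),
        (not s.software_shaper, "software shaper attached"),
        (not s.unreassembled_fragments, "unreassembled fragments"),
        (s.established, "session not established"),
        (s.vdom_offload_enabled and s.policy_offload_enabled,
         "auto-asic-offload disabled"),
        (not s.ipsec or s.ipsec_sa_offloaded, "IPsec SA not offloaded"),
    ]
    return [reason for ok, reason in checks if not ok]


blocked = offload_blockers(Session("tcp", proxy_utm=True, software_shaper=True))
print(blocked)  # ['proxy-based UTM', 'software shaper attached']
```

An empty list means every condition holds and the session is a candidate. The point of returning all blockers rather than the first is that real sessions often fail more than one condition, and fixing one does not help until the others are gone.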

You can see the moment it happens in diag debug flow:

trace_id=1 ... msg="install session, policy=12, npu=enable, ips=enable"
trace_id=1 ... msg="enter fast path"

And confirm post-install:

diagnose sys session list | grep -A20 "policy_id=12"

Look at:

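  • npu info: ... offload=8/8: a non-zero value on each side of the slash means both directions are offloaded (NP7 sessions typically report 9/9, and 0/0 means not offloaded).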
  • no_ofld_reason: <something> — if non-empty, why not.
  • ips_offload=1/1 — if NTurbo is in play.

The most common no_ofld_reason values, in roughly the order I see them in the field:

Reason               Meaning
redir-to-ips         Flow-based IPS without NTurbo, or a profile NTurbo can’t accelerate
redir-to-av          Proxy-mode AV; not offloadable by design
helper               Helper attached (FTP/SIP/PPTP/etc.)
non-npu-intf         One of the interfaces isn’t NPU-attached
local-traffic        Packet to/from the FortiGate itself
disabled-by-policy   auto-asic-offload disable on the policy
asic-unsupported     Protocol or feature combination the NP7 doesn’t support
frag                 Fragmented packet that hasn’t been reassembled
multicast            Multicast scenarios with constraints
not-established      TCP session not yet ESTABLISHED
shaping              Per-policy software shaper attached
ha-config            HA session sync state
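When a box holds hundreds of thousands of sessions, it helps to tally these reasons offline rather than eyeball them. The sketch below assumes you have saved diagnose sys session list output to text and that each non-offloaded session carries a line starting with no_ofld_reason:. The sample text is hypothetical and heavily abbreviated, not real device output.

```python
import re
from collections import Counter

# Matches "no_ofld_reason:" followed by at least one non-space token.
NO_OFLD_RE = re.compile(r"no_ofld_reason:\s*(\S.*)")


def tally_no_offload(session_dump: str) -> Counter:
    """Count each no_ofld_reason token found in a saved session dump."""
    counts: Counter = Counter()
    for line in session_dump.splitlines():
        match = NO_OFLD_RE.search(line)
        if match:
            # Assumption: several reasons may share one line, space-separated.
            counts.update(match.group(1).split())
    return counts


# Hypothetical, abbreviated sample; real output has many more fields.
SAMPLE = """\
session info: proto=6 proto_state=01
no_ofld_reason:  redir-to-ips
session info: proto=6 proto_state=01
no_ofld_reason:  helper
session info: proto=17 proto_state=00
no_ofld_reason:  redir-to-ips
"""

summary = tally_no_offload(SAMPLE)
print(summary.most_common())  # [('redir-to-ips', 2), ('helper', 1)]
```

Sorting by frequency tells you which single change (enabling NTurbo, moving a profile to flow mode, removing a shaper) would return the most sessions to the fast path.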

If you see redir-to-ips even though NTurbo is enabled under config system npu, confirm with:

diagnose test application ipsmonitor 13
get system npu

NTurbo is sometimes globally enabled but disabled per-VDOM, or disabled because the IPS sensor uses a feature NTurbo can’t accelerate (custom signatures with regex too expensive for hardware, certain DLP overlays, etc.). The ipsmonitor 13 output tells you what’s in NTurbo and what isn’t.

Stage 3: ARP / next-hop resolution

The kernel (or NP7, if offloaded) needs the MAC address of the next-hop. ARP cache is checked; if absent, an ARP request goes out and the packet is held briefly.

get system arp
diagnose ip arp list
diagnose ip arp delete <interface> <ip>   # remove a single entry
diagnose ip arp flush <interface>         # flush all entries learned on one interface
diagnose ipv6 neighbor-cache list         # IPv6 neighbour cache

For high-flow boxes, ARP cache exhaustion is rare but real. Watch:

diagnose ip arp count
get system performance status

If a next-hop’s MAC entry is stale because the host has died (no ARP reply), the kernel holds the packet briefly and then drops it. Watch:

diagnose ip arp range
diagnose debug flow ...           # will show "find a route" then nothing further

For IPsec tunnels, “ARP” is replaced by SA selection — there’s no MAC at the L3 tunnel layer.

Stage 4: encapsulation

If the egress interface is an IPsec tunnel, the packet now gets encapsulated:

  1. ESP header pushed.
  2. Payload encrypted (AES-GCM or AES-CBC + HMAC, on a CP9 if available).
  3. Outer IP header constructed (tunnel src → tunnel dst).
  4. UDP/4500 NAT-T wrapping if NAT-T is in play.

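On an 1800F-class box, the CP9 accelerates crypto for tunnels the kernel still handles, while the NP7 encrypts and encapsulates offloaded tunnels in hardware. For high-throughput VPN, NPU offload of IPsec is decisive: software ESP processing on x86 caps out an order of magnitude lower.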
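The four encapsulation steps also cost bytes on the wire, which matters when you size MTU and MSS for a tunnel. Here is a rough sketch of the arithmetic for IPv4 ESP tunnel mode with AES-GCM. The function name is mine, and the sizes assume an option-less outer IPv4 header, an 8-byte explicit IV, and a 16-byte ICV; other transforms such as AES-CBC with HMAC use different IV, padding, and ICV sizes.

```python
def esp_tunnel_size(inner_len: int, nat_t: bool = False) -> int:
    """Bytes on the wire for an IPv4 ESP tunnel-mode packet using AES-GCM."""
    outer_ipv4 = 20    # outer IP header, no options
    udp_nat_t = 8      # UDP/4500 header when NAT-T is in play
    esp_header = 8     # SPI + sequence number
    gcm_iv = 8         # explicit IV carried in every packet
    esp_trailer = 2    # pad length + next header
    gcm_icv = 16       # integrity check value (authentication tag)
    align = 4          # the ESP trailer must end on a 4-byte boundary

    padded = inner_len + esp_trailer
    padded += (-padded) % align          # add 0 to 3 bytes of padding
    total = outer_ipv4 + esp_header + gcm_iv + padded + gcm_icv
    return total + (udp_nat_t if nat_t else 0)


print(esp_tunnel_size(1400))              # 1456
print(esp_tunnel_size(1400, nat_t=True))  # 1464
```

A 1400-byte inner packet therefore grows by 56 bytes, or 64 with NAT-T, under these assumptions. If the result exceeds the egress MTU, the packet fragments, and fragments are one of the things that keep a flow off the fast path.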

diagnose vpn ike gateway list
diagnose vpn tunnel list
diagnose vpn tunnel name <tun> stat
get vpn ipsec tunnel summary
diagnose vpn ipsec status
diagnose vpn ipsec esp ...
diagnose npu np7 ipsec-stats

For VXLAN egress (overlay tunnels), encapsulation is also NP7-offloadable on supported platforms, and the same holds for GRE. ADVPN spoke-to-spoke shortcuts are IPsec tunnels, so they follow the ESP path above.

For plain L3 egress, no encap; the packet is L2-rewritten with the next-hop MAC and transmitted.

Stage 5: transmit

The packet is DMA’d to the egress NIC’s transmit ring. The PHY clocks it onto the wire. Bytes leave the box. The session entry’s counters are updated, the session’s idle timer (the expire value in the session list) is reset, and the box is ready for the next packet.

For an offloaded session, all of this happened in the NP7 without ever touching a CPU. For a slow-path session, it’s the kernel’s dev_queue_xmit doing the transmit and the kernel updating counters.

diagnose hardware deviceinfo nic <port>     # tx counters
diagnose npu np7 port-list                  # NPU-side tx counters
get system performance status
diagnose sys top                             # if you suspect CPU-bound transmit
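The counters these commands print are cumulative, so a single reading says little. A small sketch of turning two readings into a rate follows; the function and its parameters are hypothetical, and it assumes you copied a transmit byte counter by hand at two known times.

```python
def tx_rate_bps(bytes_t0: int, bytes_t1: int, interval_s: float,
                counter_bits: int = 64) -> float:
    """Average transmit rate in bits/second between two counter readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The modulo tolerates a single counter wrap between the two samples.
    delta = (bytes_t1 - bytes_t0) % (1 << counter_bits)
    return delta * 8 / interval_s


rate = tx_rate_bps(0, 62_500_000_000, 10.0)
print(f"{rate / 1e9:.1f} Gbit/s")  # 50.0 Gbit/s
```

Comparing the rate from the NIC counters with the rate from the NPU-side counters is a quick way to tell whether traffic is leaving through the fast path or through the kernel.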

Putting it all together — the full slow-path trace, annotated

For a brand-new HTTPS flow from 10.10.10.50 to 13.107.6.152:443, here’s the canonical full trace, annotated.

diagnose debug reset
diagnose debug flow filter clear
diagnose debug flow filter saddr 10.10.10.50
diagnose debug flow filter daddr 13.107.6.152
diagnose debug flow filter dport 443
diagnose debug flow show function-name enable
diagnose debug flow show iprope enable
diagnose debug console timestamp enable
diagnose debug enable
diagnose debug flow trace start 100

Output (abbreviated):

received a packet(proto=6, 10.10.10.50:54321->13.107.6.152:443) from port3.   # ingress, Part 1/2
allocate a new session-00abcdef                                               # session miss, Part 2
in-[port3], out-[]
find SDWAN service: id=2, member id=2 (port5)                                 # SD-WAN match, Part 3
find a route: gw-198.51.100.1 via port5                                       # FIB resolve, Part 3
Allowed by Policy-12: SNAT                                                    # forward policy, Part 4
find SNAT: IP-203.0.113.20 (from IPPOOL), port-47331                          # SNAT decision, Part 4
DNAT 13.107.6.152:443->13.107.6.152:443                                       # no DNAT but logged
session install, policy=12, npu=enable, ips=enable                            # NPU offload, Part 5
enter fast path                                                               # NP7 takes over, Part 5

If anything goes wrong, the line immediately before the missing one is the diagnosis. find a route followed by no policy line means “routing succeeded but policy didn’t match.” Denied by forward policy check (policy 0) means “implicit deny.” iprope_in_check() check failed on policy 0 means the packet was addressed to the FortiGate itself and failed the local-in check. reverse path check fail means “RPF dropped this.” Etc.
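The "find the last line before the gap" habit is easy to mechanise. This sketch keys on substrings taken from the abbreviated trace in this article; treat the marker list as an assumption, since real traces vary by FortiOS version and by path (for example, a flow with no SNAT never prints a find SNAT line, and that is not a failure).

```python
from typing import Iterable, Optional, Tuple

# (substring to look for, stage it represents), in pipeline order.
STAGES = [
    ("received a packet", "ingress"),
    ("allocate a new session", "session allocation"),
    ("find a route", "routing"),
    ("Allowed by Policy", "forward policy"),
    ("find SNAT", "source NAT"),
    ("enter fast path", "NPU offload"),
]


def last_stage_reached(
    trace_lines: Iterable[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (last stage seen, next stage that never appeared)."""
    reached = -1
    for line in trace_lines:
        for index, (marker, _stage) in enumerate(STAGES):
            if marker in line:
                reached = max(reached, index)
    last = STAGES[reached][1] if reached >= 0 else None
    missing = STAGES[reached + 1][1] if reached + 1 < len(STAGES) else None
    return last, missing


trace = [
    "received a packet(proto=6, 10.10.10.50:54321->13.107.6.152:443) from port3.",
    "allocate a new session-00abcdef",
    "find a route: gw-198.51.100.1 via port5",
]
print(last_stage_reached(trace))  # ('routing', 'forward policy')
```

Here the trace stops after routing, so the forward policy is where to look next, which is exactly the "routing succeeded but policy didn't match" case.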

The full troubleshooting cookbook

Everything from this series, organised by what you’re investigating. Bookmark this section.

Is the packet on the wire?

diagnose sniffer packet <port> '<bpf>' 4 0 a
diagnose sniffer packet any '<bpf>' 6 0 l         # all interfaces, full hex, with iface

Is the driver / NIC seeing it?

diagnose hardware deviceinfo nic <port>
diagnose hardware sysinfo interrupts
diagnose hardware sysinfo memory
diagnose hardware sysinfo conserve
get hardware nic <port>

NPU / hardware path

diagnose npu np7 port-list
get hardware npu np7 port-list
get hardware npu np7 stats <np-id>
diagnose npu np7 anomaly-drop-counter <np-id>
diagnose npu np7 session-stats <np-id>
diagnose npu np7 dce
diagnose npu np7 ipsec-stats
diagnose npu np7 fastpath-sniffer
config system npu
    get
end

Sessions

diagnose sys session filter clear
diagnose sys session filter src <ip>
diagnose sys session filter dst <ip>
diagnose sys session filter sport <port>
diagnose sys session filter dport <port>
diagnose sys session filter proto <n>
diagnose sys session filter vd <id>
diagnose sys session list
diagnose sys session stat
diagnose sys session full-stat
diagnose sys session clear              # clears sessions matching the filter; with no filter set, clears ALL sessions

For IPv6, replace session with session6.

Policy match

diagnose firewall iprope lookup <src> <sport> <dst> <dport> <proto> <iif> <vdom>
diagnose firewall iprope list 0x100004
diagnose firewall iprope show 0x100004 <id>
diagnose firewall iprope-count show
show firewall policy <id>

Tables of interest:

  • 0x100002 — DNAT (VIPs)
  • 0x100003 — central SNAT
  • 0x100004 — forward policy
  • 0x100007 — implicit deny logging
  • 0x100009 — local-in policy
  • 0x100010 — local-out policy

NAT

diagnose firewall vip list
diagnose firewall vip realservers list
diagnose firewall vip realservers stats
diagnose firewall central-snat list
diagnose firewall ippool list
diagnose firewall ippool list <pool>
get firewall vip
get firewall central-snat-map

Routing

get router info routing-table all
get router info routing-table database
get router info routing-table details <ip>
get router info kernel
diagnose ip route list
diagnose ipv6 route list
diagnose firewall proute list
diagnose firewall proute6 list
get router info bgp summary
get router info bgp neighbors
get router info bgp network
get router info ospf neighbor
get router info ospf database
get router info bfd neighbor
diagnose ip rtcache list

SD-WAN

diagnose sys sdwan member
diagnose sys sdwan health-check
diagnose sys sdwan health-check status
diagnose sys sdwan service
diagnose sys sdwan service <id>
diagnose sys sdwan log
diagnose sys sdwan internet-service-app-ctrl-list
diagnose internet-service info
diagnose internet-service id <id>
diagnose internet-service match root <ip> <port> <proto>
get system sdwan service
get system sdwan member

UTM

# IPS
diagnose test application ipsmonitor 5
diagnose test application ipsmonitor 13
diagnose ips anomaly list
diagnose ips ssl conn

# Web filter / DNS filter
diagnose webfilter fortiguard statistics
diagnose dns query
diagnose test application urlfilter 5

# Application control
diagnose application list

# Proxy (WAD)
diagnose test application wad 1000
diagnose test application wad 2000
diagnose wad worker stat
diagnose wad debug enable category webfilter

# SSL inspection
diagnose firewall ssl-cert-cache list
diagnose firewall ssl-exempt list

VPN

diagnose vpn ike gateway list
diagnose vpn ike gateway flush
diagnose vpn tunnel list
diagnose vpn tunnel name <tun> stat
diagnose vpn ipsec status
diagnose vpn ipsec esp ...
get vpn ipsec tunnel summary
get vpn ipsec tunnel name <tun>

Authentication

diagnose firewall auth list
diagnose firewall auth filter
diagnose debug authd fsso list
diagnose debug authd fsso server-status

Live flow trace (the workhorse)

diagnose debug reset
diagnose debug flow filter clear
diagnose debug flow filter saddr <ip>
diagnose debug flow filter daddr <ip>
diagnose debug flow filter sport <port>
diagnose debug flow filter dport <port>
diagnose debug flow filter proto <n>
diagnose debug flow filter vd <id>
diagnose debug flow show function-name enable
diagnose debug flow show iprope enable
diagnose debug console timestamp enable
diagnose debug enable
diagnose debug flow trace start <count>
diagnose debug flow trace stop
diagnose debug disable
diagnose debug reset
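During an incident I would rather paste a prepared block than type filters one at a time. A minimal sketch of a generator for the setup half of that sequence follows. The helper name and its parameters are hypothetical; the commands it emits are the ones listed above, and the stop/disable/reset lines still need to be run by hand when you are done.

```python
from typing import List, Optional


def flow_trace_commands(saddr: Optional[str] = None,
                        daddr: Optional[str] = None,
                        dport: Optional[int] = None,
                        proto: Optional[int] = None,
                        count: int = 100) -> List[str]:
    """Build the CLI lines that set up and start a filtered flow trace."""
    cmds = ["diagnose debug reset", "diagnose debug flow filter clear"]
    filters = (("saddr", saddr), ("daddr", daddr),
               ("dport", dport), ("proto", proto))
    for key, value in filters:
        if value is not None:            # only emit filters that were given
            cmds.append(f"diagnose debug flow filter {key} {value}")
    cmds += [
        "diagnose debug flow show function-name enable",
        "diagnose debug flow show iprope enable",
        "diagnose debug console timestamp enable",
        "diagnose debug enable",
        f"diagnose debug flow trace start {count}",
    ]
    return cmds


commands = flow_trace_commands(saddr="10.10.10.50",
                               daddr="13.107.6.152", dport=443)
print("\n".join(commands))
```

With those arguments the output matches the ten setup lines used for the HTTPS example earlier in this part. Always set at least one filter on a busy box; an unfiltered trace on a production firewall is a reliable way to make things worse.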

CPU / memory health

get system performance status
diagnose sys top 2 30
diagnose sys top-mem
diagnose sys mpstat 1 5
diagnose hardware sysinfo conserve
diagnose hardware sysinfo memory

HA

diagnose sys ha status
diagnose sys ha checksum cluster
diagnose sys ha checksum show
get system ha status
diagnose sys session list | grep -i synced

Logging quickly during an incident

execute log filter category 0
execute log filter field srcip 10.10.10.50
execute log display
execute log filter reset

A discipline for “why is this packet failing?”

After a few hundred of these tickets, here’s the order I always work in. It saves time.

  1. Sniffer first. Is the packet on the wire? diagnose sniffer packet any 'host X and host Y' 4 0 a. If not, your problem is upstream of the FortiGate.
  2. Driver next. Is the NIC counting it? diagnose hardware deviceinfo nic <port>. If not, you have a physical or L2 problem.
  3. NPU anomaly counter. diagnose npu np7 anomaly-drop-counter. Catches silent drops at the silicon layer.
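  4. Existing session? diagnose sys session filter ... ; list. Often the question “why is my new policy not matching” is “because there’s an existing session that was built before you added the policy.” Running diagnose sys session clear with the filter still set removes those sessions, so the next packet rebuilds them against the current policy set. Without a filter it clears every session on the box.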
  5. Flow trace. diagnose debug flow .... If anything in the kernel forwarding path is failing, the trace tells you.
  6. Policy lookup. diagnose firewall iprope lookup. Cross-check what the trace says.
  7. Routing / SD-WAN. get router info routing-table details, diagnose sys sdwan service. Confirm the egress decision.
  8. NAT / VIP. diagnose firewall vip list, diagnose firewall central-snat list, diagnose firewall ippool list.
  9. UTM. Only if the packet survives all of the above and still isn’t reaching its destination. UTM-related drops typically log to the IPS log, the AV log, or the web filter log — check execute log display against the relevant category.
  10. NPU offload state. Performance, not correctness. diagnose sys session list | grep no_ofld_reason.

Fin

That’s the journey. From the moment a frame arrives on a 10G/25G port and gets DMA’d into NP7 memory, through the on-chip session cache, the punt to the kernel, the gauntlet of stateful inspection / RPF / DoS / session lookup, the routing decision split across policy routes, SD-WAN service rules, and the FIB, the firewall policy match against the iprope chain, the NAT decisions in central or policy mode, the dispatch through flow-based or proxy-based UTM, the NPU offload re-evaluation, and finally the egress with whatever encapsulation the chosen path needs — every step is something you can see, query, and reason about with the right command.

The marketing page promises 50 Gbps. The reality is that 50 Gbps is what happens when most of these steps run in silicon and the kernel only sees the first packet of each session. Understanding which packets earn that privilege — and which don’t, and why — is the difference between a FortiGate that meets its datasheet and one that doesn’t.

If you’ve made it through all five parts of this series, you can now read a diag debug flow trace like a roadmap.