A Day in the Life of a Packet on a 50G FortiGate, Part 1: Ingress, NP7, and the Fast Path

Series scope. This is a five-part deep dive. Part 1 (this post) covers ingress, hardware offload, and the NP7 fast path. Part 2 covers stateful inspection, session lookup, anti-spoofing, and DoS. Part 3 covers routing, policy routes, and SD-WAN service rules. Part 4 covers firewall policy match, NAT, and security profiles. Part 5 covers egress, NPU offload re-evaluation, and a complete troubleshooting cookbook.

Hardware target. I’m writing this against a 50 Gbps-class FortiGate — think 1800F / 2600F / 3000F / 3500F class — with one or more NP7 processors and a CP9 content processor. FortiOS 7.4.x / 7.6.x. Most of this generalises down to NP6 / SoC4-based units, but I’ll call out NP7-specific behaviour where it matters.

The marketing brochure says an 1800F does 50 Gbps of firewall throughput, 27 Gbps of IPS, and tens of millions of concurrent sessions. None of that happens in the x86 kernel. Most of it happens in silicon, on the NP7. Understanding which packets get to use the express lane and which packets are forced to crawl through the kernel is the single most useful thing you can know when you’re staring at diag debug flow output trying to work out why a TCP session won’t come up.

This series follows one packet from the moment its first bit clocks onto an SFP+ to the moment its last bit clocks out the egress port. Today: ingress.

The map

At a very rough level a FortiGate has three planes:

  1. Data plane — the NP7 ASICs (network processors), the CP9 (content processor for crypto, IPS pattern match, and SSL acceleration on some platforms), and the integrated switch fabric. This is where packets actually move.
  2. Control plane — the x86 cores running the FortiOS kernel and userland (miglogd, ipsmonitor, httpsd, cmdbsvr, the routing daemons, etc.). This is where decisions get made for new flows and where everything that can’t be expressed in NPU microcode happens.
  3. Management plane — cmdbsvr, the CLI, the GUI, FGFM to FortiManager, REST API, SNMP. Mostly irrelevant to a packet’s life unless that packet is destined for the FortiGate itself.

NP7s are connected to x86 over PCIe. The control plane builds session entries and pushes them down into the NP7 session cache. The NP7 enforces them in hardware. When something happens that the NP7 can’t enforce — a new session, a session that needs UTM, a packet with options it doesn’t understand — it punts the packet up to the kernel.

That hardware/software split is the whole game.

Stage 0: arrival on the wire

A frame arrives on a 10G or 25G port. The PHY recovers clock, the MAC strips the preamble and FCS, and the frame is DMA’d into a receive ring buffer that lives in NP7 memory (not host RAM, on NP7-class platforms — the NP7 has its own packet buffer). A descriptor is written: ingress port, ingress VLAN, length, hash.

If you ever want to confirm the interface is even seeing packets at the driver level — before any FortiOS logic — this is the layer you query:

diagnose hardware deviceinfo nic port1

Look for Rx_Packets, Rx_Bytes, Rx_Dropped, Rx_Errors, Rx_CRC_Errors, Rx_Frame_Errors. CRC errors mean an SFP, fibre, or upstream MAC problem — nothing FortiOS does will fix that. Rx_Dropped with no errors usually means buffer pressure (too many packets, not enough kernel CPU to drain), which is a very different conversation.
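The counter logic above can be sketched as a small triage helper. This is only an illustration: the function name and the idea of comparing two snapshots are my own assumptions, and how you collect the counters (copy and paste, a script, SNMP) is deliberately left out.

```python
def triage_rx_counters(before, after):
    """Classify the change between two snapshots of NIC receive counters.

    `before` and `after` are plain dicts keyed by the counter names
    mentioned above (Rx_Packets, Rx_Dropped, Rx_CRC_Errors, ...).
    Collecting the snapshots is out of scope for this sketch.
    """
    delta = {name: after.get(name, 0) - before.get(name, 0) for name in after}

    # Any growth in an error counter points at the physical layer:
    # SFP, fibre, or the upstream MAC.
    error_keys = ("Rx_Errors", "Rx_CRC_Errors", "Rx_Frame_Errors")
    if any(delta.get(name, 0) > 0 for name in error_keys):
        return "physical"

    # Drops with clean error counters usually mean buffer pressure.
    if delta.get("Rx_Dropped", 0) > 0:
        return "buffer-pressure"

    if delta.get("Rx_Packets", 0) > 0:
        return "receiving"
    return "no-traffic"
```

Working on deltas rather than absolute values matters because these counters are cumulative; an old CRC burst from last month should not colour today's diagnosis.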

For NPU-attached interfaces:

diagnose npu np7 port-list
get hardware npu np7 port-list
get hardware npu np7 stats <np-id>
diagnose npu np7 anomaly-drop-counter <np-id>

anomaly-drop-counter is gold. It tells you which packets the ASIC dropped before they even got a chance to be examined: bad checksum, bad length, malformed L2, TTL=0, IP options it isn’t configured to handle, fragments it can’t reassemble, etc. Most “but the packet is being dropped and there’s nothing in the logs” mysteries on heavily-offloaded boxes live here.

diagnose hardware sysinfo interrupts

Tells you which CPU is servicing which NIC IRQ. On a busy 50G box you want IRQs spread across cores, not all stacked on CPU0. That’s almost never an issue on NP7 platforms because NPU traffic doesn’t generate per-packet host interrupts — but for non-NPU traffic and for control traffic it absolutely matters.
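If you want a quick sanity check on IRQ spread, something like the following works on numbers you have already pulled out of the interrupt table. It is a rough sketch: the function name, the input shape (a dict of CPU label to interrupt count), and the 50% threshold are all assumptions of mine, not anything FortiOS defines.

```python
def hot_cpus(irq_counts, max_share=0.5):
    """Return the CPUs that service more than `max_share` of all interrupts.

    `irq_counts` maps a CPU label (for example "CPU0") to the number of
    interrupts it has serviced for the NIC in question. The threshold is
    an arbitrary illustration, not a vendor recommendation.
    """
    total = sum(irq_counts.values())
    if total == 0:
        return []
    return sorted(
        cpu for cpu, count in irq_counts.items() if count / total > max_share
    )
```

An empty list means no single core is carrying the bulk of the load; a list containing only CPU0 is the stacked pattern described above.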

Stage 1: L2 normalisation

The frame is on the chip. The first thing the NP7 does is L2 normalisation: VLAN tag handling (single, QinQ), zone/VDOM resolution, MAC table lookup if you’re in transparent mode. In NAT/route mode this is mostly bookkeeping — the frame is going up to L3 in a moment.

Quick checks:

get system interface physical
get system interface
diagnose netlink interface list
diagnose hardware deviceinfo nic <port> | grep -i vlan

If you have software switch interfaces or hardware switch interfaces, behaviour differs sharply. Hardware switches keep traffic on the NP7’s integrated switch chip and never bother the kernel; software switches bridge in the kernel and cost CPU. On an 1800F-class box you almost always want hardware switch.

Stage 2: the NP7 session lookup — the express lane

This is the most important moment in the packet’s life.

The NP7 takes the 5-tuple (or 7-tuple, including ingress port and VRF) of the packet and hashes it against its on-chip session cache. Every fully-established session that is NPU-eligible has an entry there. The entry tells the NP7:

  • Which output port to send this on
  • Which output VLAN tag, if any
  • What SNAT or DNAT translation to apply
  • What QoS / DSCP marking to apply
  • Whether this session should be IPSec-encapsulated, and which SA to use
  • Whether this packet should be mirrored, sampled, or sent up for sampling

If there’s a hit, the NP7 rewrites the packet, encrypts it if it needs to (offloaded to the on-chip crypto engine), and clocks it out the egress port. The kernel never sees the packet. This is how you get 50 Gbps of stateful firewalling out of a box whose x86 cores collectively could not move 50 Gbps if their lives depended on it.
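As a mental model, the hit-or-punt decision looks roughly like a dictionary lookup keyed on the flow tuple. The sketch below is a toy in Python, not how the NP7 is implemented: the class names, the fields, and the use of a plain dict in place of a hardware hash table are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical lookup key: the 5-tuple plus the ingress port."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int
    ingress_port: str


@dataclass
class SessionEntry:
    """A tiny subset of what a real session entry would carry."""
    egress_port: str
    snat_ip: Optional[str] = None


class SessionCache:
    """Toy stand-in for the on-chip session cache."""

    def __init__(self):
        self._entries = {}

    def install(self, key, entry):
        # In the real system the control plane pushes entries down.
        self._entries[key] = entry

    def forward(self, key):
        entry = self._entries.get(key)
        if entry is None:
            # No match: the packet goes up to the kernel.
            return ("punt", None)
        # Match: forwarded without the kernel ever seeing it.
        return ("fast-path", entry.egress_port)
```

Note that changing any one field of the key, such as the source port, produces a miss. That is why every new connection pays the slow-path cost once, even between the same two hosts.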

To see what’s actually offloaded:

diagnose sys session list | grep -A2 npu
diagnose sys session stat
diagnose npu np7 session-stats <np-id>
get hardware npu np7 stats <np-id>

In the diagnose sys session list output the lines you care about per session are:

npu_state=0x4000 / 0x100 / 0x0
npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=160/162, ipid=160/162, vlan=0x0000/0x0000, vlifid=0/0, vtag_in=0x0000/0x0000 vtag_out=0x0000/0x0000
no_ofld_reason: ...

offload=8/8 means both directions are offloaded. offload=0/0 with a no_ofld_reason is the diagnosis you want: it tells you why this session is stuck on the slow path. Common reasons:

  • non-npu-intf — one of the interfaces is on a non-NPU port (typical on small-platform mgmt ports, or a software switch).
  • disabled-by-policy — auto-asic-offload disable set on the policy.
  • helper — there’s an ALG helper attached (FTP, SIP, PPTP, RSH, etc.) and the helper hasn’t yet finished its dance.
  • redir-to-av / redir-to-ips — proxy-mode UTM has the session.
  • local-traffic — packet is destined for the FortiGate itself.
  • ha-config — HA session sync state isn’t ready.
  • unsupported-protocol, frag, multicast, not-established — exactly what they say.
  • ips-flowfilter — flow-based IPS with NTurbo can still offload, but if NTurbo is off this is the reason.

That single field is the key to “why is my session not getting hardware accelerated.”
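When you have hundreds of sessions to sift through, it can help to pull those two fields out of saved session output with a script. The following is a minimal sketch that assumes the text looks like the sample lines shown earlier; the function name and the "first"/"second" labels for the two offload values are mine, and real output may vary between FortiOS builds.

```python
import re

# \b keeps this from matching inside "ips_offload=", because "_" is a
# word character and so there is no word boundary before "offload" there.
OFFLOAD_RE = re.compile(r"\boffload=(\d+)/(\d+)")
REASON_RE = re.compile(r"no_ofld_reason:[ \t]*(.*)")


def offload_summary(session_text):
    """Extract offload state and no_ofld_reason from one session's text."""
    result = {"first": None, "second": None, "reason": None}

    match = OFFLOAD_RE.search(session_text)
    if match:
        # Treat any non-zero value as "offloaded" for that direction.
        result["first"] = int(match.group(1)) != 0
        result["second"] = int(match.group(2)) != 0

    reason = REASON_RE.search(session_text)
    if reason and reason.group(1).strip():
        result["reason"] = reason.group(1).strip()
    return result
```

Sessions where both values are False and a reason is present are the ones worth reading closely.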

Stage 3: NTurbo and IPSA — the half-fast path

What if the session is allowed but has flow-based UTM (IPS, application control, web filter)? The NP7 can’t make IPS decisions — that needs the IPS engine on the CPU and the CP9. So is the session forced to the slow path?

No. This is what NTurbo and IPSA exist for.

NTurbo is a VLAN-tag-based shortcut. The NP7 still touches every packet of the session, but instead of bouncing through the kernel networking stack and then back to the NP7 for transmit, the kernel hands the packet to the IPS engine directly via a virtual interface, and once IPS clears it the packet is handed back to the NP7 for hardware transmit. You skip large chunks of the kernel forwarding path.

IPSA (IPS Acceleration) is the IPS engine offloading pattern matching to the CP9 / on-chip pattern match engine. Where supported, signature scanning happens in silicon.

Together they let an 1800F push tens of Gbps of IPS-inspected traffic. Without them the box is a kernel-bound x86 firewall doing pattern match in software, and 27 Gbps of IPS is fantasy.

To inspect:

diagnose test application ipsmonitor 5      # IPS engine status
diagnose test application ipsmonitor 13     # NTurbo info
diagnose ips anomaly list
get system npu                                # check fastpath, ipsec-host-dec-subengine, etc.
config system npu
    get
end

The system npu block has dozens of knobs that affect this — fastpath, enable-action-fastpath, intf-shaping-offload, iph-rsvd-re-cksum, np-queues, etc. Default values are sensible on every model I’ve ever shipped; touch them deliberately.

Stage 4: the punt

If the NP7 can’t process the packet — first packet of a new session, no session match, helper required, proxy UTM, fragmented packet, packet for the FortiGate itself, control protocol the NP7 doesn’t understand (OSPF, BGP, BFD, IS-IS, IKE) — it punts. The packet is DMA’d over PCIe from the NP7 to host memory, an interrupt fires, the kernel’s NPU driver picks it up, and it enters the FortiOS forwarding path as if it had arrived on a software-only NIC.

From this point on, until the kernel either drops the packet or hands it back to the NP7 for transmit, every microsecond costs CPU.

This is also where diag debug flow becomes useful, because everything diag debug flow shows you is happening on the slow path. If a session is fully NP7-offloaded, diag debug flow will show you the first packet (the one that built the session) and then go silent. That’s not a bug; that’s success.

A first-packet trace

Here’s the canonical recipe for watching a first packet take its trip through the kernel. Filter as tightly as you can — on a busy box even a /32 + port filter can be noisy.

diagnose debug reset
diagnose debug flow filter clear
diagnose debug flow filter saddr 10.1.1.10
diagnose debug flow filter daddr 8.8.8.8
diagnose debug flow filter proto 6
diagnose debug flow filter port 443
diagnose debug flow show function-name enable
diagnose debug flow show iprope enable
diagnose debug console timestamp enable
diagnose debug enable
diagnose debug flow trace start 50

To stop:

diagnose debug flow trace stop
diagnose debug disable
diagnose debug reset

The output will start with something like:

id=20085 trace_id=1 func=print_pkt_detail line=5824 msg="vd-root:0 received a packet(proto=6, 10.1.1.10:54321->8.8.8.8:443) tun_id=0.0.0.0 from port3."
id=20085 trace_id=1 func=init_ip_session_common line=6005 msg="allocate a new session-00abcdef, tun_id=0.0.0.0"

That second line — allocate a new session — is the slow path saying “I have not seen this 5-tuple before, I’m building state.” Everything after it is the kernel making the forwarding decision: route lookup, SD-WAN service rule match, policy match, NAT decision, UTM hand-off. That’s Parts 2, 3, and 4 of this series.
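If you capture a trace to a file, a few lines of Python will split each record into fields and tell you which trace IDs actually built a session. This is a sketch under the assumption that lines follow the id/trace_id/func/line/msg layout of the sample above; the function names are hypothetical, and line formats can differ between FortiOS versions.

```python
import re

# search() rather than match() so a leading console timestamp is tolerated.
# \bid= cannot match inside "trace_id=" because "_" is a word character.
TRACE_RE = re.compile(
    r'\bid=(?P<id>\d+) trace_id=(?P<trace_id>\d+) '
    r'func=(?P<func>\S+) line=(?P<line>\d+) msg="(?P<msg>.*)"$'
)


def parse_trace_line(line):
    """Return the fields of one debug flow line, or None if it does not parse."""
    match = TRACE_RE.search(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["trace_id"] = int(record["trace_id"])
    record["line"] = int(record["line"])
    return record


def traces_that_built_sessions(lines):
    """Collect the trace IDs whose output includes a session allocation."""
    built = set()
    for line in lines:
        record = parse_trace_line(line)
        if record and "allocate a new session" in record["msg"]:
            built.add(record["trace_id"])
    return built
```

A trace ID that never appears in the result was handled against existing state, or was dropped before session creation.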

Things to confirm before you ever leave the ingress stage

When you’re triaging a packet that “isn’t arriving” or “isn’t being forwarded,” before you go anywhere near policy or routing, walk these:

# Is the packet on the wire at all?
diagnose sniffer packet port3 'host 10.1.1.10 and port 443' 4 0 a

# Is the driver counting it?
diagnose hardware deviceinfo nic port3

# Is it being dropped at the ASIC for an L2/L3 anomaly?
diagnose npu np7 anomaly-drop-counter <np-id>

# Is there an existing offloaded session that's eating it silently?
diagnose sys session filter src 10.1.1.10
diagnose sys session filter dport 443
diagnose sys session list

# Is conserve mode active and dropping new sessions?
diagnose hardware sysinfo conserve
diagnose sys session stat

diagnose sniffer packet is the most underrated tool on the box. Verbosity levels: 1 = packet headers, 2 = headers plus packet data from the IP layer up, 3 = headers plus packet data from the Ethernet layer up, and 4, 5, 6 = the same as 1, 2, 3 with the interface name added. The trailing 0 is “capture forever,” and a enables absolute timestamps. If sniffer packet shows the frame leaving an interface in one direction but never coming back, your problem is downstream of the FortiGate, not on it.
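Because the argument order is easy to get wrong at 3 a.m., I sometimes generate the command from a script. Here is a minimal sketch that only assembles the string in the same order used above (interface, filter, verbosity, count, timestamp flag); the function and its parameter names are my own invention, and it does not talk to the device.

```python
def sniffer_command(interface, bpf_filter, verbosity=4, count=0, absolute_time=True):
    """Build a 'diagnose sniffer packet' command line as a string.

    Argument order follows the example earlier in this post.
    A count of 0 means capture until interrupted.
    """
    if verbosity not in range(1, 7):
        raise ValueError("verbosity must be between 1 and 6")
    if count < 0:
        raise ValueError("count must be 0 or a positive number")
    if "'" in bpf_filter:
        # The filter is wrapped in single quotes, so it cannot contain one.
        raise ValueError("filter must not contain a single quote")

    parts = [
        "diagnose sniffer packet",
        interface,
        f"'{bpf_filter}'",
        str(verbosity),
        str(count),
    ]
    if absolute_time:
        parts.append("a")
    return " ".join(parts)
```

Calling sniffer_command("port3", "host 10.1.1.10 and port 443") reproduces the command from the checklist above.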

If the sniffer sees the packet but diag debug flow doesn’t fire, you have either a filter mismatch or — much more interestingly — a fully offloaded session, and the NP7 is silently doing the right thing.

Where we are

The packet is on the box. The NP7 has decided what to do with it. Either it’s already gone (offloaded session, hardware forwarded, never bothered the kernel — and Parts 2 through 5 of this series will only see this packet as a session-create event), or it’s been punted up to the kernel for a forwarding decision.

In Part 2 we pick up the packet on the kernel side and follow it through stateful inspection: anti-spoofing, IP integrity, the DoS sensor, session table lookup, helpers, and the moment the kernel either creates a new session or says “no, you can’t be here.”