iptables to nftables: Migrating Production Firewalls Without Downtime

Why this is finally worth doing

iptables is in maintenance mode. The kernel team has been clear that nftables is the strategic direction, distributions ship the iptables binaries as compatibility wrappers around the nftables backend, and every nontrivial new networking feature added to Netfilter for the last several years has landed in nftables first. None of this is news. What is new is that the feature surface and tooling around nftables have matured enough that there is no longer a credible reason to be running native iptables on a new build.

The migration that has been hanging over a lot of production estates is finally cheap to do well. This post walks through how to do it without taking the firewall down, how to read the output of iptables-translate correctly, the atomic ruleset swap that makes a botched migration recoverable in seconds, and the gotchas that bit me on real systems.

This is for engineers who are already comfortable with iptables — chains, tables, jumps, MARK, NAT, conntrack, ipset — and want to translate that comfort into nftables without relearning everything from the manuals.

The mental model shift

The single biggest change is that nftables is one engine with one ruleset. iptables gave you a fixed set of tables (filter, nat, mangle, raw, security) selected with -t, and a separate binary per protocol family (iptables, ip6tables, arptables, ebtables). In nftables there is no fixed set of tables. You create whatever tables you need, attach chains to them, and the chains hook into the same Netfilter hooks (prerouting, input, forward, output, postrouting) that iptables has always used.

That has consequences. In iptables, a packet always traversed raw → mangle → nat → filter in a fixed order. In nftables, that ordering is determined by the priority value you give each chain. Lower numbers run first. The standard iptables priorities have well-known integer values (-300 for raw, -150 for mangle prerouting, -100 for nat prerouting, 0 for filter, and so on), and nftables gives you symbolic names for them (raw, mangle, dstnat, filter, srcnat) that map onto those same integers. If you put a filter chain at priority -200, you are running before any nat takes place; if you put a nat chain at priority 100, it runs after filter. iptables hid this from you. nftables lets you control it.
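To make the ordering concrete, here is a minimal shell sketch that maps the symbolic names to the integers listed above and sorts a list of chains into the order they would run on a single hook. The function names priority_value and evaluation_order are made up for illustration and are not part of nft. The sketch assumes the ip, ip6, or inet families (the bridge family uses different values) and adds security (50), which the paragraph above does not list.

```shell
#!/bin/sh
# Hypothetical helpers, not part of nft: illustrate how chain priority decides order.

# Translate a symbolic priority name into its integer (ip/ip6/inet families).
priority_value() {
    case "$1" in
        raw)      printf '%s\n' -300 ;;
        mangle)   printf '%s\n' -150 ;;
        dstnat)   printf '%s\n' -100 ;;
        filter)   printf '%s\n' 0 ;;
        security) printf '%s\n' 50 ;;
        srcnat)   printf '%s\n' 100 ;;
        *)        printf '%s\n' "$1" ;;  # anything else is assumed to be a plain integer
    esac
}

# Read "chain-name priority" pairs on stdin and print the names, lowest priority first.
evaluation_order() {
    while read -r name prio; do
        printf '%s %s\n' "$(priority_value "$prio")" "$name"
    done | sort -k1,1n | cut -d' ' -f2
}
```

Feeding it the three lines "edge_filter filter", "edge_nat dstnat", and "early -200" lists early first, then edge_nat, then edge_filter. Chains with equal priority come out in an arbitrary order here, which is a fair reflection of how much you should rely on ties in a real ruleset.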

The second mental shift is that an nftables ruleset is a single document. You can dump it, edit it, and reload it atomically. There is no equivalent in classic iptables: iptables-save and iptables-restore come close, but iptables-restore commits each table separately at its COMMIT line, so a restore is atomic per table and not across tables, and IPv4 and IPv6 live in separate files loaded by separate binaries. nftables gets this right.

The third shift is sets and maps as first-class objects. Where iptables farmed this out to ipset, nftables has named sets and verdict maps built in, and you can mutate them at runtime without rewriting the chains that reference them. This is the killer feature for anyone running a production firewall.

iptables-translate is a starting point, not a finish line

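The iptables package on most current distributions includes iptables-translate, which converts a single iptables command into the nftables equivalent (its sibling iptables-restore-translate does the same for a whole iptables-save file). It is genuinely useful for getting a feel for the syntax and for translating obvious rules.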

$ iptables-translate -A INPUT -p tcp --dport 22 -j ACCEPT
nft 'add rule ip filter INPUT tcp dport 22 counter accept'

$ iptables-translate -t nat -A POSTROUTING -o eth0 -j MASQUERADE
nft 'add rule ip nat POSTROUTING oifname "eth0" counter masquerade'

It is also misleading in ways that will burn you if you trust it blindly:

  • It mirrors the iptables table layout (ip filter, ip nat, ip mangle) rather than building your tables idiomatically. A production nftables ruleset usually wants a single inet table covering both IPv4 and IPv6, not separate ip and ip6 tables.
  • It does not deduplicate. If you translate a thousand-line iptables ruleset rule by rule, you will get a thousand-line nftables ruleset that does not exploit any of the things nftables can do better — sets, maps, and concatenations.
  • It does not handle ipset cleanly. iptables-translate will refuse to convert ipset matches; you have to translate those by hand to nftables sets.
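  • It does not carry counter values across. The translated rules do include a counter statement, as the examples above show, but every counter starts from zero when the new ruleset loads. If you rely on long-running rule counters for monitoring, expect a discontinuity at cutover.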

Use it as a Rosetta stone for syntax. Do not use it as a rewriting tool.
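For the ipset part of the hand translation, a small text transformation covers the common case. The sketch below assumes the input is the output of ipset save (lines of the form "add SETNAME ENTRY"), and that the set is an IPv4 hash:ip or hash:net set. The function name ipset_to_nft_set is hypothetical. Sets keyed on ports or other fields would need a different type line and are not handled.

```shell
#!/bin/sh
# Hypothetical helper: turn the "add" lines of `ipset save` for one set
# into an nftables set definition. Assumes an IPv4 hash:ip or hash:net set.
ipset_to_nft_set() {
    awk -v set="$1" '
        BEGIN { n = 0 }
        # Keep only entries that belong to the requested set.
        $1 == "add" && $2 == set { elems[n++] = $3 }
        END {
            printf "set %s {\n    type ipv4_addr\n    flags interval\n", set
            if (n > 0) {
                printf "    elements = {\n"
                for (i = 0; i < n; i++)
                    printf "        %s%s\n", elems[i], (i < n - 1 ? "," : "")
                printf "    }\n"
            }
            printf "}\n"
        }'
}
```

A typical use would be ipset save | ipset_to_nft_set badguys, with the output pasted into the table that needs the set. Review the result before loading it; per-element timeouts and comments from the ipset are dropped by this sketch.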

Walking through a real migration

Take a representative iptables ruleset for an edge box: NAT for an internal network, an INPUT chain that allows SSH from a management subnet and drops everything else, a FORWARD chain that filters east-west traffic, and an ipset of known-bad IPs that get dropped early. The native iptables version looks something like:

*nat
:PREROUTING ACCEPT
:POSTROUTING ACCEPT
-A POSTROUTING -o eth0 -s 10.0.0.0/24 -j MASQUERADE
COMMIT

*filter
:INPUT DROP
:FORWARD DROP
:OUTPUT ACCEPT
-A INPUT -m set --match-set badguys src -j DROP
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -s 192.168.100.0/24 -p tcp --dport 22 -j ACCEPT
-A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i eth1 -o eth0 -j ACCEPT
COMMIT

The idiomatic nftables equivalent uses a single inet table, a named set in place of the ipset, and anonymous verdict maps where they improve clarity:

table inet edge {
    set badguys {
        type ipv4_addr
        flags interval
        elements = {
            192.0.2.0/24,
            198.51.100.7
        }
    }

    chain input {
        type filter hook input priority filter; policy drop;

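        # Blocklist first, as in the iptables original, so that established
        # flows from known-bad sources are cut off as well
        ip saddr @badguys drop
        ct state vmap { invalid : drop, established : accept, related : accept }
        iifname "lo" accept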
        meta l4proto icmp accept
        meta l4proto icmpv6 accept
        ip saddr 192.168.100.0/24 tcp dport 22 accept
    }

    chain forward {
        type filter hook forward priority filter; policy drop;

        ct state vmap { invalid : drop, established : accept, related : accept }
        iifname "eth1" oifname "eth0" accept
    }

    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;

        ip saddr 10.0.0.0/24 oifname "eth0" masquerade
    }
}

Things worth noticing about the rewrite:

  • One table, inet, covers both IPv4 and IPv6. The original iptables ruleset said nothing about IPv6, which was left to whatever ip6tables happened to hold. Here the same policy drop applies to both families, and allowing ICMPv6 (which IPv6 needs for neighbor discovery) is one line instead of a parallel ip6tables ruleset.
  • The connection-tracking state check is a verdict map: one lookup that accepts established and related traffic and also drops invalid packets explicitly, which the iptables original did not do.
  • The badguys set is a named nftables set with interval flag, which means CIDR ranges work directly. No ipset.
  • The chain priorities are symbolic (filter, srcnat) rather than magic numbers.
  • The set is mutable at runtime: nft add element inet edge badguys { 203.0.113.5 } adds an entry without reloading anything.
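Single additions are one command, but a blocklist refreshed from a feed is usually replaced wholesale. A minimal sketch of that, assuming the feed is a text file with one address or CIDR per line: the hypothetical helper set_replace_batch prints a two-line nft batch (flush set, then add element), and piping that batch into nft -f - applies both lines as one transaction.

```shell
#!/bin/sh
# Hypothetical helper: print an nft batch that replaces the contents of a named set.
# Reads one address or CIDR per line on stdin; blank lines and # comments are skipped.
set_replace_batch() {
    family=$1
    table=$2
    name=$3
    # Join the remaining entries into a comma-separated list.
    elems=$(awk 'NF && $1 !~ /^#/ { printf "%s%s", sep, $1; sep = ", " }')
    printf 'flush set %s %s %s\n' "$family" "$table" "$name"
    # An empty element list is a syntax error, so only emit the add when needed.
    if [ -n "$elems" ]; then
        printf 'add element %s %s %s { %s }\n' "$family" "$table" "$name" "$elems"
    fi
}
```

Usage would look like set_replace_batch inet edge badguys < badguys.txt | sudo nft -f -. Because the flush and the add arrive in the same batch, there is no moment at which the set is empty while traffic is flowing.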

Atomic ruleset swap

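The reason this migration is safe is that nftables loads a whole ruleset transactionally. Either every rule loads and the new ruleset is in effect, or none of it loads and the old ruleset stays. There is no window during which the box is partially configured. One caveat: nft -f adds to whatever is already loaded, so for the load to replace the live ruleset the file has to begin with flush ruleset (or delete and recreate its own tables). The flush is part of the same transaction as the rules that follow it. Be aware that flush ruleset also removes any tables that iptables-nft created.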

The pattern:

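# Save the current state for rollback
# (tee, because with "sudo nft ... > file" the redirect runs as your own user)
sudo nft list ruleset | sudo tee /etc/nftables/pre-migration.nft > /dev/null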

# Stage the new ruleset in a file
sudo cp new-ruleset.nft /etc/nftables/edge.nft

# Validate without loading
sudo nft -c -f /etc/nftables/edge.nft

# Load atomically
sudo nft -f /etc/nftables/edge.nft

nft -c does a parse and validation pass without touching the kernel state. If your ruleset is syntactically wrong or refers to something that does not exist, -c catches it. Get into the habit of running -c in CI before any change is allowed to land on a real box.
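A CI step for this can be very small. The sketch below is one way to do it, not a standard tool: check_rulesets is a hypothetical function that dry-runs every file it is given and fails if any of them does not validate. The NFT variable is an assumption added so the binary can be swapped out; it defaults to nft. Depending on the nft version, the CI runner may need enough privilege to open a netlink socket even for a dry run.

```shell
#!/bin/sh
# Hypothetical CI helper: dry-run every ruleset file given as an argument.
# NFT can be overridden (for example in tests); it defaults to the real nft binary.
check_rulesets() {
    failed=0
    for file in "$@"; do
        if ${NFT:-nft} -c -f "$file"; then
            echo "ok: $file"
        else
            echo "FAILED: $file" >&2
            failed=$((failed + 1))
        fi
    done
    # A non-zero status fails the CI job if any file did not validate.
    [ "$failed" -eq 0 ]
}
```

In a pipeline this would be called as check_rulesets /etc/nftables/*.nft, or against the checked-out copies of those files. It validates syntax and references only; it cannot tell you that a rule drops traffic you meant to allow.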

If the new ruleset is bad in a way that only shows up under traffic — a typo in an interface name that drops what you meant to allow — you can roll back instantly:

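# Flush and restore in a single transaction
sudo nft -f - <<'EOF'
flush ruleset
include "/etc/nftables/pre-migration.nft"
EOF

The flush and the saved rules are committed together, so you are back where you were within a second and there is never a moment with no rules loaded. Avoid running nft flush ruleset as a separate command before the restore: between the two commands the box has an empty ruleset, which accepts everything. This is the safety net that makes the migration cheap.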
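When the change is made over SSH, the rollback is worth automating, because a bad ruleset may be exactly what stops you from typing the rollback command. The following is a minimal bash sketch of a confirm-or-roll-back load, under stated assumptions: rollback_batch and apply_with_confirm are hypothetical names, NFT is an override variable that defaults to nft, and the script runs as root.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching a "confirm or roll back" load. NFT defaults to nft.

# Print a batch that flushes and restores from a saved file in one transaction.
rollback_batch() {
    printf 'flush ruleset\ninclude "%s"\n' "$1"
}

# Usage: apply_with_confirm NEW_RULESET SAVED_RULESET [SECONDS]
apply_with_confirm() {
    local new=$1 saved=$2 wait=${3:-60} answer
    ${NFT:-nft} -c -f "$new" || return 1   # refuse to load what does not validate
    ${NFT:-nft} -f "$new" || return 1      # atomic load; a failure changes nothing
    printf 'Loaded %s. Type "keep" within %s seconds to confirm: ' "$new" "$wait" >&2
    # read -t is a bash feature; a timeout, EOF, or any other answer rolls back.
    if read -r -t "$wait" answer && [ "$answer" = keep ]; then
        echo "kept"
        return 0
    fi
    rollback_batch "$saved" | ${NFT:-nft} -f -
    echo "rolled back"
    return 2
}
```

Run it inside tmux or screen, or under nohup, so that a dropped SSH session does not kill the shell before the timer expires. If the new ruleset locks you out, the prompt goes unanswered and the saved ruleset comes back on its own.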

Living with iptables-nft for a while

Modern distributions ship iptables as iptables-nft — a wrapper that speaks iptables CLI but writes into nftables tables under the hood. You can confirm with:

sudo update-alternatives --display iptables   # Debian and Ubuntu; other distributions manage alternatives differently
iptables --version
# iptables v1.8.x (nf_tables)

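This is actually useful during migration. It means you can run native nftables rules alongside iptables-format rules and have them coexist, because both are ultimately writing into the same backend. This applies to iptables-nft only; rules loaded through iptables-legacy live in the old xtables backend and do not show up in nft list ruleset. The catch is that the two write into separate tables (filter, nat, mangle for iptables; whatever you named for native nftables), and a packet has to pass every base chain attached to a hook. An accept in one table does not stop a later chain in another table from dropping the packet, while a drop anywhere is final. Chain priorities decide which table sees the packet first, so you have to think about them to make sure the two sets of rules do not collide.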

The pragmatic order on a complex production box is:

  1. Convert one iptables-format table at a time to native nftables, validating with -c and watching counters.
  2. Keep iptables-nft installed throughout the migration so that emergency tooling and orchestration that still calls iptables keeps working.
  3. Once everything is native nftables, remove the iptables-format rules entirely and stop calling iptables from automation.
  4. Eventually remove the iptables compatibility binaries.

Skipping straight to step 4 is where people get hurt. Plenty of monitoring agents, Docker, fail2ban, and orchestration tools still call iptables directly, and they all keep working because of the compatibility shim. Take that away too early and you start fixing those tools instead of doing the migration you wanted to do.
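Automation that still calls iptables can guard itself by checking which backend it is talking to before it touches anything. A minimal sketch, assuming the version string format shown earlier; iptables_backend is a hypothetical helper that only classifies the text it is given.

```shell
#!/bin/sh
# Hypothetical helper: classify an "iptables --version" line by backend.
iptables_backend() {
    case "$1" in
        *"(nf_tables)"*) echo nf_tables ;;
        *"(legacy)"*)    echo legacy ;;
        *)               echo unknown ;;  # e.g. old releases that print no backend
    esac
}
```

A script would call it as iptables_backend "$(iptables --version)" and refuse to continue on anything other than nf_tables, since rules written through the legacy backend would not appear in the nftables ruleset you are migrating.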

Sets, maps, and concatenations

The reason you actually want to do this migration, not just out of compliance with kernel direction, is that nftables sets and maps make rulesets that used to be unmanageable into rulesets that are obvious.

A common iptables pattern: allow SSH from a list of source networks. The iptables version is N rules, one per source. The nftables version is one rule and a set:

set ssh_allowed {
    type ipv4_addr
    flags interval
    elements = { 192.168.100.0/24, 10.0.0.0/8, 198.51.100.0/27 }
}

chain input {
    ...
    ip saddr @ssh_allowed tcp dport 22 accept
}

A more interesting pattern: per-source rate limiting. In iptables this required hashlimit and a great deal of squinting. In nftables:

set ssh_meter {
    type ipv4_addr
    size 65535
    flags dynamic
    timeout 1m
}

chain input {
    ...
    tcp dport 22 ct state new add @ssh_meter { ip saddr limit rate 5/minute } accept
    tcp dport 22 ct state new drop
}

Verdict maps replace long if/else chains. To route different destination ports to different chains:

chain input {
    ...
    tcp dport vmap {
        22 : jump ssh_in,
        80 : jump web_in,
        443 : jump web_in,
        25 : jump mail_in
    }
}

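The map is a single lookup however many ports it holds, where the equivalent ladder of iptables rules is evaluated one rule at a time, and it is easier to read. The jump targets (ssh_in, web_in, mail_in) are regular chains defined in the same table.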

Concatenations let you build keys from multiple fields:

set blocked_pairs {
    type ipv4_addr . inet_service
    elements = {
        192.0.2.5 . 22,
        198.51.100.10 . 3389
    }
}

chain input {
    ip saddr . tcp dport @blocked_pairs drop
}

In iptables this block-by-pair construct needed either a ladder of rules or a separate ipset of type hash:ip,port. Here it is part of the ruleset itself.

Logging, counters, and observability

Counters are no longer implicit. In iptables, every rule carried packet and byte counters whether you used them or not. In nftables you opt in per rule:

ip saddr 10.0.0.5 counter drop

In production, put counter on the rules you actually monitor. Putting it on every rule is fine on small firewalls and starts to cost on very high packet rates.

Logging is similar. The log statement supports a prefix and a level:

ip saddr @badguys log prefix "fw drop badguys: " level warn drop

The output goes to the kernel ring buffer, the same place iptables logged to. Same caveats apply: log selectively, never log every drop on a public-facing interface.

For monitoring, the cleanest thing is to give each chain or set a stable name and have your collector pull nft -j list ruleset periodically. JSON output is structured and easy to parse, which is a meaningful improvement over scraping iptables-save text.

Gotchas worth remembering

A few things that have caught me out.

  • interval flag on a set is not the default. Without it, a set holds only exact addresses, not CIDR prefixes or address ranges. If a set containing prefixes fails to load or does not match what you expect, check for the flag first.
  • Chain priorities matter when you mix tables. Two chains hooked into input at the same priority do not have a guaranteed order — give them explicit, distinct priorities.
  • nft flush table drops the table contents but leaves the table object. nft delete table removes the table itself. They are not the same.
  • The compatibility wrapper translates iptables rules into nftables tables named filter, nat, etc. If you are inspecting a production box that has both legacy and native rules, you will see both sets in nft list ruleset output, which is correct but startling.
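  • Restarting a network service can clear connection-tracking state. Long-lived flows that were established before a state-table flush no longer match as established; depending on the nf_conntrack_tcp_loose sysctl, their next packets are either picked up as new connections or marked invalid. Schedule conntrack-affecting changes for low-traffic windows.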
  • Docker installs its own iptables rules and expects them to be there. On a Docker host, do not blow away the iptables-format rules without understanding what Docker is doing — either let it manage its own rules and write your native nftables rules around them, or move to a fully native setup with explicit Docker integration.

What this buys you

A migrated nftables firewall is faster, atomically reloadable, has unified IPv4/IPv6 handling, has built-in dynamic sets so blocklist updates do not require a reload, and produces structured output for monitoring. The cost is a few hours of careful work per box and a willingness to learn one ruleset language properly.

The next post wraps the series with SSH hardening — short-lived certificates, bastion patterns, and session auditing — which is the natural follow-up to “we have made the box’s firewall sensible”.