SSH Hardening Beyond the Basics: Certificate Authorities, Bastion Patterns, and Session Auditing

What “SSH hardening” usually means, and why it is not enough

Almost every SSH-hardening guide on the internet stops at the same checklist: disable password authentication, disable root login, change the port, enforce key authentication, install fail2ban, done. That is fine for a personal server. It does not survive contact with a real estate of dozens or hundreds of hosts, dozens of engineers, and an audit team that wants to know who did what on which box at which time.

The problem at scale is the keys themselves. Public keys live in ~/.ssh/authorized_keys files distributed across every host, often via configuration management. Adding a new engineer means a config push to every host. Removing one means another push, and a real audit of every key file that might still have their old key in it. Compromised laptops mean an emergency rotation of one user’s key across the entire estate. None of this scales, and none of it produces the audit trail anyone actually wants.

The fix is to stop using static keys altogether and run your own SSH certificate authority. This post walks through the model, the day-to-day operational mechanics, the bastion topology that goes with it, the ForceCommand patterns for restricted access, and the auditing setup that turns SSH sessions into a searchable record. It is aimed at engineers who are already comfortable with public-key SSH and want to operate it the way someone running a real production estate operates it.

How SSH certificates actually work

OpenSSH has supported certificates since version 5.4. The model resembles TLS: a CA whose public key is trusted signs short-lived certificates for users (and hosts), and whoever presents a certificate is trusted because the CA signed it, not because their individual key is listed somewhere. Unlike X.509 there are no intermediate CAs or chains; every certificate is signed directly by the CA key that hosts and clients trust.

There are two CAs in a complete deployment: a user CA that signs user certificates, and a host CA that signs host certificates. They can be the same key in small setups; they should be different keys in any serious setup, because their failure modes are different.

A user certificate binds a public key to:

  • A list of valid principals (usually usernames the cert is allowed to log in as)
  • A validity window (-V +1h is typical; never issue certificates that last more than a working day)
  • Critical options (force-command, source-address)
  • Extensions (whether agent forwarding, port forwarding, X11, or PTY allocation are permitted)

A host certificate binds a host’s public key to a list of hostnames. Once the host CA’s public key is trusted in ~/.ssh/known_hosts through an @cert-authority line, every host presenting a valid certificate for the name you connect to is trusted automatically, and the trust-on-first-use prompt (“The authenticity of host X can’t be established”) is gone for good.

Generating the CAs

Two key pairs. Keep both offline. The user CA in particular holds the keys to your kingdom, and it should never live on a network-attached host that is not used exclusively for issuing certificates.

# A private directory for the CA keys; set a strong passphrase when prompted
mkdir -p -m 700 ~/ca

# User CA
ssh-keygen -t ed25519 -f ~/ca/user_ca -C "user-ca $(date +%Y-%m-%d)"

# Host CA
ssh-keygen -t ed25519 -f ~/ca/host_ca -C "host-ca $(date +%Y-%m-%d)"

ed25519 is the right algorithm for new builds. RSA-4096 is fine if you have a reason. ECDSA with NIST curves is acceptable but not preferred.

Treat both private keys like root keys. In a serious deployment they live on a hardware security module or a YubiKey configured as a smart card; for a small estate, an air-gapped laptop in a safe is fine. The threat model is “someone with this key can log into every host as anyone they want, until I rotate the CA”. Plan accordingly.

Trusting the CAs on hosts

On every host, configure sshd to trust the user CA’s public key for authenticating users:

sudo cp user_ca.pub /etc/ssh/user_ca.pub
sudo chmod 644 /etc/ssh/user_ca.pub

Add to /etc/ssh/sshd_config:

TrustedUserCAKeys /etc/ssh/user_ca.pub

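Reload sshd, checking the configuration first with sudo sshd -t. From then on it accepts any user certificate signed by that CA, provided the certificate is inside its validity window and its principal list includes the username being logged into.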

For host certificates, sign the host’s existing host key:

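# On the CA box, with the host's /etc/ssh/ssh_host_ed25519_key.pub copied over
# as edge1_host_ed25519_key.pub. The identity names the host being signed, not the CA box.
ssh-keygen -s host_ca -I "edge1.example.com $(date +%s)" -h \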
           -n "edge1.example.com,edge1,10.0.0.5" \
           -V +52w \
           edge1_host_ed25519_key.pub

Copy the resulting edge1_host_ed25519_key-cert.pub back to the host as /etc/ssh/ssh_host_ed25519_key-cert.pub and add to /etc/ssh/sshd_config:

HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

Restart sshd. The host now presents the certificate during the handshake. On client machines, add a single line to ~/.ssh/known_hosts, or, more usefully, push it to the system-wide /etc/ssh/ssh_known_hosts via configuration management:

@cert-authority *.example.com ssh-ed25519 AAAA... host-ca

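That single line replaces every static known_hosts entry for every host in *.example.com, and new hosts issued certificates by the same CA are trusted automatically. The pattern is matched against the host name the client connects to, so if you also reach hosts by short name or IP address (edge1, 10.0.0.5), add matching patterns such as edge1 or 10.0.0.* to the same line, comma-separated.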
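Pushing that line is a natural job for configuration management. A minimal sketch in bash, assuming the host CA’s public key has already been copied to the machine as host_ca.pub; the function name add_host_ca is hypothetical, and the 10.0.0.* pattern simply mirrors the address in the host certificate above. It appends the line only when it is not already present, so it is safe to run on every configuration pass.

```bash
#!/usr/bin/env bash
# Hypothetical config-management helper: make sure a known_hosts file trusts
# the host CA, adding the @cert-authority line only if it is missing.
set -u

# add_host_ca KNOWN_HOSTS PATTERNS CA_PUB
#   KNOWN_HOSTS  /etc/ssh/ssh_known_hosts (all users) or ~/.ssh/known_hosts
#   PATTERNS     comma-separated host patterns, e.g. '*.example.com,10.0.0.*'
#   CA_PUB       the host CA public key file, e.g. host_ca.pub
add_host_ca() {
  local known_hosts=$1 patterns=$2 ca_pub=$3
  local keytype="" keyblob="" line

  # A .pub file is "<type> <base64> [comment]". Keep only type and key so the
  # line's text depends on nothing but the patterns and the key itself.
  read -r keytype keyblob _ < "$ca_pub"
  if [[ -z $keytype || -z $keyblob ]]; then
    echo "add_host_ca: cannot read a public key from $ca_pub" >&2
    return 1
  fi

  line="@cert-authority ${patterns} ${keytype} ${keyblob}"
  touch "$known_hosts" || return 1
  # -x whole-line match, -F fixed string: '*' and '.' are not regex syntax here.
  grep -qxF -- "$line" "$known_hosts" || printf '%s\n' "$line" >> "$known_hosts"
}
```

Typical use is add_host_ca /etc/ssh/ssh_known_hosts '*.example.com,10.0.0.*' host_ca.pub, which covers every user on the machine at once.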

Issuing short-lived user certificates

The user-side workflow looks like this. The user generates a key pair on their laptop. They authenticate to a small certificate-issuing service (typically over SSO, sometimes via a CLI tool that talks to your IdP, sometimes via Vault’s SSH secrets engine). The service signs their public key with the user CA and returns the resulting certificate. The certificate goes into their SSH agent. SSH connections to any host trusted by the CA succeed without anything else being configured.

Manual issuance, the way you do it for the first time and the way you keep working when the issuance service is broken:

ssh-keygen -s user_ca \
           -I "alice@example.com $(date +%s)" \
           -n "alice,deploy" \
           -V +1h \
           -O clear \
           -O permit-pty \
           alice_id_ed25519.pub

Read carefully:

  • -I is the certificate identity. This is what shows up in the host’s auth logs. Use a stable, searchable string — username plus issue timestamp.
  • -n is the principal list. Comma-separated usernames the cert is allowed to log in as. alice for shell access, deploy for the deploy account on a CI box.
  • -V +1h is the validity. Never issue long-lived user certificates. The whole point is that revocation becomes “wait an hour”.
  • -O clear strips the default extensions (permit-X11-forwarding, permit-agent-forwarding, permit-port-forwarding, permit-pty and permit-user-rc). -O permit-pty adds back the one extension you almost always want. Add -O permit-port-forwarding selectively, never to everyone.
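Typing those flags by hand is where mistakes creep in, so even manual issuance benefits from a small wrapper. The sketch below assumes the CA key is on local disk and encodes the rules from the list above: identity is username plus timestamp, validity is capped, and -O clear is always applied. The function name, the 8-hour cap, the refusal of a root principal and the @example.com identity suffix are illustrative policy choices, not anything OpenSSH requires; DRY_RUN=1 prints the command instead of signing.

```bash
#!/usr/bin/env bash
# Hypothetical manual-issuance helper around `ssh-keygen -s`. The 8-hour
# ceiling, the root refusal and the @example.com identity are illustrative
# policy choices, not OpenSSH requirements.
set -u

readonly MAX_HOURS=8

# issue_user_cert CA_KEY USER PRINCIPALS HOURS PUBKEY
issue_user_cert() {
  local ca_key=$1 user=$2 principals=$3 hours=$4 pubkey=$5

  # Whole hours only, 1..MAX_HOURS ("never longer than a working day").
  if [[ ! $hours =~ ^[1-9][0-9]*$ ]] || (( hours > MAX_HOURS )); then
    echo "issue_user_cert: validity must be 1-${MAX_HOURS} hours" >&2
    return 1
  fi
  # Never hand out a certificate that can log in as root directly.
  if [[ ",${principals}," == *,root,* ]]; then
    echo "issue_user_cert: refusing a root principal" >&2
    return 1
  fi

  # Identity = who + when, so it is searchable in the hosts' auth logs.
  # Validity starts 5 minutes in the past to tolerate small clock skew.
  # -O clear drops the default extensions; permit-pty is the only one re-added.
  local -a cmd
  cmd=(ssh-keygen -s "$ca_key"
       -I "${user}@example.com $(date +%s)"
       -n "$principals"
       -V "-5m:+${hours}h"
       -O clear -O permit-pty
       "$pubkey")

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"      # print the command instead of signing
  else
    "${cmd[@]}"           # writes <key>-cert.pub next to the public key
  fi
}
```

Typical use is issue_user_cert ~/ca/user_ca alice alice,deploy 1 alice_id_ed25519.pub, which leaves alice_id_ed25519-cert.pub next to the public key.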

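The certificate (alice_id_ed25519-cert.pub) goes back to the user’s machine and sits next to the private key, where ssh-add picks it up automatically:

ssh-add alice_id_ed25519
ssh-add -l | grep -- -CERT   # confirms the certificate is loaded
ssh edge1.example.com        # works without any additional config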

When the certificate expires in an hour, they re-issue. If the issuance service requires SSO and SSO requires a hardware token, you have multi-factor authentication for SSH without ever having installed a PAM module on a host.

Bastion topology with ProxyJump

The CA layer makes bastions cleaner. The pattern is: only the bastion host is reachable from the corporate network or VPN. All other hosts only accept SSH from the bastion’s network. Engineers ssh to the bastion, and from the bastion they ssh onward to wherever they actually want to be.

In the modern OpenSSH client this is ProxyJump. The user-side config:

# ~/.ssh/config
Host bastion
    HostName bastion.example.com
    User alice

Host *.example.com !bastion.example.com
    User alice
    ProxyJump bastion

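Now ssh edge1.example.com transparently jumps through the bastion. The client opens an SSH connection to the bastion and asks it to forward a TCP stream to edge1; the real SSH session then runs inside that stream and terminates end-to-end on edge1. The bastion relays encrypted bytes and never sees the inner session’s plaintext.

The reason this is dramatically better with certificates is that the bastion does not need to hold any keys for downstream hosts. With ProxyJump the client authenticates to edge1 directly through the forwarded stream, presenting the same certificate from the local agent, so no agent forwarding is involved. One catch: that stream is a direct-tcpip channel, which sshd treats as port forwarding, so a certificate issued with only -O clear -O permit-pty cannot jump through the bastion; certificates for anyone who jumps also need -O permit-port-forwarding. The bastion configuration below still allows agent forwarding for people who hop onward from a shell on the bastion itself. If nobody does, turn it off: anyone with root on the bastion can use a forwarded agent while the session is open.

ssh -J bastion edge1.example.com is the equivalent inline syntax.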

The bastion’s own configuration tightens access:

# /etc/ssh/sshd_config on the bastion
TrustedUserCAKeys /etc/ssh/user_ca.pub
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
AllowAgentForwarding yes
AllowTcpForwarding local
PermitTunnel no
X11Forwarding no
ClientAliveInterval 300
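ClientAliveCountMax 3

ClientAliveInterval and ClientAliveCountMax together reap dead connections: sshd sends a keepalive every five minutes and disconnects a client that misses three in a row, so a laptop that dropped off the network does not leave a half-open session behind. Do not reach for ClientAliveCountMax 0 as an idle timeout. On current OpenSSH a zero disables disconnection altogether, and in any case a responsive client answers keepalives whether or not anyone is typing. For a real idle timeout, use ChannelTimeout (OpenSSH 9.2 and later) or the shell’s TMOUT.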

ForceCommand for restricted access

ForceCommand overrides whatever the user tries to run and replaces it with a fixed command. It is the right primitive for restricted accounts that should only ever do one thing — for example, a backup user that should only ever invoke rsync, or a deploy user that should only ever invoke git-receive-pack.

Match User backup
    ForceCommand /usr/local/sbin/restricted-rsync
    PermitTTY no
    AllowTcpForwarding no
    X11Forwarding no
    PermitUserRC no

/usr/local/sbin/restricted-rsync then validates the original command (which is in the SSH_ORIGINAL_COMMAND environment variable) and either runs it or rejects it. Existing tools such as rrsync, git-shell, and borg serve follow this pattern.
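What such a wrapper does with SSH_ORIGINAL_COMMAND fits in a few lines of bash. This version assumes the backup account only ever transfers files inside /srv/backups (a hypothetical path) and accepts nothing but rsync’s server-side invocation pointed there. It is a teaching sketch of the pattern; rrsync, which understands rsync’s option set properly, is the better production choice.

```bash
#!/usr/bin/env bash
# Sketch of /usr/local/sbin/restricted-rsync (the path in the Match block above).
# Assumption: the backup account only touches files under /srv/backups.
set -u

readonly BACKUP_ROOT=/srv/backups

# Succeed only for a server-side rsync whose path stays inside BACKUP_ROOT.
is_allowed() {
  local cmd=$1
  local re='^rsync --server( -[-[:alnum:].=,]+)+ \. ('"$BACKUP_ROOT"'/[[:alnum:]._/-]+)$'
  [[ $cmd =~ $re ]] || return 1
  # Reject any ".." path component that would climb out of the root.
  [[ ${BASH_REMATCH[2]}/ != */../* ]]
}

main() {
  local cmd=${SSH_ORIGINAL_COMMAND:-}
  if ! is_allowed "$cmd"; then
    echo "restricted-rsync: command not permitted" >&2
    exit 1
  fi
  # The regex admits no quotes or shell metacharacters, so plain word
  # splitting rebuilds rsync's argv without handing anything to a shell.
  local -a argv
  read -r -a argv <<< "$cmd"
  exec "${argv[@]}"
}

# sshd always sets SSH_CONNECTION for a session. Without it (for example when
# the file is sourced for testing) only the functions are defined.
if [[ -n ${SSH_CONNECTION:-} ]]; then
  main
fi
```

Because sourcing the file only defines the functions, is_allowed can be checked against captured SSH_ORIGINAL_COMMAND strings before the wrapper is deployed.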

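The combination of a user certificate carrying the force-command critical option and a host-side ForceCommand is defence in depth: the certificate restricts the credential wherever it is presented, and ForceCommand restricts the account whatever credential is used. If both are set, sshd runs the ForceCommand from sshd_config, so point both at the same wrapper.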
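The certificate side of that pairing is issued with the force-command and source-address critical options. A sketch, assuming the backup client’s key is backup.pub and that it only ever connects from 10.0.0.0/24 (both placeholders); the function name issue_backup_cert is hypothetical. It points force-command at the same wrapper as the host-side ForceCommand so the two cannot disagree.

```bash
#!/usr/bin/env bash
# Sketch: a certificate for the backup account that can only run the
# restricted wrapper, and only from one network. Paths and the CIDR passed
# in are placeholders.
set -u

# issue_backup_cert CA_KEY PUBKEY SOURCE_CIDR
issue_backup_cert() {
  local ca_key=$1 pubkey=$2 source_cidr=$3
  # -O clear first: no PTY and no forwarding of any kind.
  # force-command matches the host-side ForceCommand so the two agree.
  # source-address: sshd rejects the certificate from any other client address.
  ssh-keygen -q -s "$ca_key" \
    -I "backup $(date +%s)" \
    -n backup \
    -V -5m:+1h \
    -O clear \
    -O force-command=/usr/local/sbin/restricted-rsync \
    -O source-address="$source_cidr" \
    "$pubkey"
}
```

Running ssh-keygen -L -f backup-cert.pub afterwards should list both critical options and no extensions, since -O clear removed them and nothing was added back.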

Authorised keys from a command

AuthorizedKeysCommand lets sshd ask an external program for the authorised keys at login time. With certificates this is mostly unnecessary — the CA model removes the need for distributed key files — but it is still useful when you want to integrate with an existing identity provider that has not yet given you certificate issuance.

AuthorizedKeysCommand /usr/local/sbin/fetch-authorized-keys %u
AuthorizedKeysCommandUser nobody

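The script runs as nobody and prints authorized_keys lines on stdout. sshd also requires the program to be given as an absolute path, owned by root, and not writable by group or others. Common patterns: query your IAM/IdP, query a small in-house service that knows about role-based key bundles, or fetch the CA public key from Vault’s SSH secrets engine and print it as a cert-authority line, so that certificates Vault signs are accepted.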
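A sketch of the role-based key bundle pattern, written as the /usr/local/sbin/fetch-authorized-keys script named in the configuration above. The data layout is hypothetical: a role map with lines of the form user role [role...] and one <role>.keys file per role under /etc/ssh/key-bundles, kept in sync by configuration management. A real deployment would query the IdP or key service instead; the part worth keeping is that %u is validated before it drives any lookup.

```bash
#!/usr/bin/env bash
# Sketch of /usr/local/sbin/fetch-authorized-keys, called by sshd with %u.
# Hypothetical source of truth: a role map plus one key bundle per role.
set -u

KEYS_DIR=${KEYS_DIR:-/etc/ssh/key-bundles}   # hypothetical: <role>.keys files
ROLE_MAP=${ROLE_MAP:-$KEYS_DIR/roles}        # hypothetical: "<user> <role> [<role>...]"

fetch_authorized_keys() {
  local user=$1 role
  local -a fields
  # %u originates from the connecting client: treat it as untrusted input.
  [[ $user =~ ^[a-z_][a-z0-9_-]{0,31}$ ]] || return 1
  [[ -r $ROLE_MAP ]] || return 1
  while read -r -a fields; do
    (( ${#fields[@]} > 1 )) && [[ ${fields[0]} == "$user" ]] || continue
    for role in "${fields[@]:1}"; do
      [[ $role =~ ^[a-z0-9_-]+$ && -r $KEYS_DIR/$role.keys ]] || continue
      cat -- "$KEYS_DIR/$role.keys"   # authorized_keys lines, to stdout for sshd
    done
  done < "$ROLE_MAP"
  return 0
}

# sshd passes the username as the only argument.
if [[ $# -eq 1 ]]; then
  fetch_authorized_keys "$1"
fi
```

Install it owned by root with mode 755; sshd refuses an AuthorizedKeysCommand that is writable by group or others.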

Whichever approach you use, never have static authorized_keys files in user home directories on production hosts. They are unmanaged state. Either move to certificates entirely, or fetch keys at login time so the source of truth is somewhere you can audit. Setting AuthorizedKeysFile none in sshd_config makes sshd ignore any such files that reappear.

Session recording

The thing every audit team eventually asks for is a recording of what was actually typed in a session. OpenSSH itself does not record sessions; you bolt this on.

The standard tool is tlog, a wrapper that records the user’s terminal session to a structured log that can be replayed.

sudo dnf install tlog   # or apt on Debian-family

Configure it as the user’s shell, or better, as a ForceCommand for accounts you want to record:

Match User contractor
    ForceCommand /usr/bin/tlog-rec-session

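tlog-rec-session starts the shell configured in /etc/tlog/tlog-rec-session.conf underneath and captures every byte of terminal output. Keystroke (input) logging is off by default, because it would also capture anything typed at a password prompt; if your auditors need it, set input to true in the log section of that file. By default the recordings go to the systemd journal as structured JSON, where they can be shipped to your central logging by the same pipe that ships everything else. Find a recording’s ID in the journal and replay it with tlog-play:

journalctl -o verbose | grep TLOG_REC=
tlog-play -r journal -M TLOG_REC=<recording-id>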

auditd is the other half of the picture. Where tlog records what the user saw, auditd records what the system did, at whatever granularity your rules ask for: files written, attributes changed, commands executed. Configure it to log SSH login/logout events with full context, plus any sensitive file access:

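# /etc/audit/rules.d/ssh.rules  (load with: sudo augenrules --load)
# On RHEL-family hosts the auth log is /var/log/secure rather than /var/log/auth.log
-w /var/log/wtmp -p wa -k logins
-w /var/log/auth.log -p wa -k auth
-w /etc/ssh/sshd_config -p wa -k sshd_config
-w /etc/passwd -p wa -k user_modification
-w /etc/shadow -p wa -k user_modification
# Log 32-bit execve calls too, or 32-bit binaries slip past the 64-bit rule
-a always,exit -F arch=b64 -S execve -k commands
-a always,exit -F arch=b32 -S execve -k commands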

The execve rule is heavy. It logs every command executed on the box. It is also the rule auditors actually want, and combined with tlog and SSH cert identities you get a full chain of “who connected, when, from where, what they did, what changed”.
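The certificate identity is what ties those layers together: on a certificate login, sshd writes the certificate’s key ID into its “Accepted publickey” log line. A sketch for pulling user, source address and key ID out of those lines; the function name cert_logins is hypothetical, and the regular expression assumes the message format recent OpenSSH releases produce, so check it against your own logs first.

```bash
#!/usr/bin/env bash
# Sketch: extract "who logged in, from where, with which certificate" from
# sshd log lines. Assumes the certificate-login message format of recent
# OpenSSH releases; verify against your own logs.
set -u

# Reads log lines on stdin; prints "user<TAB>source-ip<TAB>certificate key ID".
cert_logins() {
  local line
  local re='Accepted publickey for ([^ ]+) from ([^ ]+) port [0-9]+ ssh2: [[:upper:][:digit:]-]+-CERT [^ ]+ ID (.*) \(serial [0-9]+\) CA '
  while IFS= read -r line; do
    # Plain-key logins have no "-CERT" key type and are skipped.
    if [[ $line =~ $re ]]; then
      printf '%s\t%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    fi
  done
}
```

For example, journalctl -u sshd -o cat | cert_logins (the unit is ssh on Debian-family systems) lists every certificate login with the identity set at issuance, ready to join against tlog recordings and audit events by time and host.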

Ship audit logs and journal entries off-box in real time. The threat model includes a successful intruder who can clear local logs.

Operational tail end

A few things that matter day-to-day.

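Revocation. Short-lived certs are the primary revocation mechanism: wait it out. To revoke a credential inside its validity window (an active engineer’s laptop has just been stolen), generate a Key Revocation List with ssh-keygen -k and distribute it to every host; sshd reads it via RevokedKeys /etc/ssh/krl. A KRL blocks new logins; it does not close sessions that are already open. Put the file in place before adding the RevokedKeys line, because sshd refuses public-key authentication for everyone if that file cannot be read.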
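A sketch of the stolen-laptop case, assuming you have the engineer’s public key file. Listing the plain public key in the KRL also revokes every certificate issued for that key, whatever its key ID or serial, which is usually what you want when the private key itself is compromised. The function name revoke_keys is hypothetical; it adds to an existing KRL with -u rather than overwriting earlier revocations.

```bash
#!/usr/bin/env bash
# Sketch: add compromised public keys to a KRL. Listing the plain public key
# revokes it and every certificate issued for it.
set -u

# revoke_keys KRL_FILE PUBKEY...
revoke_keys() {
  local krl=$1
  shift
  if [[ -f $krl ]]; then
    ssh-keygen -q -k -u -f "$krl" "$@"   # -u: add to the existing KRL
  else
    ssh-keygen -q -k -f "$krl" "$@"      # first run: create a new KRL
  fi
}
```

After revoke_keys krl alice_id_ed25519.pub, ship krl to /etc/ssh/krl on every host; ssh-keygen -Q -f krl alice_id_ed25519.pub confirms the key is marked REVOKED.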

CA key custody. The user CA private key is the most valuable secret in your estate. Keep it offline. If you need automated issuance, run a thin issuance service that holds the CA key and exposes a sign-by-policy API; the policy enforces principal lists, validity windows, and source authentication. HashiCorp Vault’s SSH secrets engine is a reasonable starting point.

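Rotation. Plan a CA rotation every couple of years. The mechanics are: stand up new CAs; add the new user CA’s public key to every host’s TrustedUserCAKeys file (it accepts several keys, one per line) and the new host CA to every client’s @cert-authority lines; re-sign host keys and switch user issuance to the new CAs; once nothing still presents a certificate signed by the old CAs, remove the old keys and redistribute. Rehearse this in a staging environment before you need it.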

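Username stability. SSH cert principals reference usernames, not hostnames. Make sure your username scheme is stable across hosts: alice should mean the same human on every box. Mismatched usernames are the most common reason a cert that “should work” does not. Where a host’s local accounts cannot follow the scheme, AuthorizedPrincipalsFile lets that host list, per account, which certificate principals may log in as it.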

SSH config templating. Keep your ~/.ssh/config under version control or generate it from a known source. With certificates, the per-host config gets dramatically simpler — usually a single wildcard Host *.example.com block — but the bastion configuration and ProxyJump patterns are still worth having tracked.
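Generating the file can be as small as one function. A sketch that renders the bastion and wildcard blocks from three inputs; the function name render_ssh_config is hypothetical, and example.com and bastion.example.com are the post’s placeholders. It excludes the bastion’s full name from the wildcard block, so ssh bastion.example.com connects directly instead of jumping through itself.

```bash
#!/usr/bin/env bash
# Sketch: render the client-side SSH config from a few inputs so the
# generator, not the generated file, lives in version control.
set -u

# render_ssh_config DOMAIN BASTION_FQDN USER
render_ssh_config() {
  local domain=$1 bastion=$2 user=$3
  cat <<EOF
# Generated by render_ssh_config; edit the generator, not this file.
Host bastion
    HostName ${bastion}
    User ${user}

Host *.${domain} !${bastion}
    User ${user}
    ProxyJump bastion
EOF
}
```

Commit the generator and regenerate with render_ssh_config example.com bastion.example.com alice > ~/.ssh/config whenever the inputs change.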

What this gets you

A new engineer joins. They go through SSO, the issuance service signs their public key for the right principals based on their group memberships, and they have access to everything they should have access to without anyone editing a single file on any host. They leave; the next time they try to issue a certificate, the issuance service refuses, and any cert they had expires within an hour. Their access ends, automatically, with no manual cleanup. Every session they ever had is recorded in tlog and indexed in your central logs. The whole estate reaches a point where SSH access is a managed, auditable, rotatable thing rather than a federation of authorized_keys files that nobody is brave enough to clean up.

That is what SSH hardening looks like once you stop thinking in terms of static keys. The five posts in this series — tcpdump, NETEM, namespace labs, nftables migration, and certificate-based SSH — are the toolkit I would want a Linux-savvy network engineer to have on their first day at a new place.