Comparing Route Tables Between Two Sources: A Small Python Tool for Audits and Migrations

29 April 2026 python networking automation ipaddress openpyxl audit routing

Source on GitHub: MichealGarner/route-compare — clone the repo for the script, example fixtures, and the latest version.

The problem

There is a recurring task in network operations that has nothing to do with packets and everything to do with spreadsheets. Two teams hand you two route lists, in two slightly different formats, and ask whether they agree. Or you are migrating from one platform to another and need to know which prefixes the new platform is missing. Or you are auditing a route table against a documented set of advertisements and need to flag anything extra, anything missing, and anything that nearly-but-not-quite matches.

The “nearly-but-not-quite” case is the interesting one. 10.20.0.0/16 and 10.20.5.0/24 are not the same prefix, but they overlap, and that overlap usually matters — either someone has summarised, or someone has leaked a more-specific, or someone has made a mistake. A naive set(A) ^ set(B) will not tell you that. You need actual prefix arithmetic.

This post walks through a small tool that handles all of it. Two Excel files in, one Excel report out (plus matching CSVs), with exact matches, one-sided entries, overlaps, and invalid entries each on their own sheet. It is around 350 lines of Python and uses only the standard library plus pandas and openpyxl. The full source is at the end of this post, and lives at github.com/MichealGarner/route-compare with example input fixtures.

What the tool does

The flow is straightforward. Read both Excel files. Pull the prefix column out of each. Run every cell through a normaliser that uses Python’s ipaddress module to validate and canonicalise — 10.0.0.0/24, 10.0.0.0 /24, and 10.0.0.0/24 all become 10.0.0.0/24. Anything that cannot be parsed as a valid IP or prefix gets parked on an “invalid” sheet, with the original cell value preserved so you can go back and fix the source. Bare addresses without a mask are treated as host routes (/32 for IPv4, /128 for IPv6).

Once both sides are normalised, the comparison is set arithmetic plus overlap detection. For every canonical prefix in the union of A and B, the tool answers four questions: is it in A, is it in B, does it overlap anything on the other side, and what is its overall status. The status is one of five values — exact match in both, only in A, only in B, only in A but overlaps something in B, only in B but overlaps something in A. The output workbook colour-codes these green, red, red, amber, amber so the eye lands on the rows that matter.

The overlap question is what makes the tool worth writing. The ipaddress module has network.overlaps(other) built in, and it handles all the interesting cases — equal networks, one network containing the other, one network being a subnet of the other, and partial overlaps where neither contains the other. The tool exposes that as a separate “Overlaps_pairs” sheet that lists every (A, B) pair that overlaps, with a relation column saying whether the relationship is EQUAL, A_CONTAINS_B, A_WITHIN_B, or PARTIAL_OVERLAP. That is the sheet you take to a routing meeting.

Why `ipaddress` is the right tool

It is tempting to parse prefixes with regex. Don’t. The standard-library ipaddress module is exhaustively battle-tested, handles IPv4 and IPv6 uniformly, validates host-part bits, canonicalises notation, and exposes set-style operations on networks. The job of regex in this tool is only to extract candidate tokens from messy cells — strings like "see prefix 10.0.0.0/24 (legacy)" where the prefix is buried in commentary. Once a candidate is pulled out, ipaddress.ip_network(token, strict=False) is the authority on whether it is real.

The strict=False flag matters. It accepts 10.0.0.5/24 and silently turns it into 10.0.0.0/24. Strict mode would reject anything where the host bits are non-zero. For a comparison tool that is reading data out of someone’s Excel sheet, lenient parsing with canonicalisation is the right default — it is exactly the behaviour a human would apply mentally.

Install

The tool needs Python 3.10 or newer (it uses lowercase generics like list[str]) and two third-party packages: pandas for reading the spreadsheets, and openpyxl for writing the formatted output. Everything else — ipaddress, argparse, logging, re, dataclasses, enum — is standard library.

The clean way to install:

# 1. Clone the repo
git clone https://github.com/MichealGarner/route-compare.git
cd route-compare

# 2. Create and activate a venv
python3 -m venv .venv
source .venv/bin/activate          # Linux / macOS
# .venv\Scripts\activate           # Windows PowerShell

# 3. Install dependencies
pip install -r requirements.txt

You should now be able to run:

python route_compare.py --help

and get the usage banner. The repo ships with two small example spreadsheets under example/ so you can confirm everything works end-to-end:

python route_compare.py \
  --file-a example/A.xlsx \
  --file-b example/B.xlsx \
  --label-a OldEdge \
  --label-b NewEdge \
  --output example_diff.xlsx

Open example_diff.xlsx and you should see a colour-coded comparison sheet exercising every status type — exact matches, one-sided entries with no overlap, both directions of overlap, plus invalid rows for the audit sheet.

If you intend to use this often, dropping it onto your PATH is worthwhile:

chmod +x route_compare.py
mv route_compare.py ~/.local/bin/route-compare

That gives you a route-compare command available anywhere in your shell. Pair it with pipx if you would rather install into a managed virtual environment without thinking about it. For a one-shot script python route_compare.py from the project directory is fine. The venv is the only real recommendation; a system-Python pip install will eventually bite you.

Running it

Minimum viable invocation, with both files in the current directory:

python route_compare.py --file-a old-edge.xlsx --file-b new-edge.xlsx

That writes route_prefix_comparison.xlsx plus six CSVs to the current directory. The default labels in the report are simply A and B. To make the output more readable, name the sources:

python route_compare.py \
  --file-a old-edge.xlsx \
  --file-b new-edge.xlsx \
  --label-a OldEdge \
  --label-b NewEdge \
  --output edge-migration-diff.xlsx

Now the comparison sheet says “Only in OldEdge (overlaps NewEdge)” rather than “Only in A (overlaps B)”, and the per-source sheets are named OldEdge_normalised and NewEdge_normalised.

The right-hand-side spreadsheet is expected to have a column called prefix (case-insensitive). If yours is called something else:

python route_compare.py --file-a A.xlsx --file-b B.xlsx --prefix-col Network

For very large route tables — say, 50,000 prefixes on each side — the pairwise overlap sheet would be 50,000 × 50,000 = 2.5 billion comparisons. The tool detects this and writes a “skipped” note instead. Raise the threshold or turn the pairwise sheet off entirely:

python route_compare.py --file-a A.xlsx --file-b B.xlsx --no-overlap-pairs
python route_compare.py --file-a A.xlsx --file-b B.xlsx --overlap-pair-limit 5000000

The exact-match comparison is unaffected by either flag — it is always cheap. The overlap detection on the main “Comparison” sheet is also always present; what gets skipped is the explicit (A, B) pair listing.

For debugging, -v enables debug-level logging:

python route_compare.py --file-a A.xlsx --file-b B.xlsx -v

What you get out

The output workbook has six sheets:

Comparison — one row per canonical prefix, with In A, In B, Overlaps other side?, and Status. Green rows are exact matches. Red rows are one-sided with no overlap. Amber rows are one-sided but overlap something on the other side. Sort by status to triage. The header row is frozen so it stays visible while scrolling.

A_normalised and B_normalised — the full normalised view of each input, with the original cell value preserved alongside the canonical form. Two extra columns flag exact and overlapping matches against the other side. Useful when someone asks “where is 10.20.0.0/16 in this?” and you want to point at a row.

Invalid_A and Invalid_B — every cell that could not be parsed as an IP or prefix, with the original value and the parser’s error message. This is the audit trail. Anything sitting on these sheets is data the comparison did not see — go fix it at the source.

Overlaps_pairs — every (A, B) pair where the two networks overlap, with the relation. This is the sheet you screenshot for a meeting.

Each sheet is also dumped to CSV alongside the workbook, with the same data minus the colour coding. The CSVs are for piping into other tools or version-controlling. The Excel file is for humans.

A worked example

Suppose A contains 10.0.0.0/24, 10.0.1.0/24, 10.20.0.0/16, and 192.168.5.0/24. B contains 10.0.0.0/24, 10.0.1.5 (parsed as 10.0.1.5/32), 10.20.5.0/24, and 172.16.0.0/12. The comparison sheet looks roughly like this:

Prefix	In A	In B	Overlaps?	Status
`10.0.0.0/24`	true	true	false	Exact match in both (green)
`10.0.1.0/24`	true	false	true	Only in A (overlaps B) (amber)
`10.0.1.5/32`	false	true	true	Only in B (overlaps A) (amber)
`10.20.0.0/16`	true	false	true	Only in A (overlaps B) (amber)
`10.20.5.0/24`	false	true	true	Only in B (overlaps A) (amber)
`172.16.0.0/12`	false	true	false	Only in B (red)
`192.168.5.0/24`	true	false	false	Only in A (red)

The amber rows are the interesting ones. 10.0.1.0/24 (A) overlaps 10.0.1.5/32 (B) because the host route is inside the subnet. 10.20.0.0/16 (A) overlaps 10.20.5.0/24 (B) because the /16 contains the /24. Either could be a deliberate summarisation or a misconfiguration; the tool flags it but does not adjudicate. The red rows are the unambiguous “one side has it, the other does not, and nothing nearby covers it” cases.

The Overlaps_pairs sheet for the same data:

A	B	Relation
`10.0.0.0/24`	`10.0.0.0/24`	EQUAL
`10.0.1.0/24`	`10.0.1.5/32`	A_CONTAINS_B
`10.20.0.0/16`	`10.20.5.0/24`	A_CONTAINS_B

That is a document a network engineer can act on.

Things I would extend, given time

The tool is deliberately small, but a couple of natural extensions would be useful for larger jobs.

The first is performance. Exact-match comparison is set-membership and is essentially free. Overlap detection is O(|A| × |B|) in the obvious implementation, which is what the tool does today. For tens of thousands of prefixes, swapping the overlap step for a Patricia trie (pytricia on PyPI) or a netaddr.IPSet cuts the runtime dramatically and lets the pairwise sheet stay enabled at much higher scale.

The second is cross-format input. The current tool reads Excel, but the same comparison applies to FortiGate get router info routing-table all, Cisco show ip route, and BGP table dumps from a route reflector. Each of those has a known shape and would slot in behind a small adapter that pulls a list of prefixes out and feeds them to the same normaliser. The comparison engine does not care where the prefixes came from.

The third is provenance. Right now a prefix that appears multiple times in the source spreadsheet (with different next-hops, say, or different metadata in adjacent columns) collapses to a single row. Carrying the source row index through to the report would let the tool say “this prefix appears in A at row 47 and in B at rows 12 and 309” — useful when the spreadsheet is the system of record and you need to fix something at the source.

The code

The full script. Drop it into a directory next to two Excel files, install pandas and openpyxl, and run. The same source lives in the GitHub repo — clone there if you would rather not copy-paste.

#!/usr/bin/env python3
"""
Route/prefix comparison tool.

Compares prefixes between two Excel sources, normalises them using ipaddress,
detects exact matches and overlaps, preserves invalid entries for audit, and
writes a colour-coded Excel report plus CSV exports.

Usage (minimum):
    python route_compare.py --file-a A.xlsx --file-b B.xlsx

For full help:
    python route_compare.py --help
"""
from __future__ import annotations

import argparse
import ipaddress
import logging
import re
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional, Union

import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill
from openpyxl.utils.dataframe import dataframe_to_rows

# Public type alias — avoid the private ipaddress._BaseNetwork.
IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

log = logging.getLogger("route_compare")


# -------------------------
# Status (stable identifier + human display)
# -------------------------
class Status(str, Enum):
    EXACT_BOTH = "EXACT_BOTH"
    ONLY_A = "ONLY_A"
    ONLY_B = "ONLY_B"
    ONLY_A_OVERLAPS_B = "ONLY_A_OVERLAPS_B"
    ONLY_B_OVERLAPS_A = "ONLY_B_OVERLAPS_A"


def status_display(status: Status, label_a: str, label_b: str) -> str:
    """Render a Status as a human-friendly string substituted with source labels."""
    return {
        Status.EXACT_BOTH: "Exact match in both",
        Status.ONLY_A: f"Only in {label_a}",
        Status.ONLY_B: f"Only in {label_b}",
        Status.ONLY_A_OVERLAPS_B: f"Only in {label_a} (overlaps {label_b})",
        Status.ONLY_B_OVERLAPS_A: f"Only in {label_b} (overlaps {label_a})",
    }[status]


# -------------------------
# Normalisation
# -------------------------
# Regex extracts plausible IP/prefix tokens from messy cells.
# Validation/canonicalisation is handled by ipaddress (the authority).
CANDIDATE_RE = re.compile(r"([0-9A-Fa-f:.]+(?:\s*/\s*\d{1,3})?)")


@dataclass(frozen=True)
class NormaliseResult:
    original: str
    network: Optional[IPNetwork]
    error: Optional[str]

    @property
    def normalised(self) -> Optional[str]:
        return str(self.network) if self.network is not None else None


def _try_parse(token: str) -> Optional[IPNetwork]:
    """Parse a single token as an ip_network. Returns None on failure."""
    token = token.replace(" ", "")
    if not token:
        return None
    if "/" not in token:
        # Treat bare addresses as host routes.
        token = f"{token}/32" if "." in token else f"{token}/128"
    try:
        return ipaddress.ip_network(token, strict=False)
    except (ValueError, ipaddress.AddressValueError, ipaddress.NetmaskValueError):
        return None


def normalise_prefix(value) -> NormaliseResult:
    """Extract and canonicalise a prefix from a (possibly messy) cell value."""
    if pd.isna(value):
        return NormaliseResult("", None, "NaN/blank")
    s = str(value).strip()
    if not s:
        return NormaliseResult(s, None, "Blank")

    # Fast path: whole cell parses cleanly.
    net = _try_parse(s)
    if net is not None:
        return NormaliseResult(s, net, None)

    # Fallback: extract candidate tokens and try each.
    for c in CANDIDATE_RE.findall(s):
        net = _try_parse(c)
        if net is not None:
            return NormaliseResult(s, net, None)

    return NormaliseResult(s, None, "No valid IP/prefix found")


# -------------------------
# Helpers
# -------------------------
def find_prefix_column(df: pd.DataFrame, preferred_name: str = "prefix") -> str:
    """Return a column matching preferred_name (case-insensitive); else the first column."""
    preferred = preferred_name.strip().lower()
    for c in df.columns:
        if str(c).strip().lower() == preferred:
            return c
    return df.columns[0]


def to_network_index(networks: pd.Series) -> dict[str, IPNetwork]:
    """Deduplicate networks. Returns {canonical_str: ip_network}."""
    out: dict[str, IPNetwork] = {}
    for n in networks.dropna():
        out[str(n)] = n
    return out


def overlaps_any(net: IPNetwork, others: list[IPNetwork]) -> bool:
    """True if net overlaps any network in others. O(len(others))."""
    return any(net.overlaps(o) for o in others)


def overlap_relation(a: IPNetwork, b: IPNetwork) -> str:
    if a == b:
        return "EQUAL"
    if a.supernet_of(b):
        return "A_CONTAINS_B"
    if a.subnet_of(b):
        return "A_WITHIN_B"
    return "PARTIAL_OVERLAP"


def read_excel_safe(path: Path, **kwargs) -> pd.DataFrame:
    """pd.read_excel with friendly errors instead of stack traces."""
    try:
        return pd.read_excel(path, engine="openpyxl", **kwargs)
    except FileNotFoundError as e:
        raise SystemExit(f"ERROR: file not found: {path}") from e
    except Exception as e:  # malformed file, locked file, etc.
        raise SystemExit(f"ERROR: could not read {path}: {e}") from e


# -------------------------
# Excel writing
# -------------------------
def autosize_columns(ws, max_width: int = 60) -> None:
    for col in ws.columns:
        max_len = 0
        col_letter = col[0].column_letter
        for cell in col:
            val = "" if cell.value is None else str(cell.value)
            max_len = max(max_len, len(val))
        ws.column_dimensions[col_letter].width = min(max_len + 2, max_width)


def write_df(ws, df: pd.DataFrame, header_fill, header_font) -> None:
    for r in dataframe_to_rows(df, index=False, header=True):
        ws.append(r)
    if df.empty:
        return
    for cell in ws[1]:
        cell.fill = header_fill
        cell.font = header_font
    ws.freeze_panes = "A2"


def fill_rows_by_status(ws, status_col_index_1based: int,
                        fills_by_status: dict[str, PatternFill]) -> None:
    for row in ws.iter_rows(min_row=2):
        status_val = row[status_col_index_1based - 1].value
        fill = fills_by_status.get(str(status_val))
        if fill is None:
            continue
        for cell in row:
            cell.fill = fill


def safe_sheet_name(name: str) -> str:
    """Excel sheet names are <= 31 chars and have a small set of forbidden chars."""
    cleaned = re.sub(r"[\\/*?:\[\]]", "_", name)
    return cleaned[:31]


# -------------------------
# Main pipeline
# -------------------------
def build_outputs(
    file_a: Path,
    file_b: Path,
    output_xlsx: Path,
    prefix_col_name: str = "prefix",
    label_a: str = "A",
    label_b: str = "B",
    write_overlap_pairs: bool = True,
    overlap_pair_limit: int = 2_000_000,
) -> None:
    log.info("Reading %s", file_a)
    A = read_excel_safe(file_a, header=None)
    log.info("Reading %s", file_b)
    B = read_excel_safe(file_b)

    a_raw = A.iloc[:, 0]
    b_prefix_col = find_prefix_column(B, prefix_col_name)
    log.info("Using column %r from %s", b_prefix_col, file_b.name)
    b_raw = B[b_prefix_col]

    a_results = a_raw.apply(normalise_prefix)
    b_results = b_raw.apply(normalise_prefix)

    A_net = a_results.apply(lambda r: r.network)
    A_norm = a_results.apply(lambda r: r.normalised)
    A_err = a_results.apply(lambda r: r.error)
    B_net = b_results.apply(lambda r: r.network)
    B_norm = b_results.apply(lambda r: r.normalised)
    B_err = b_results.apply(lambda r: r.error)

    # canonical-string -> ip_network. Each network parsed exactly once.
    A_index = to_network_index(A_net)
    B_index = to_network_index(B_net)
    A_set, A_nets = set(A_index.keys()), list(A_index.values())
    B_set, B_nets = set(B_index.keys()), list(B_index.values())

    # Combined index for the comparison sheet.
    all_index: dict[str, IPNetwork] = {**A_index, **B_index}
    all_strs = sorted(
        all_index.keys(),
        key=lambda s: (
            all_index[s].version,
            all_index[s].network_address,
            all_index[s].prefixlen,
        ),
    )

    rows = []
    for p in all_strs:
        net = all_index[p]
        in_a = p in A_set
        in_b = p in B_set
        if in_a and in_b:
            status = Status.EXACT_BOTH
        elif in_a:
            status = Status.ONLY_A_OVERLAPS_B if overlaps_any(net, B_nets) else Status.ONLY_A
        else:
            status = Status.ONLY_B_OVERLAPS_A if overlaps_any(net, A_nets) else Status.ONLY_B
        rows.append({
            "Prefix": p,
            f"In {label_a}": in_a,
            f"In {label_b}": in_b,
            "Overlaps other side?": status in (Status.ONLY_A_OVERLAPS_B, Status.ONLY_B_OVERLAPS_A),
            "Status": status_display(status, label_a, label_b),
        })
    comparison_df = pd.DataFrame(rows)

    # Invalid entries (audit trail).
    invalid_a_df = pd.DataFrame({"Original": a_raw.astype(str), "Error": A_err}) \
        .loc[A_norm.isna()].reset_index(drop=True)
    invalid_b_df = pd.DataFrame({"Original": b_raw.astype(str), "Error": B_err}) \
        .loc[B_norm.isna()].reset_index(drop=True)

    # Per-source normalised views.
    a_norm_df = pd.DataFrame({"Original": a_raw.astype(str), "Normalised": A_norm}) \
        .dropna(subset=["Normalised"]).drop_duplicates(subset=["Normalised"]).reset_index(drop=True)
    a_norm_df[f"Exact in {label_b}?"] = a_norm_df["Normalised"].apply(
        lambda x: "Yes" if x in B_set else "No")
    a_norm_df[f"Overlaps {label_b}?"] = a_norm_df["Normalised"].apply(
        lambda x: "Yes" if overlaps_any(A_index[x], B_nets) else "No")

    b_norm_df = pd.DataFrame({"Original": b_raw.astype(str), "Normalised": B_norm}) \
        .dropna(subset=["Normalised"]).drop_duplicates(subset=["Normalised"]).reset_index(drop=True)
    b_norm_df[f"Exact in {label_a}?"] = b_norm_df["Normalised"].apply(
        lambda x: "Yes" if x in A_set else "No")
    b_norm_df[f"Overlaps {label_a}?"] = b_norm_df["Normalised"].apply(
        lambda x: "Yes" if overlaps_any(B_index[x], A_nets) else "No")

    # Pairwise overlaps (optional, auto-skip if too large).
    overlaps_pairs_df: Optional[pd.DataFrame] = None
    if write_overlap_pairs:
        est = len(A_nets) * len(B_nets)
        if est <= overlap_pair_limit:
            pairs = []
            for a in A_nets:
                for b in B_nets:
                    if a.overlaps(b):
                        pairs.append({
                            f"{label_a}": str(a),
                            f"{label_b}": str(b),
                            "Relation": overlap_relation(a, b),
                        })
            overlaps_pairs_df = pd.DataFrame(pairs)
            if not overlaps_pairs_df.empty:
                overlaps_pairs_df = overlaps_pairs_df.sort_values(
                    by=[f"{label_a}", f"{label_b}"]
                ).reset_index(drop=True)
        else:
            log.warning(
                "Skipping pairwise overlaps: %d x %d = %d > limit %d",
                len(A_nets), len(B_nets), est, overlap_pair_limit,
            )
            overlaps_pairs_df = pd.DataFrame([{
                "Note": (
                    f"Skipped pairwise overlaps: {len(A_nets)} x {len(B_nets)} "
                    f"= {est} comparisons exceeds limit {overlap_pair_limit}."
                ),
            }])

    # -------------------------
    # Write workbook + CSVs
    # -------------------------
    wb = Workbook()
    wb.remove(wb.active)

    header_fill = PatternFill("solid", fgColor="1F4E79")
    header_font = Font(color="FFFFFF", bold=True)
    green_fill = PatternFill("solid", fgColor="C6EFCE")
    red_fill = PatternFill("solid", fgColor="FFC7CE")
    amber_fill = PatternFill("solid", fgColor="FFF2CC")

    fills_by_status = {
        status_display(Status.EXACT_BOTH, label_a, label_b): green_fill,
        status_display(Status.ONLY_A, label_a, label_b): red_fill,
        status_display(Status.ONLY_B, label_a, label_b): red_fill,
        status_display(Status.ONLY_A_OVERLAPS_B, label_a, label_b): amber_fill,
        status_display(Status.ONLY_B_OVERLAPS_A, label_a, label_b): amber_fill,
    }

    ws_cmp = wb.create_sheet("Comparison")
    write_df(ws_cmp, comparison_df, header_fill, header_font)
    fill_rows_by_status(ws_cmp, status_col_index_1based=5, fills_by_status=fills_by_status)
    autosize_columns(ws_cmp)

    sheets = [
        (safe_sheet_name(f"{label_a}_normalised"), a_norm_df),
        (safe_sheet_name(f"{label_b}_normalised"), b_norm_df),
        (safe_sheet_name(f"Invalid_{label_a}"), invalid_a_df),
        (safe_sheet_name(f"Invalid_{label_b}"), invalid_b_df),
    ]
    for name, df in sheets:
        ws = wb.create_sheet(name)
        write_df(ws, df, header_fill, header_font)
        autosize_columns(ws)

    if overlaps_pairs_df is not None:
        ws_ov = wb.create_sheet("Overlaps_pairs")
        write_df(ws_ov, overlaps_pairs_df, header_fill, header_font)
        autosize_columns(ws_ov)

    wb.save(output_xlsx)

    out_stem = output_xlsx.with_suffix("")
    la, lb = label_a.lower(), label_b.lower()
    comparison_df.to_csv(f"{out_stem}_comparison.csv", index=False)
    a_norm_df.to_csv(f"{out_stem}_{la}_normalised.csv", index=False)
    b_norm_df.to_csv(f"{out_stem}_{lb}_normalised.csv", index=False)
    invalid_a_df.to_csv(f"{out_stem}_invalid_{la}.csv", index=False)
    invalid_b_df.to_csv(f"{out_stem}_invalid_{lb}.csv", index=False)
    if overlaps_pairs_df is not None:
        overlaps_pairs_df.to_csv(f"{out_stem}_overlaps_pairs.csv", index=False)

    log.info("Written %s", output_xlsx)
    log.info("%s valid: %d | %s invalid: %d", label_a, len(A_set), label_a, len(invalid_a_df))
    log.info("%s valid: %d | %s invalid: %d", label_b, len(B_set), label_b, len(invalid_b_df))
    log.info("Comparison rows: %d", len(comparison_df))


# -------------------------
# CLI
# -------------------------
def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(
        description="Compare route/prefix lists between two Excel sources."
    )
    p.add_argument("--file-a", required=True,
                   help="Excel source A (single column of prefixes; no header).")
    p.add_argument("--file-b", required=True,
                   help="Excel source B (with a 'prefix' column or first column).")
    p.add_argument("--output", default="route_prefix_comparison.xlsx",
                   help="Output Excel filename. CSV files share the stem.")
    p.add_argument("--label-a", default="A", help="Friendly name for source A.")
    p.add_argument("--label-b", default="B", help="Friendly name for source B.")
    p.add_argument("--prefix-col", default="prefix",
                   help="Preferred column name in file B (case-insensitive).")
    p.add_argument("--no-overlap-pairs", action="store_true",
                   help="Skip the pairwise overlap sheet entirely.")
    p.add_argument("--overlap-pair-limit", type=int, default=2_000_000,
                   help="Max len(A) * len(B) before pairwise overlaps are skipped.")
    p.add_argument("-v", "--verbose", action="store_true", help="Verbose logging.")
    return p.parse_args(argv)


def main(argv: Optional[list[str]] = None) -> int:
    args = parse_args(argv)
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )

    file_a = Path(args.file_a)
    file_b = Path(args.file_b)
    output = Path(args.output)

    for f in (file_a, file_b):
        if not f.exists():
            log.error("File not found: %s", f)
            return 2

    build_outputs(
        file_a=file_a,
        file_b=file_b,
        output_xlsx=output,
        prefix_col_name=args.prefix_col,
        label_a=args.label_a,
        label_b=args.label_b,
        write_overlap_pairs=not args.no_overlap_pairs,
        overlap_pair_limit=args.overlap_pair_limit,
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

Closing thought

There is a recurring pattern in network operations where the right tool is small, scriptable, and has good output. Not a platform, not a service, just a script that takes two messy spreadsheets and gives you back the answer to “do these agree?” in a form a human can read in thirty seconds. The expensive part is not the comparison logic — ipaddress does that for you. It is the audit trail, the colour coding, the invalid-entry sheet, the pairwise overlap report. The parts that turn “here is the result” into “here is the result, and here is how I got there, and here is what you should look at first”. Those are what make a tool worth keeping.