The XZ Utils Backdoor: How a Supply Chain Attack Hid Outside Git


CVE-2024-3094 was not just a backdoor. It was a two-year infiltration campaign that exploited a structural blind spot in how open source software is actually distributed. The code that activated the payload was never committed to git at all.

In March 2024, Andres Freund, a Microsoft engineer, noticed that SSH logins on his Debian sid (unstable) machine were taking about 500ms longer than expected and consuming anomalous CPU. He spent several hours tracing the cause. What he found was one of the most sophisticated supply chain attacks ever discovered in open source software.

XZ Utils is a set of compression tools, plus the liblzma shared library, found on virtually every Linux system. liblzma is a dependency of libsystemd, which on Debian and Fedora gets linked into sshd via a third-party patch, so it becomes a transitive dependency of the SSH daemon. Whoever could compromise liblzma could inject code directly into the SSH daemon's memory space on any machine running such a build.

Jia Tan, a contributor who had been active in the XZ Utils project for over two years, had done exactly that. But the way it was done is what makes the attack worth studying in depth.

Two Years of Patience

The attack did not start with code. It started with a GitHub account.

In November 2021, the JiaT75 account appeared on GitHub and began making small, genuine contributions to open source projects. Nothing suspicious: bug fixes, test improvements, reasonable patches. Through 2022, the account built a quiet reputation as a reliable contributor.

The first patches to XZ Utils arrived on the project mailing list in late 2021, and more followed through early 2022. They were polite, technically correct, and genuinely useful. Lasse Collin, the sole long-time maintainer, was managing the project largely alone. Between April and June 2022, a handful of previously unknown accounts appeared on the XZ Utils mailing list expressing frustration that patches from Jia Tan were sitting unreviewed. The pressure appears to have been coordinated. Within months, Jia Tan was acting as co-maintainer.

By March 2023, Jia Tan's email address had become the primary contact in the project's OSS-Fuzz configuration, so automated fuzzing reports were routed to the attacker. The house now belonged to Jia Tan. The trap just had not been built yet.

The Dependency Chain Into sshd

Before getting to the delivery mechanism, it helps to understand exactly why XZ Utils had any relationship with SSH at all.

OpenSSH does not depend on XZ Utils. But on Debian and Fedora, a downstream patch links sshd against libsystemd so the daemon can notify systemd when it is ready. libsystemd in turn depends on liblzma for journal compression. The dependency is two hops deep (sshd → libsystemd → liblzma) and entirely invisible at the sshd source level.
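
The chain described above can be made concrete with a small graph search. This is only a sketch: the dependency map is a hypothetical, simplified stand-in (real link-time dependencies vary by distribution and build options), and `dependency_path` is a helper name invented for this example.

```python
from collections import deque

# Hypothetical, simplified dependency map for illustration only.
# Real link-time dependencies vary by distribution and build options.
DEPENDS_ON = {
    "sshd": ["libcrypto", "libsystemd"],
    "libsystemd": ["liblzma", "libcap"],
    "libcrypto": [],
    "liblzma": [],
    "libcap": [],
}


def dependency_path(graph, start, target):
    """Return the shortest dependency chain from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


print(" -> ".join(dependency_path(DEPENDS_ON, "sshd", "liblzma")))
```

Nothing in the sshd entry mentions liblzma; it only shows up once the search follows libsystemd, which is the sense in which the dependency is invisible at the sshd level.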

When sshd starts, the dynamic linker loads liblzma into the same process memory. If liblzma registers an IFUNC resolver, the dynamic linker runs that resolver while processing relocations at load time, before main. The attacker used that early execution to hook RSA_public_decrypt in OpenSSL from inside liblzma. The hook fires during public-key authentication, before any client has been authenticated.
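
Whether liblzma is actually mapped into a running process can be checked from that process's memory map on Linux. The following is a minimal sketch, assuming the text of /proc/<pid>/maps has already been read into a string. The function names are invented for this example and the sample text is made up, not captured from a real system.

```python
import os


def mapped_libraries(maps_text):
    """Return the set of shared-object basenames found in /proc/<pid>/maps text."""
    libs = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # address perms offset dev inode [pathname]
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        name = os.path.basename(fields[5].strip())
        if ".so" in name:
            libs.add(name)
    return libs


def has_liblzma(maps_text):
    """True if any mapped library name starts with 'liblzma.so'."""
    return any(name.startswith("liblzma.so") for name in mapped_libraries(maps_text))


# Illustrative sample, not captured from a real system.
SAMPLE = """\
55d0c0a00000-55d0c0a0c000 r--p 00000000 08:01 1311 /usr/sbin/sshd
7f3a10000000-7f3a10021000 rw-p 00000000 00:00 0
7f3a2c400000-7f3a2c403000 r--p 00000000 08:01 2210 /usr/lib/x86_64-linux-gnu/liblzma.so.5.6.1
7f3a2c600000-7f3a2c613000 r--p 00000000 08:01 2290 /usr/lib/x86_64-linux-gnu/libsystemd.so.0.38.0
7ffd5a1e0000-7ffd5a201000 rw-p 00000000 00:00 0 [stack]
"""

print(has_liblzma(SAMPLE))
```

A mapped liblzma is normal on the distributions discussed here and says nothing about compromise by itself. The check only confirms that the library shares an address space with the daemon.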

The Gap Between Git and Tarball
#

This is the most technically elegant part of the attack, and it exploited a structural blind spot in how most open source projects are audited.

When a maintainer releases a new version of an open source library, the standard workflow is:

```mermaid
flowchart LR
    A["git tag v5.6.0"] --> B["make dist"]
    B --> C["Autotools generates configure + M4 macros"]
    C --> D["release tarball .tar.gz"]
    D --> E["upload to GitHub Releases"]
    E --> F["distro packager downloads tarball"]
    F --> G["./configure && make && make install"]
```

The critical point: distribution packagers almost never build from git. They download the release tarball, because tarballs include generated files (configure scripts, M4 macros, Makefile templates) that are not committed to git and would require the packager to have the full Autotools toolchain to regenerate. It is the established, expected workflow.

This means the git repository and the release tarball are two different artifacts. In most projects, nobody formally audits the diff between them. They are assumed to be equivalent.

They were not.

```mermaid
flowchart TD
    subgraph GIT ["Git Repository - publicly audited"]
        G1["source files"]
        G2["test fixtures: bad-3-corrupt_lzma2.xz, good-large_compressed.lzma"]
        G3["build scripts"]
    end

    subgraph TAR ["Release Tarball - review surface gap"]
        T1["source files"]
        T2["test fixtures - same binary blobs"]
        T3["build scripts"]
        T4["build-to-host.m4 - TAMPERED, not in git"]
    end

    GIT -->|"make dist adds generated files"| TAR

    style T4 fill:#7b1f1f,stroke:#f09595,color:#f7c1c1
    style TAR fill:#1a0a0a,stroke:#f09595
```

The tampered build-to-host.m4 file existed only in the release tarball. It was absent from git history entirely. A reviewer inspecting every commit, every diff, and every PR in the XZ Utils repository would never have seen it.

What the Tampered M4 File Actually Did

build-to-host.m4 is invoked during ./configure, the first step of the standard ./configure && make && make install build chain.

[Figure: technical attack chain from tampered M4 to remote code execution]

The modified version contained an obfuscated shell payload. When run during a Debian or Fedora package build (specifically: on x86-64, with glibc, inside a dpkg or rpm build environment), it:

1. Located two binary files already present in the repository as ostensible test fixtures: tests/files/bad-3-corrupt_lzma2.xz and tests/files/good-large_compressed.lzma
2. Extracted and decoded their contents using a chain of head, tail, tr, and xz commands embedded in the M4 macro
3. Injected the resulting object code directly into the compiled liblzma.so output

The test files themselves looked innocuous. Binary test data for a compression library is completely normal. bad-3-corrupt_lzma2.xz and good-large_compressed.lzma are exactly the kinds of files you would expect to find in an XZ Utils test suite. The M4 file was the activation key.
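
To see why a "corrupt" fixture raises no suspicion, it helps to watch how little it takes to make a valid compressed stream look broken and then restore it. The sketch below is a harmless illustration using Python's lzma module. The byte substitution table is an assumption chosen for this example, standing in for the tr step mentioned above, and the data is placeholder text.

```python
import lzma

# Illustrative substitution table: swaps tab<->space and '-'<->'_'.
# An assumption for this sketch, not a reproduction of the real macro.
SWAP = bytes.maketrans(b"\t -_", b" \t_-")


def disguise(stream):
    """Swap a few byte values so a valid .xz stream no longer decodes cleanly."""
    return stream.translate(SWAP)


def reveal(disguised):
    """Apply the same swap again; the mapping is its own inverse."""
    return disguised.translate(SWAP)


def decodes(data):
    """Return True if data is a well-formed xz/lzma stream."""
    try:
        lzma.decompress(data)
        return True
    except lzma.LZMAError:
        return False


original = b"harmless placeholder text " * 200
stream = lzma.compress(original)
hidden = disguise(stream)

print("swap changed the bytes:", hidden != stream)
print("disguised stream decodes:", decodes(hidden))
print("restored stream matches:", lzma.decompress(reveal(hidden)) == original)
```

A file treated this way fails to decompress, which is exactly what a test fixture named "bad" or "corrupt" is supposed to do. Only something that knows the substitution can recover the contents.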

The injected code used glibc’s IFUNC mechanism (indirect function resolver) to hook RSA_public_decrypt in OpenSSL at library load time. The hook intercepted every call sshd made to that function, which OpenSSL uses to verify RSA signatures, and checked the attacker-supplied data for an Ed448 signature made with a private key only the attacker held. If the signature was valid, it executed the embedded payload; otherwise it fell through to the normal authentication flow.

The result: pre-authentication remote code execution. Root access before any login prompt.

Why This Defeated Most Review
#

Security review of open source contributions almost universally happens at the git diff level. GitHub’s pull request interface, patch review on mailing lists, and automated static analysis all operate on git history. A reviewer approving changes to XZ Utils would inspect the git diff, find nothing unusual, and move on.

The tampered M4 was outside that review surface entirely.

```mermaid
flowchart LR
    subgraph VISIBLE ["What reviewers see"]
        V1["git diff"]
        V2["PR interface"]
        V3["static analysis on source code"]
        V4["mailing list patches"]
    end

    subgraph BLIND ["What nobody checks"]
        B1["diff tarball vs git-generated output"]
        B2["integrity check: expected vs actual generated files"]
    end

    VISIBLE -->|"covers"| C1["source changes"]
    BLIND -->|"would catch"| C2["build-to-host.m4 injection"]

    style BLIND fill:#3d1f00,stroke:#ef9f27,color:#fac875
    style C2 fill:#7b1f1f,stroke:#f09595,color:#f7c1c1
```

The only ways to catch it would have been to:

- Explicitly diff the release tarball against what make dist produces from a clean git checkout
- Run automated integrity checks comparing tarball contents to expected generated outputs

Almost no project does either of these systematically. The gap is structural: Autotools-based projects generate files during make dist that do not exist in git, so a tarball will always contain files not tracked by git. Auditing those generated files requires tooling and process that simply has not been standard practice.
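
The kind of check described above does not need much machinery. As a rough sketch, assume two directories already exist: one holding the output of make dist from a clean checkout, and one holding the extracted release tarball. Hashing both trees and comparing them surfaces any file that was added, removed, or altered. The function names and demo file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(root):
    """Map each file's path relative to root to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(reference_dir, candidate_dir):
    """Report files added to, removed from, or changed in candidate_dir."""
    reference = file_digests(reference_dir)
    candidate = file_digests(candidate_dir)
    return {
        "added": sorted(set(candidate) - set(reference)),
        "removed": sorted(set(reference) - set(candidate)),
        "changed": sorted(
            name for name in set(reference) & set(candidate)
            if reference[name] != candidate[name]
        ),
    }


# Demo with throwaway directories: same layout, one macro file differs.
with tempfile.TemporaryDirectory() as ref, tempfile.TemporaryDirectory() as cand:
    for base, macro in ((ref, b"expected macro\n"), (cand, b"altered macro\n")):
        (Path(base) / "m4").mkdir()
        (Path(base) / "m4" / "example.m4").write_bytes(macro)
        (Path(base) / "configure").write_bytes(b"#!/bin/sh\n")
    report = compare_trees(ref, cand)

print(report)  # {'added': [], 'removed': [], 'changed': ['m4/example.m4']}
```

In practice the comparison is only meaningful if the reference tree is regenerated with the same tool versions the maintainer used, since different Autotools versions produce different generated files. That requirement is part of why the check has not been routine.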

What Changed After Discovery
#

Freund published his findings on March 29, 2024. Within hours, Debian, Fedora, and other distributions had rolled back the affected versions. The vulnerable releases (5.6.0 and 5.6.1) had not reached stable distributions, so the attack was caught before it could be activated at scale.
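
A quick first-pass triage is to check whether an installed xz reports one of the two affected releases. The sketch below assumes the text printed by `xz --version` has already been captured. The helper names are invented for this example and the sample strings follow the usual two-line layout of that command.

```python
import re

# The two releases named above as carrying the backdoor.
AFFECTED = {"5.6.0", "5.6.1"}


def reported_versions(version_output):
    """Extract every x.y.z version number from `xz --version` style output."""
    return set(re.findall(r"\b(\d+\.\d+\.\d+)\b", version_output))


def is_affected(version_output):
    """True if any reported version is one of the affected releases."""
    return bool(reported_versions(version_output) & AFFECTED)


# Sample text in the usual two-line layout printed by `xz --version`.
print(is_affected("xz (XZ Utils) 5.6.1\nliblzma 5.6.1\n"))
print(is_affected("xz (XZ Utils) 5.4.5\nliblzma 5.4.5\n"))
```

A version string is a coarse signal. Distribution package versions can encode a rollback in ways this pattern does not understand, so a match should prompt a look at the distribution's advisory rather than settle the question.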

GitHub suspended the JiaT75 account and temporarily disabled the XZ Utils repository. Control of the project returned to Lasse Collin for recovery.

The incident triggered a broader conversation about supply chain security practices that had been deferred for years. Reproducing the diff between a release tarball and a clean make dist run is now a recommended audit step. Several Linux distributions moved toward building packages directly from git with reproducible-build toolchains rather than from upstream tarballs.

None of this was novel advice. Researchers had been writing about tarball-vs-git gaps for years. It took a near-miss of this magnitude to make it standard practice.

The Shape of the Attack

What makes XZ stand apart from most supply chain attacks is the investment in legitimacy. Jia Tan did not compromise a dependency through a typosquat or a stolen credential. Commit access was earned through two years of genuine, high-quality work. The social engineering used to pressure Lasse Collin was patient and plausible. The OSS-Fuzz redirection was a quiet, operational step that would have suppressed automated discovery of the backdoor.

The technical delivery was designed around the assumption that reviewers check git. The M4 file was never in git. The binary test fixtures were plausible artifacts. The activation conditions (x86-64, glibc, Debian/Fedora build environment) were narrow enough to avoid triggering on developer machines.

Every layer of the attack was aimed at a specific control that the open source ecosystem had in place. Each layer found the gap.

The question the XZ incident leaves open is how many similar operations are still running, with less noise, on projects where nobody happens to be benchmarking SSH on a Tuesday afternoon.
