Digital Spies, Infrastructure (In)security, and the Culture of Computing
A Discussion of the XZ Attack
Greetings, Travelers!
Welcome to real summer. These summertime shots are just things I happened to see over the last week, no real theme. I did take an image (and, for the 100th time, failed to hit where I was aiming) that warrants an essay. I decided, however, to postpone that image and discussion rather than distract from today's issue, digital spying.
One of the things I think and write about is “security,” in several senses of the word. And “spying” is an occasional theme of Intermittent Signal, usually as something more than a metaphor for our times and their crazy discourses.
Dispatches from the Cold: University Today, Everyman the Spy and Soft Colors
And one might write about computing and the duality of language as central to the very idea of computing and of spying, at least humint. Turing, codes, both secret and social, and so forth. But just as intellectuals need to be reminded that sometimes a cigar is just a cigar, sometimes a spy is just that, a spy. This is a story about one of those times.
On March 29, 2024, Andreas Freund, a Microsoft engineer, announced that Linux, the free operating system that runs untold personal, commercial, and governmental computers worldwide, had been attacked and was at imminent risk of serious compromise. See Freund Announcement of XZ Attack.
In broad outline, the attack worked like this. Linux is updated regularly via distributions of software to users. A forthcoming distribution contained software that inserted a “backdoor,” which is a form of malware that negates a system’s normal authentication processes. Someone using the backdoor would have total access to the computer’s programming and data, that is, complete control. Given the critical missions run on Linux machines, this is a very scary possibility. The malware was caught, the distribution was not made, and calamity was averted. This appears to have been the computer security equivalent of 1983’s Able Archer nuclear scare, in which the Soviets became convinced that a NATO exercise was in fact mobilization for a first strike.
My friend Perry Alexander is a computer scientist of considerable renown, who uses “formal methods” for “high assurance” systems, often in military contexts. Over the years, I have spent time with Perry and some of his colleagues, mostly talking about the meanings of what they do. I find such discursive “colleges” fascinating. In particular, there are important similarities, and of course differences, among the understandings of “security” in cybersecurity, counterterrorist, and military discourses, to say nothing of diplomacy or finance. Mark Maguire and I tried to generalize some of this thinking in Getting Through Security: Counterterrorism, Bureaucracy, and a Sense of the Modern.
The xz attack is Perry’s nightmare. We started talking, in person and via emails. I suggested he try to explain the situation to a non-specialist audience, which I would be happy to publicize. He agreed. What follows is a barely edited synthesis of Perry’s communications. After some thought, I’ve decided to leave it pretty much untouched because the diction provides a real window onto a world. “You” is me, David (back, pedants!) and whoever is looking over my shoulder. A few notes are in brackets. It’s a fascinating, and terrifying, story. Enjoy.
On the weekend of 29 March 2024 an engineer from Microsoft announced that the software community had a problem. An adversary, almost certainly a nation state, had run a brilliant multiphase attack. The attack included elements of social engineering, supply chain, and technical hacking run over a number of years. Fake identities were established and a circle of trust infiltrated. Once trusted, the fake identities compromised a key path for delivering software to Linux distributions. The software they delivered used an innocuous utility for compressing data to insert a backdoor that allowed executing anything at any privilege level. The keys to the kingdom.
I'm going to tell you the story of this hack and how the culture I work in every day prevented it from going live. If you are a hard core computer scientist, this story is not for you. It's for all my friends who want some insight into cybersecurity and what the bad guys are up to. I've elided some details to make things simpler. Please forgive any omissions.
Before diving into the attack, it's important to understand what open source software is and how it works.
Open Source
If we go back in time, all software was free. You bought your hardware and the software you ran came with it, usually as source, or you wrote it yourself. Software was thought to have no value, constantly changing, and easy. Turns out we were wrong on all counts, but that's another story. The community around the hardware you bought shared and maintained software with the vendor. But you bought hardware, not software.
Sometime in the ‘70s this changed, and companies started charging for compiled software systems. Companies that wrote software exclusively began to emerge and become profitable. Their software was licensed and closed for most major commercial systems. Closed in the sense that you could not see source or modify source to fix bugs or add features. Licensed in the sense that if you shared it you got in trouble. If the vendor goes out of business or drops support for your product, you’re screwed. It's a business model and a perfectly reasonable one if you think buying a tractor whose engine can only be accessed by its manufacturer is a good idea.
Then several things happened - Unix, FSF, and Linux. Unix was developed by AT&T and you could get a copy by sending them a mag tape and $50. Unix was hardware independent and came with all the tools needed to build and install it. One could read the source. Still, it was owned and licensed by AT&T and we were governed by that license.
The Free Software Foundation (FSF) and the GNU (“GNU's Not Unix”) project to a large extent reintroduced the idea that software should be free and treated as community property. FSF was formed by Richard Stallman, who was a student of Jay Sussman at MIT. Both continue to provide leadership for FSF to this day. (You may remember Jay from NDIST.)
[“GNU's Not Unix” is a recursive acronym, which is just really clever for programming, which relies on recursion . . . ]
As community property, this software is maintained by what you called collections of "volunteers." I have never thought of us as volunteers. We use software and as we improve upon it, we make our improvements freely available to the community. We are allowed to say how the software we make can be used, but it is open and free.
What GNU is for compilers, editors and debuggers, Linux is for operating systems. It is a cleanroom implementation of Unix originally developed by Linus Torvalds. Linux is free and open. I used to jokingly refer to Linux as software developed by hippies for free. Usually in reference to Linux being better than certain commercial operating systems developed by certain unnamed mega corporations. Maybe I shouldn't tell you this, but macOS is built on Darwin, which is in turn built upon FreeBSD and Mach. Both FreeBSD and Mach are open source projects maintained by their communities. Darwin is now a full-fledged open-source UNIX maintained by Apple.
Linux and FSF's GNU project revolutionized software development forever by making software free again. I can confidently say that every computing system you interact with is derived in part from open source software. No kidding. Every system. Open source is everywhere and is critical. We could not build or maintain the software infrastructure we have without sharing.
A colleague with 30+ years of experience developing secure software told me explicitly that had the xz attack targeted a closed source system, we would never have found it. Why? Because the developer who found the bug could look at the source when the system behaved unexpectedly. With closed source, you live with what you get; the developer could have seen neither the development process nor the source code. The open source adage is that 10,000 pairs of eyes look at the source. (This turns out to be false, but that's a different thread.)
Okay, but the developers behind all those 10,000 eyes can change the source anonymously. This, you say, is absurd. It would be if that were what actually happens. To understand what happens you need to understand git repos. Public domain software is stored in repositories (“repos”) that are maintained by adding patches that represent code changes. If I want to make a change to a package I have git [a system that tracks changes in software, also developed by Torvalds] create a patch and submit that patch to the repo as a “pull request.” [You are requesting the maintainer to “pull” the suggested code from your version of the software (“fork”) and add it to the repository. Like accepting a suggested edit.] Each pull request is evaluated by a system maintainer or owner, who determines if the change should be added. Nothing is added to the repo without review by at least one maintainer. Even maintainers themselves must have changes they develop reviewed.
The repo itself becomes a history of changes. When the repo is updated, the patch is added to the existing contents. So a repo is not just source code; it is a collection of patches applied in sequence to the initial commit. The repo becomes a ledger that tracks all changes. Plus, each change is signed by the submitter. Given a repo, you know what changes were made and who made them. Just like a bank ledger. Any change can be undone, parallel branches can be created and merged, and the entire process is auditable.
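Git's commit graph behaves very much like the ledger Perry describes. Here is a minimal sketch in Python (the author names and patches are invented, and real git hashes considerably more metadata): each entry's identifier depends on everything before it, so any tampering with history is detectable on audit.

```python
import hashlib

def commit_id(parent_id, author, patch):
    """A commit's ID hashes in its parent, author, and contents,
    so altering any earlier entry changes every ID after it."""
    data = f"{parent_id}|{author}|{patch}".encode()
    return hashlib.sha1(data).hexdigest()

# Build a tiny "repo": a chain of patches applied in sequence
# to an initial commit (toy data, for illustration only).
history = []
parent = "root"
for author, patch in [
    ("alice", "initial commit"),
    ("bob",   "fix off-by-one in parser"),
    ("alice", "add compression option"),
]:
    parent = commit_id(parent, author, patch)
    history.append({"id": parent, "author": author, "patch": patch})

# Auditing: recompute the chain and confirm nothing was altered.
parent = "root"
for entry in history:
    assert entry["id"] == commit_id(parent, entry["author"], entry["patch"])
    parent = entry["id"]
print("ledger verified")
```

The design choice is the point: because each ID commits to its parent, rewriting an old patch silently is impossible; the audit loop would fail at the first altered entry.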
Trust
Now that we know how repos work and how they are maintained, we need to deal with trust. How do you trust maintainers and owners? How do you identify submitters of patches?
What is trust? Strong identity, observed good behavior, indirect good behavior observed by a trusted third party. Strong IDs are identifiers bound uniquely to an entity. On most sites, IDs are created and associated with passwords. Logging in using your password verifies your right to use the associated ID. It does not reveal who you are, just that you own the ID. Maintaining a secret password is a big deal. If someone has your password, they can be you. This is why multi-factor authentication has become a big deal. Also, why sharing a password in my shop will get you walked immediately. No discussion.
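Perry's point that a password proves ownership of an ID, not who you are, can be sketched in a few lines of Python (a toy account store, not any real site's login code). The server keeps only a salted hash; anyone presenting the matching password passes the check, whoever is actually typing.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Store a salted hash, never the password itself."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(password, salt, stored_digest):
    """The check proves possession of the secret bound to the ID,
    not the identity of the person presenting it."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify("correct horse battery staple", salt, stored))  # True
print(verify("guess", salt, stored))                         # False
```

This is exactly why a leaked password is catastrophic: the system cannot tell the thief from the owner, which is what multi-factor authentication is meant to mitigate.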
Anyone can create one or many IDs and passwords and participate in the process of maintaining software. There are two kinds of people we might need to trust - patch submitters and system maintainers. Let’s discharge the issue of submitters first. We don’t care who they are. They submit a patch and the maintainers review the patch. Maintainers do not need to trust any submitter because they always verify the patch is good before adding it. Submitters have no privileges for writing to the repo, so all they can do is ask. If a maintainer can’t establish a patch is good, they never add it. If a submitter tries to do something nefarious, the maintainer can block or ignore them in the future. Regardless, there is no need to trust because the maintainer is evaluating the submission and the submitter can only ask.
What about maintainers? If they are adding to the repository, they must be trusted and authenticated. More specifically, their identities need to be trusted. There are two common ways of establishing trust in an identity. The first is building a chain of trust from a root id that is trusted. Whenever you use a notary public, you are using this mechanism. The notary establishes their trust with the government using physical identification. They are issued a certificate that memorializes that trust and we trust the notary because we trust the certifier. The notary extends that trust to you by observing your signature in the presence of your ID. They add a stamp that certifies your signature can be trusted because they are trusted by the government. Now extend that. Let's say the notary certifies that you are trustworthy and gives you a certificate that allows you to be a notary. Do this over and over and you end up with a tree of trust from the government root.
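The notary chain can be sketched as follows. This is a deliberately toy model in Python: real certificate chains use public-key signatures, and a verifier never holds the signers' secrets; here a hash stands in for a signature purely to show the chain's shape, from a trusted root outward.

```python
import hashlib

def sign(signer_key, subject):
    """Toy 'signature': a hash of the signer's secret and the subject.
    Real systems use public-key cryptography; this only shows the chain."""
    return hashlib.sha256(f"{signer_key}:{subject}".encode()).hexdigest()

# Hypothetical keys: the government root certifies a notary,
# and the notary certifies you.
keys = {"government": "root-secret", "notary": "notary-secret"}
certs = [
    ("government", "notary", sign(keys["government"], "notary")),
    ("notary", "you", sign(keys["notary"], "you")),
]

def verify_chain(certs, trusted_root):
    """Walk the chain: each certificate's issuer must be the previous
    subject, starting from the root we have decided to trust."""
    issuer = trusted_root
    for cert_issuer, subject, sig in certs:
        if cert_issuer != issuer or sig != sign(keys[cert_issuer], subject):
            return False
        issuer = subject
    return True

print(verify_chain(certs, "government"))  # True
```

Note how trust in "you" rests entirely on the root: start the walk from an authority you do not trust, or forge one link, and verification fails.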
There are good things about this tree model. The biggest of which is one can trace trust in an identifier back to a root they trust. Maybe the government, maybe your company, or University. You can always trace your trust in an identifier back to this authority you have decided to trust.
There are also bad things about the tree. If a bad actor is certified or a bad actor compromises the certification process, everything under their node in the trust tree is now untrusted. Furthermore, the certifying individual completely controls who they choose to certify. If the government decides you are a subversive, they don't certify your identity. We see this with passports in nefarious countries all the time. Try functioning without WeChat in China.
The set of maintainers uses a less formal web of trust. You use this model all the time, but likely don't realize it. In a web of trust there are no roots. You decide who you directly trust. You also record that trust and announce it to everyone else. Because trust is transitive, you also trust IDs trusted by those you trust. That announcement adds to the ID's reputation. The more IDs that trust you, the greater your reputation. In a formally implemented web of trust you indicate trust in an ID by signing it with your private key. That signature can be checked to make sure you really do trust it. (In the old days we used to have key signing parties where everyone on a project would get together, verify identities visually, and sign keys.)
When an ID is compromised in a web, the reputation gained from that ID is deleted. If lots of other IDs trust you, the deletion of one link does very little. If not many IDs trust you, your trust could be heavily compromised or even eliminated. Regardless, there is no central authority establishing trust. It is the community.
The web of trust among maintainers is not constructed formally by signing keys. It's far more informal. Just like joining a club or getting a security clearance or granting tenure, those that want to trust you check that you have lots of links from diverse places. Then they indicate their trust to the larger community.
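A web of trust can be sketched as a graph (the IDs below are hypothetical). Trust flows transitively outward from the IDs you chose to trust directly, and reputation is simply a count of how many IDs vouch for you.

```python
def reachable_trust(trust_edges, me):
    """Transitive trust: I trust everyone I directly trust, plus
    everyone they trust, and so on. No root; each ID starts
    from its own choices."""
    trusted, frontier = set(), {me}
    while frontier:
        nxt = set()
        for who in frontier:
            for peer in trust_edges.get(who, ()):
                if peer not in trusted and peer != me:
                    trusted.add(peer)
                    nxt.add(peer)
        frontier = nxt
    return trusted

# A small hypothetical web: who directly trusts whom.
web = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"bob"},
    "dave": set(),
    "mallory": {"alice"},  # mallory trusts alice; nobody trusts mallory
}
print(sorted(reachable_trust(web, "alice")))  # ['bob', 'carol', 'dave']

# Reputation: how many IDs vouch for each ID.
reputation = {who: sum(who in v for v in web.values()) for who in web}
print(reputation["bob"], reputation["mallory"])  # 2 0
```

The sketch also shows the model's asymmetry that the attack exploited: mallory can announce trust in others all day, but until established IDs vouch for mallory, mallory's reputation stays at zero, which is why the xz attackers spent years accumulating legitimate-looking links first.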
Next up, how the web of trust was attacked.
The Attack
The social part of the xz attack was successfully launched against the web of xz maintainers and was classic espionage tradecraft. Nothing particularly novel, other than the target.
The perpetrator of the attack established one or more IDs on GitHub. They proceeded to make legitimate pull requests to maintainers over a period of years. They were establishing a basic presence in the community. At some point they targeted xz, likely because it had a small maintainer group - a single person, who was dealing with medical issues. They then flooded the xz maintainer with pull requests from many IDs.
Now the xz maintainer can’t keep up. The attacker who has been legitimately contributing patches offers to help. The attackers then start badgering the maintainer and the maintainer’s colleagues. The mole continues to offer help. The maintainer’s reputation is at stake. [For some of the dialog, see The XZ Attack and Timeline.]
Finally, the attackers win. The barrage has hurt the reputation of the maintainer, who wants to retain his status. So the maintainer gives in and grants access to the attacker. Now the attacker is where they want to be.
This is classic espionage. The attacker threatened something valuable to the maintainer. The attacker appeared trustworthy based on earlier behavior. Target moves to protect the valuable thing. Boom. In human systems the valuable thing could be anything - family, wealth, status. Here the valuable thing was reputation in an online community.
Do not underestimate the difficulty of pulling off this attack. It took many years. It took knowledge of the Linux maintenance mechanism. It took an understanding of human behavior and spy tradecraft. Likely several people involved. Likely a nation state.
Next up, the technical side of the attack. Be prepared for geeky stuff.
I have comments about virtually every sentence of the foregoing, but I’ve been doing this for a long time. For today, I’ll restrict myself to the following.
It is difficult to believe that this system can continue in its present configuration. If we do not care who the submitters are, then the maintainers are a single line or plane of defense. If there is only one maintainer, then we have a single point of failure. Which, here, failed. It’s a bit, shall we say, professionally convenient to frame this as a success by Freund and the open source community. That little Dutch boy saved the dike, too.
Second, the relationships among transparency, secrecy and certification as sources of trust, even within this text, are endlessly fascinating. In conversation with Perry, I spoke about “volunteers,” i.e., there is no licensing process for submitters. We do not know who the submitter was, or, evidently more likely, who was on the team represented by “Jia Tan” and other IDs. As a law professor, in the business of licensing, to say nothing of a car driver, I find this bizarre. Perry’s account adds nuance, and tells us a great deal about how we got to this point, but it hardly convinces me that all is well with that world, that is, the control systems that sustain contemporary life.
Third and most importantly, as with oral history generally, the diction is significant, meaningfully paints a way of looking at the world. Roughly understanding, analyzing, and gently critiquing such worlds is much of what I do. More generally, in a society constructed, and here barely protected, by experts, with their own blind spots and limitations, amateur conversation, with members of the public, is critical. Contra my friend, public accounts of what they do are exactly what powerful experts (“hard core computer scientist”), in the military and elsewhere, should be doing. And I thank Perry for doing so. Not so incidentally, this is a central theme in my next book, Quixote’s Dinner Party.
I hope that Perry finds time to deliver the second part of his story, describing the programming that went into this attack and other technical matters. I’ll try to follow along and translate as best I can.
Many more flowers, gardens, and the beginnings of a defense of suburbia at Flowers, Thoughts. Some of these images were republished by Honoring the Future, an NGO that uses art to build engagement with climate change issues.
Be careful out there, stay cool, and enjoy the heat.
— David A. Westbrook