The Uses and Abuses of Cryptography
Another day, another data breach, and another round of calls for companies to encrypt their databases. Cryptography is a powerful tool, but in cases like this one it’s not going to help. If your OS is secure, you don’t need the crypto; if it’s not, the crypto won’t protect your data.
In a case like the Anthem breach, the really sensitive databases are always in use. This means that they’re effectively decrypted: the database management systems (DBMS) are operating on cleartext, which means that the decryption key is present in RAM somewhere. It may be in the OS, it may be in the DBMS, or it may even be in the application itself (though that’s less likely if a large relational database is in use, which it probably is). What’s to stop an attacker from obtaining that key, or perhaps from just making database queries?
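Here’s a minimal sketch of that situation; the schema and names are hypothetical, and it uses Python’s built-in sqlite3 module plus the third-party cryptography package. The application has to hold the decryption key in RAM to do its job, so anyone who can act as the application gets cleartext too:

    import sqlite3
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # in real life loaded at startup, and therefore sitting in RAM
    cipher = Fernet(key)

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE members (name TEXT, ssn BLOB)")
    db.execute("INSERT INTO members VALUES (?, ?)",
               ("Alice Example", cipher.encrypt(b"000-12-3456")))

    # The legitimate application path: the key must be available for this to work.
    row = db.execute("SELECT ssn FROM members WHERE name = ?",
                     ("Alice Example",)).fetchone()
    print(cipher.decrypt(row[0]).decode())   # cleartext, because the application needs cleartext

    # An attacker who compromises the application process, or who simply has
    # its database credentials, can run this very same code path.

So what actually stands between an attacker and those queries?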
The answer, in theory, is other forms of access control. Perhaps the DBMS requires authentication, or operating system permissions will prevent the attacker from getting at the keys. Unfortunately, as these many data breaches show, such defenses are not configured properly or aren’t doing the job. If that’s the case, though, adding encryption isn’t going to help; the attacker will just go around the crypto. There’s a very simple rule of thumb here: Encryption is most useful when OS protections cannot work.
What do I mean by that? The most obvious situation is where the attacker has physical access to the device. Laptop disks should always be encrypted; ditto flash drives, backup media, etc. Using full disk encryption on your servers’ drives isn’t a bad idea, since it protects your data when you discard the media, but you then have to worry about where the key comes from if the server crashes and reboots.
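As a sketch of that worry, consider where the passphrase comes from; this is illustrative only, since real full-disk encryption is handled by tools like LUKS, BitLocker, or FileVault rather than hand-rolled Python:

    import getpass
    import hashlib
    import os

    def derive_disk_key(passphrase: bytes, salt: bytes) -> bytes:
        # Standard password-based key derivation; the parameters here are arbitrary.
        return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 600_000)

    salt = os.urandom(16)   # in practice stored in the unencrypted volume header

    # Laptop case: a human is present at every boot to supply the secret.
    key = derive_disk_key(getpass.getpass("Disk passphrase: ").encode(), salt)

    # Server case: the machine reboots at 3 a.m. with nobody at the console.
    # The secret has to come from somewhere else (a TPM, a key server, a file
    # on another disk), and each of those choices is something an attacker
    # with enough access may be able to reach too.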
Cloud storage is a good place for encryption, since you don’t control the machine room and you don’t control the hypervisor. Again, your own operating system isn’t blocking a line of attack. (Note: I’m not saying that the cloud is a bad idea; if nothing else, most cloud sysadmins are better at securing their systems than are folks at average small companies.) Email is another good use for encryption, unless you control your own mail servers. Why? Because the data is yours, but you’re storing it on someone else’s computer.
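For the cloud case, the useful pattern is client-side encryption: encrypt on machines you control and hand the provider nothing but ciphertext. Here’s a minimal sketch; the upload function is a placeholder rather than any real cloud API, and it again uses the third-party cryptography package:

    from cryptography.fernet import Fernet

    def upload_to_cloud(object_name: str, blob: bytes) -> None:
        # Placeholder for whatever object-storage client you actually use.
        pass

    key = Fernet.generate_key()   # generated and kept on machines you control
    cipher = Fernet(key)

    data = b"whatever you are storing off-site"
    upload_to_cloud("backup-2015-02-15.enc", cipher.encrypt(data))

    # The provider's machine room, hypervisor, and sysadmins see only
    # ciphertext.  Your own OS never had a chance to protect this data,
    # which is exactly when the crypto earns its keep.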
Encryption is a useful tool (and a fun research area), but like all tools it’s only useful if properly employed. If used in inappropriate situations, it won’t provide protection and will create operational headaches and perhaps data loss from mismanaged keys.
Protecting large databases like Anthem’s is a challenge. We need better software security, and we need better structural tools to isolate the really sensitive data from average, poorly protected machines. There may even be a role for encryption, but simply encrypting the social security numbers isn’t going to do much.
What Must We Trust?
My Twitter feed has exploded with the release of the Kaspersky report on the "Equation Group", an entity behind a very advanced family of malware. (Naturally, everyone is blaming the NSA. I don’t know who wrote that code, so I’ll just say it was beings from the Andromeda galaxy.)
The Equation Group has used a variety of advanced techniques, including injecting malware into disk drive firmware, planting attack code on "photo" CDs sent to conference attendees, encrypting payloads using details specific to particular target machines as the keys (which in turn implies prior knowledge of these machines’ configurations), and more. There are all sorts of implications of this report, including the policy question of whether or not the Andromedans should have risked their commercial market by doing such things. For now, though, I want to discuss one particular, deep technical question: what should a conceptual security architecture look like?
For more than 50 years, all computer security has been based on the separation between the trusted portion and the untrusted portion of the system. Once it was "kernel" (or "supervisor") versus "user" mode, on a single computer. The Orange Book recognized that the concept had to be broader, since there were all sorts of files executed or relied on by privileged portions of the system. Their newer, larger category was dubbed the "Trusted Computing Base" (TCB). When networking came along, we adopted firewalls; the TCB still existed on single computers, but we trusted "inside" computers and networks more than external ones.
There was a danger sign there, though few people recognized it: our networked systems depended on other systems for critical files. In a workstation environment, for example, the file server was crucial, but it wasn’t treated as part of the TCB. It should have been. (I used to refer to our network of Sun workstations as a single multiprocessor with a long, thin, yellow backplane; if you’re old enough to know what a backplane was back then, you’re old enough to know why I said "yellow"…) The 1988 Internet Worm spread with very little use of privileged code; it was primarily a user-level phenomenon. The concept of the TCB didn’t seem particularly relevant. (Should sendmail have been considered part of the TCB? It ran as root, so technically it was, but very little of it actually needed root privileges. That it had privileges was more a sign of poor modularization than of an inherent need for a mailer to be fully trusted.)
The National Academies report Trust in Cyberspace recognized that the old TCB concept no longer made sense. (Disclaimer: I was on the committee.) Too many threats, such as Word macro viruses, lived purely at user level. Obviously, one could have arbitrarily classified word processors, spreadsheets, etc., as part of the TCB, but that would have been worse than useless; these things were too large and had no need for privileges.
In the 15+ years since then, no satisfactory replacement for the TCB model has been proposed. In retrospect, the concept was not very satisfactory even when the Orange Book was new. The compiler, for example, had to be trusted, even though it was too huge to be trustworthy. (The manual page for gcc is itself about 90,000 words, almost as long as a short novel—and that’s just the man page; the code base is far larger.) The limitations have become painfully clear in recent years, with attacks demonstrated against the embedded computers in batteries, webcams, USB devices, IPMI controllers, and now disk drives. We no longer have a simple concentric trust model of firewall, TCB, kernel, firmware, hardware. Do we have to trust something? What? Where do we get these trusted objects from? How do we assure ourselves that they haven’t been tampered with?
I’m not looking for concrete answers right now. (Some of the work in secure multiparty computation suggests that we need not trust anything, if we’re willing to accept a very significant performance penalty.) Rather, I want to know how to think about the problem. Other than the now-conceptual term TCB, which has been redefined as "that stuff we have to trust, even if we don’t know what it is", we don’t even have the right words. Is there still such a thing? If so, how do we define it when we no longer recognize the perimeter of even a single computer? If not, what should replace it? We can’t make our systems Andromedan-proof if we don’t know what we need to protect against them.
Hiding in the Firmware?
The most interesting feature of the newly-described "Equation Group" attacks has been the ability to hide malware in disk drive firmware. The threat is ghastly: you can wipe the disk and reinstall the operating system, but the modified firmware in the disk controller can reinstall nasties. A common response has been to suggest that firmware shouldn’t be modifiable unless a physical switch is activated. It’s a reasonable thought, but it’s a lot harder to implement than it seems, especially for the machines of most interest to nation-state attackers.
One problem is where this switch should be. It’s easy enough on a desktop or even a laptop to have a physical switch somewhere. (I’ve read that some Chromebooks actually have such a thing.) It’s a lot harder to find a good spot on a smartphone, where space is very precious. The switch should be very difficult to operate by accident, but findable by ordinary users when needed. (This means that a switch on the bottom is probably a bad idea, since people will be turning their devices over constantly, moving between the help page that explains where the switch is and the bottom to try to find it….) There will also be the usual percentage of people who simply obey the prompts to flip the switch because of course the update they’ve just received is legitimate…
A bigger problem is that modern computers have lots of processors, each of which has its own firmware. Your keyboard has a CPU. Your network cards have CPUs. Your flash drives and SD cards have CPUs. Your laptop’s webcam has a CPU. All of these CPUs have firmware; all can be targeted by malware. And if we’re going to use a physical switch to protect them, we either need a separate switch for each device or a way for a single switch to control all of these CPUs. Doing that probably requires special signals on various internal buses, and possibly new interface standards.
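To picture what such a gate might look like, here is a purely hypothetical sketch; none of these functions correspond to a real vendor interface, and the signature scheme is just one plausible choice. The idea is that the controller refuses to reflash unless a physical write-enable line is asserted and the image is signed by the vendor:

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric import ed25519

    # In a real device the vendor's public key would be baked into ROM; a
    # throwaway key pair keeps this sketch self-contained.
    vendor_private_key = ed25519.Ed25519PrivateKey.generate()
    VENDOR_PUBLIC_KEY = vendor_private_key.public_key()

    def write_enable_asserted() -> bool:
        # Placeholder: this would really read a physical switch, jumper, or
        # GPIO line.  Failing closed is the safe default.
        return False

    def flash(image: bytes) -> None:
        # Placeholder for actually reprogramming the firmware.
        pass

    def update_firmware(image: bytes, signature: bytes) -> None:
        if not write_enable_asserted():
            raise PermissionError("write-enable switch not set; refusing update")
        try:
            VENDOR_PUBLIC_KEY.verify(signature, image)
        except InvalidSignature:
            raise ValueError("image is not signed by the vendor; refusing update")
        flash(image)

    # A legitimate update would look like
    #     update_firmware(new_image, vendor_private_key.sign(new_image))
    # but only after someone physically flips the switch.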
The biggest problem, though, is with all of the computers that the net utterly relies on, but that most users never see: the servers. Many companies have them: rows of tall racks, each filled with anonymous "pizza boxes". This is where your data lives: your email, your files, your passwords, and more. There are many of them, and they’re not updated by someone going up to each one and clicking "OK" to a Windows Update prompt. Instead, a sysadmin (probably an underpaid, underappreciated, overstressed sysadmin) runs a script that will update them all, on a carefully planned schedule. Flip a switch? The data center with all of these racks may be in another state!
If you’re a techie, you’re already thinking of solutions. Perhaps we need another processor, one that would mediate privileged operations like firmware updates. As it turns out, most servers already have a special management processor called IPMI (Intelligent Platform Management Interface). It would be the perfect way to control firmware updates, too, except for one thing: IPMI itself has serious security issues…
A real solution will take a few years to devise, and many more to roll out. Until then, the best hope is for Microsoft, Apple, and the various Linux distributions to really harden any interfaces that provide convenient ways for malware to issue strange commands to the disk. And that is itself a very hard problem.
Update: Dan Farmer, who has done a lot of work on IPMI security, points out that the protocol is IPMI, but the processor it runs on is the BMC (Baseboard Management Controller).
Packet Loss: How the Internet Enforces Speed Limits
There’s been a lot of controversy over the FCC’s new Network Neutrality rules. Apart from the really big issues (should there be such rules at all? Is reclassification the right way to accomplish it?), one particular point has caught the eye of network engineers everywhere: the statement that packet loss should be published as a performance metric, with the consequent implication that ISPs should strive to achieve as low a value as possible. That would be a very bad thing to do. I’ll give a brief, oversimplified explanation of why; Nicholas Weaver gives more technical details.
Let’s consider a very simple case: a consumer on a phone trying to download an image-laden web page from a typical large site. There’s a big speed mismatch: the site can send much faster than the consumer can receive. What will happen? The best way to see it is by analogy.
Imagine a multilane superhighway, with an exit ramp to a low-speed local road. A lot of cars want to use that exit, but of course the ramp can’t handle as many cars, nor can they drive as fast. Traffic will start building up on the ramp until a cop sees it and starts waving cars past the exit so the backlog can clear a bit.
Now imagine that every car is really a packet, and a car that can’t get off at that exit because the ramp is full is a dropped packet. What should you do? You could try to build a longer exit ramp, one that will hold more cars, but that only postpones the problem. What’s really necessary is a way to slow down the desired exit rate. Fortunately, on the Internet we can do that, but I have to stretch the analogy a bit further.
Let’s now assume that every car is really delivering pizza to some house. When a driver misses the exit, the pizza shop eventually notices and sends out a replacement pizza, one that’s nice and hot. That’s more like the real Internet: web sites notice dropped packets, and retransmit them. You rarely suffer any ill effects from dropped packets, other than lower throughput. But there’s a very important difference here between a smart Internet host and a pizza place: Internet hosts interpret dropped packets as a signal to slow down. That is, the more packets are dropped (or the more cars that are waved past the exit), the slower the new pizzas are sent. Eventually, the sender transmits at exactly the rate at which the exit ramp can handle the traffic. The sender may try to speed up on occasion. If the ramp can now handle the extra traffic, all is well; if not, there are more dropped packets and the sender slows down again. Trying for a zero drop rate simply leads to more congestion; it’s not sustainable. Packet drops are the only way the Internet can match sender and receiver speeds.
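If you want to see that speed matching in miniature, here is a toy simulation (not anyone’s production algorithm, and the numbers are arbitrary; real TCP implementations such as Reno, CUBIC, and BBR are far more sophisticated). Each round trip the sender sends a little more until the "exit ramp" overflows and a packet is dropped, at which point it cuts its rate in half:

    BOTTLENECK = 50    # packets per round trip that the "exit ramp" can handle
    cwnd = 1.0         # sender's window: packets sent per round trip

    for rtt in range(1, 41):
        if cwnd > BOTTLENECK:      # the ramp overflows: a packet is dropped...
            cwnd /= 2              # ...so the sender backs off sharply
            event = "drop"
        else:                      # everything got through...
            cwnd += 1              # ...so probe gently for more bandwidth
            event = "ok"
        print(f"RTT {rtt:2d}: window {cwnd:5.1f} packets ({event})")

    # The window climbs to the bottleneck rate, overshoots, is cut back by a
    # drop, and climbs again.  Sender and receiver end up matched without
    # anyone ever announcing the exit ramp's capacity.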
The reality on the Internet is far more complex, of course. I’ll mention only a couple of aspects here; suffice it to say that congestion on the net is in many ways worse than a traffic jam. First, you can get this sort of congestion at every "interchange". Second, it’s not just your pizzas that are slowed down, it’s all of the other "deliveries" as well.
How serious is this? The Internet was almost stillborn because this problem was not understood until the late 1980s. The network was dying of "congestion collapse" until Van Jacobson and his colleagues realized what was happening and showed how packet drops would solve the problem. It’s that simple and that important, which is why I’m putting it in bold italics: without using packet drops for speed matching, the Internet wouldn’t work at all, for anyone.
Measuring packet drops isn’t a bad idea. Using the rate, in isolation, as a net neutrality metric is not just a bad idea, it’s truly horrific. It would cause exactly the problem that the new rules are intended to solve: low throughput at inter-ISP connections.
Update: It looks like the actual rules got it right.