Why the US Doesn't have Chip-and-PIN Credit Cards Yet
In the wake of the Target security breach, there’s been a fair amount of hand-wringing about why the US has lagged most of the rest of the world in deploying EMV (Europay, MasterCard and Visa)—chips—in credit cards. While I certainly think that American banks and card issuers should have moved sooner, they had their reasons for their decision. Arguably, they were even correct.
To understand the actual logic, it is necessary to remember three things:
- The bankers who decided to stick with mag stripes are most certainly not stupid.
- More security versus less security is not a moral issue, it’s a financial decision.
- Security technologies have their own costs, some of which are not obvious.
Security mechanisms are not selected randomly. Rather, they’re deployed to counter specific threats. If you don’t see a threat that would be countered by EMV, there’s no point to using it. One major source of loss—fraud on card applications—is not addressed at all by EMV. Forged cards have long been an issue, but on a small scale; this could be dealt with by things like holograms and quick revocation. Yes, datatbases of credit card numbers existed, but for the most part these weren’t at risk; there was little, if any, network connectivity, and the criminal hacker community had developed the tools to get at these databases.
(Those databases of card numbers turned out to be very important. Most people use a very few credit cards, often just one; that means that your credit card number is effectively your customer ID number. You behavior can be (and is) tracked this way, especially if you buy both online and in a physical store.)
Quick revocation, implemented when merchants started deploying terminals a bit over 30 years ago, was very important. Before that, stores relied on books listing canceled card numbers. These books were issued at most weekly, and were cumbersome to use; as a result, they were generally consulted only for large transactions. (Exercise for the reader: at the conclusion of this blog post, explain why this behavior was quite rational.)
In other words, by around 1995, life was pretty good for American credit card accepters. There was a relatively cheap technology (mag stripes plus online verification), good databases for tracking, and decent law enforcement.
Life was different in Europe. Countries are much smaller, of course, which means that there’s more cross-border travel; this in turn hinders law enforcement for cross-border crime. (Not very many Americans travel abroad, so there’s not nearly as much of a cross-border issue affecting American banks.) For whatever reason, online verification terminals were not deployed as widely, but crime was increasing. (I’ve heard that different costs for telecommunications service and equipment played a big role, but I haven’t verified that.) There had a second mover advantage: they hadn’t invested as much in mag stripe technology, and in the meantime smart cards—chips in credit cards—had become feasible and affordable, which was not true circa 1980.
One element of the cost, then, is the infrastructure: the myriad terminals that merchants own, and the server complexes that accept and verify those transactions. None of that would work with EMV cards. Other costs, though, are more subtle. In one ironic example, Target itself tried deploying EMV ten years ago: Target was both an issuer and accepter of credit cards. It turned out, though, that processing a transaction with an EMV card is slower, which meant long lines at cash registers— lines that their competitors didn’t have, because almost no one else in the US was using EMV.
This, then, was the problem: high conversion costs, high operational costs, disadvantages for early adopters, and little consumer demand for the chips—American consumers aren’t responsible for fraudulent use, and as noted few Americans travel abroad where they might need the chips. Combine this with the lack of a significant threat model, and the decision seemed obvious: the financial calculations indicated that it wasn’t a profitable move. Yes, there would be some loss due to preventable fraud, but the cost of that prevention would be greater than the likely losses. As noted above, fraud prevention is strictly a financial decision.
What happened, of course, was that the threat changed dramatically. Hackers did learn to penetrate store server complexes and card processors. The conversion is now an urgent matter, but it will still be years before most transactions in the US will involve chip-enabled cards and terminals.
Goto Fail
As you’ve probably heard by now, there’s a serious bug in the TLS implementations in iOS (the iPhone and iPad operating system) and MacOS. I’ll skip the details (but see Adam Langley’s excellent blog post if you’re interested); the effect is that under certain conditions, an attacker can sit in the middle of an encrypted connection and read all of the traffic.
Here’s the code in question:
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) goto fail; if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) goto fail; goto fail; if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) goto fail; …Note the doubled
goto fail; goto fail;That’s incorrect and is at the root of the problem.
The mystery is how that extra line was inserted. Was it malicious or was it an accident? If the latter, how could it possibly have happened? Is there any conceivable way this could have happened? Well, yes.
Here’s a scan from a 1971 paperback printing of Roger Zelazny’s Lord of Light.
Note the duplicated paragraph… Today, something like that could easily happen via a cut-and-paste error. (The error in that book was not authorial emphasis; I once checked a hardcover copy that did not have the duplication.)
There’s another reason to think it was an accident: it’s not very subtle. That sequence would stick out like a sore thumb to any programmer who looked at it; there are no situations where two goto statements in a row make any sense whatsoever. In fact, it’s a bit of a surprise that the compiler didn’t flag it as an error; the ordinary process of optimization should have noticed that the all of the lines after second goto fail could never be reached. The attempted sabotage of the Linux kernel in 2003 was much harder to spot; you’d need to notice the difference between = and == in a context where any programmer "just knows" it should be ==. However…
There are still a few suspicious items. Most notably, this bug is triggered if and only if something called "perfect forward secrecy" (PFS) has been offered as an encryption option for the session. Omitting the technical details of how it works, PFS is a problem for attackers—including government attackers—who have stolen or cryptanalyzed a site’s private key. Without PFS, knowing the private key allows encrypted traffic to be read; with PFS, you can’t do that. This flaw would allow an attacker who was capable of carrying out a "man-in-the-middle" (MitM) attack to read even PFS-protected traffic. While MitM attacks have been described in the academic literature at least as long ago as 1995, they’re a lot less common than ordinary hacks and are more detectable. In other words, if this was a deliberate back door, it was probably done by a high-end attacker, someone who wants to carry out MitM attacks and has the ability to sabotage Apple’s code base.
On the gripping hand, the error is noticeable by anyone poking at the file, and it’s one of the pieces of source code that Apple publishes, which means it’s not a great choice for covert action by the NSA or Unit 61398. With luck, Apple will investigate this and announce what they’ve found.
There are two other interesting questions here: why this wasn’t caught during testing, and why the iOS fix was released before the MacOS fix.
The second one is easy: changes to MacOS require different "system tests". That is, testing just that one function’s behavior in isolation is straightforward: set up a test case and see what happens. It’s not enough just to test the failure case, of course; you also have to test all of the correct cases you can manage. Still, that’s simple enough. The problem is "system test": making sure that all of the rest of the system behaves properly with the new code in. MacOS has very different applications than iOS does (to name just one, the mailer is very, very different); testing on one says little about what will happen on the other. For that matter, they have to create new tests on both platforms that will detect this case, and make sure that old code bases don’t break on the new test case. (Software developers use something called "regression testing" to make sure that new changes don’t break old code.) I’m not particularly surprised by the delay, though I suspect that Apple was caught by surprise by how rapidly the fix was reverse-engineered.
The real question, though, is why they didn’t catch the bug in the first place. It’s a truism in the business that testing can only show the presence of bugs, not the absence. No matter how much you test, you can’t possibly test for all possible combinations of inputs that can result to try to find a failure; it’s combinatorially impossible. I’m sure that Apple tested for this class of failure—talking to the wrong host—but they couldn’t and didn’t test for this in every possible variant of how the encryption takes place. The TLS protocol is exceedingly complex, with many different possibilities for how the encryption is set up; as noted, this particular flaw is only triggered if PFS is in use. There are many other possible paths through the code. Should this particular one have been tested more thoroughly? I suspect so, because it’s a different way that a connection failure could occur; not having been on the inside of Apple’s test team, though, I can’t say for sure how they decided.
There’s a more troubling part of the analysis, though. One very basic item for testers is something called a "code coverage" tool: it shows what parts of the system have or have not been executed during tests. While coverage alone doesn’t suffice, it is a necessary condition; code that hasn’t been executed at all during tests has never been tested. In this case, the code after the second goto was never executed. That could and should have been spotted. That is wasn’t does not speak well of Apple’s test teams. (Note to outraged testers: yes, I’m oversimplifying. There are things that are very hard to do in system test, like handling some hardware error conditions.) Of course, if you want to put on an extra-strength tinfoil hat, perhaps the same attackers who inserted that line of code deleted the test for the condition, but there is zero evidence for that.
So what’s the bottom line? There is a serious bug of unknown etiology in iOS and MacOS. As of this writing, Apple has fixed one but not the other. It may have been an accident; if it was enemy action, it was fairly clumsy. We can hope that Apple will announce the results of its investigation and review its test procedures.
Speculation About Goto Fail
Following the logic in my previous post, I don’t think that Apple’s goto fail was a deliberate attack. Suppose it was, though. What can we learn about the attacker?
The first point is that it very clearly was not the NSA or other high-end intelligence agency. As I noted, this is too visible and too clumsy. While they may not object to that, covertly tinkering with Apple source code is a difficult and risky operation. If an investigation by Apple shows that this was an attack, they’ll move heaven and earth to close the hole, whether it was technical, personnel, or procedural. In other words, using some covert access channel to install a back door effectively "spends" it; you may not get to reuse the channel. In that case, they definitely would not use this one-shot on something that is so easily spotted. (What could they have done? There are lots of random numbers lying around in cryptographic protocols; leak the session key—more likely, a part of it—in one of those numbers. Use convoluted code to generate this "random" number, complete with misleading comments.)
The next question is how this capability can be used. The vulnerability requires a so-called "man-in-the-middle" (MitM) attack, where an attacker has to receive all messages in each direction, decrypt them, reencrypt them, and forward them to the proper destination. If you’re intercepting a lot of traffic, that’s a lot of work. There are a number of ways to do MitM attacks; for technical reasons, it’s a lot easier on the client or the server’s LAN, or with the cooperation of either’s ISP. There are certainly other ways to do it, such as DNS spoofing or routing attacks; that’s why we need DNSSEC and BGPSEC. But ARP-spoofing from on-LAN is very, very easy.
We can narrow it down still further. If the main thing of interest is email and in particular email passwords, the odds are that the victim will be using one of the big "cloud" providers: Google, Yahoo, or Microsoft. You don’t want to tap those nets near the servers; apart from the technical difficulty (they’re good at running their machines securely, and they don’t invite random strangers onto their nets), and apart from the fact that you’d be trying to take a sip from a firehose, it’s hard to figure out where to put the tap. Consider gmail. I checked its IP address from my office, my house, and from a server I sometimes use in Seattle. The IP addresses resolved to data centers near DC, New York City, and Seattle, respectively. Which one should you use if you wanted my traffic? Note in particular that though my office is in New York City and my house is not, I got a New York server from when trying from home, but not when trying from my office.
(There’s another possible attacker vector: software updates. Apple’s updates are digitally signed, so they can’t be tampered with by an attacker; however, that isn’t true for all third-party packages. Tampering with an update is a way to get malicious code installed on someone’s machine.)
The conclusion is simple: go after the client LAN. If it’s a LAN the attacker and the target both have access to—say, an Internet cafe or the like—the problem is very simple. In fact, if you’re on-LAN with the target, you can home in on the target’s MAC address (which doesn’t go off-LAN), making your life very simple. Alternatively, hack into the wireless router or seek cooperation from the hotspot owner or the ISP. (Home routers are easily hacked, too, of course. If the attacker goes that route, there’s no need for MAC-spoofing.)
We can also speculate that the victim is using an iOS device rather than a Mac. If nothing else, Apple sells far more iPhones and iPads than it does Macs. There’s another reason to think that, though: Macs are expensive, and Internet cafes are much more popular in poorer countries where fewer people have broadband access at home. Of course, if they’re using an iPhone in cellular mode, there’s no LAN to camp on, but governments have little trouble gaining access to the networks run by their own telephone companies. Either way, it sounds very much like a targeted attack, aimed at a very few individuals.
My reasoning is, of course, highly speculative. If it was an attack, though, I conclude (based on this tenuous set of deductions) that it was a moderately capable government going after a small set of victims, either on a public net or via the local mobile phone carrier. Most of us had nothing to worry about—until, of course, the patch came out. And why a patch for just iOS, with MacOS waiting until later? Apart from the system test difficulties I mentioned last time, might it be because Apple was warned—by someone!—that this hole was being exploited in the wild? Again, I’m speculating…
Update: Nicholas Weaver notes that putting the exploit station on the victim’s LAN or router solves another problem: identifying which machines may be vulnerable to this exploit. It can look at other traffic from that MAC address to decide if it’s an Apple product, what browser it’s using, etc., and not tip off the victim by pulling this stunt against Firefox or Internet Explorer users.
In principle, this can be done from off-LAN, but it’s harder, especially if the hosts use strong sequence numbers and randomize the IPid field. (Some open source operating systems do that. I haven’t checked on Macs or Windows boxes in a long time.)