Thursday, April 17, 2014

software has bugs... now what?

The recent bugs uncovered in TLS/SSL implementations were received in the blogosphere with a quest for perfectly secure implementations that have no bugs. That is the modern quest for perpetual motion. Nevertheless, very few were bothered by the fact that the applications' only line of security defence was a handful of TLS/SSL implementations. We design software in a way that openssl or gnutls become the Maginot line of security, and when they fail, they fail catastrophically.

So the question I find more interesting is: can we live with libraries that have bugs? Can we design resilient software that will operate despite serious bugs, and will provide us with the necessary time for a fix? In other words, could an application design have mitigated or neutralized the catastrophic bugs we saw? Let's examine each case separately.


Mitigating attacks that expose memory (heartbleed)

The heartbleed attack allows an attacker to obtain a random portion of a server's memory (there is a nice illustration in xkcd). In that attack all the data held within a server process are at risk, including user data and cryptographic keys. Could an attack of this scale have been avoided?

One approach is to avoid putting all of our eggs in one basket; that's a long-time human practice, which is also used in software design. OpenSSH's privilege separation and the isolation of private keys using smart cards or software security modules are two prominent examples. The defence in that design is that the unprivileged process memory contains only data related to the current user's session, but no privileged information such as passwords or the server's private key. Could we have a similar design for an SSL server? We already have such a design for an SSL VPN server, where revealing the worker processes' memory cannot possibly reveal its private key. These designs come at a cost, though, and that cost is performance, as they need to rely on slow Inter-Process Communication (IPC) for basic functionality. Nevertheless, it is interesting to see whether we can re-use the good elements of these designs in existing servers.
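The privilege-separation idea can be sketched in a few lines: the long-term key lives only in a dedicated process, and the worker can merely request signatures over a pipe. This is a toy illustration, not the OpenSSH design; HMAC-SHA256 stands in for a real private-key operation, and all names are mine.

```python
import hmac
import hashlib
from multiprocessing import get_context

def keyop(conn, key):
    # The only process that ever touches the private key; it answers
    # signing requests until it receives the None sentinel.
    while True:
        data = conn.recv()
        if data is None:
            break
        conn.send(hmac.new(key, data, hashlib.sha256).digest())
    conn.close()

ctx = get_context("fork")              # fork context, so no __main__ guard needed
worker_end, keyop_end = ctx.Pipe()
proc = ctx.Process(target=keyop, args=(keyop_end, b"server-private-key"))
proc.start()

# The worker only ever sees signatures, never the key itself; leaking the
# worker's memory (as with heartbleed) would not expose the key.
worker_end.send(b"handshake-transcript")
sig = worker_end.recv()
worker_end.send(None)                  # shut the key process down
proc.join()
```

In a real design the key process would of course load the key from protected storage itself, rather than receive it from the parent, and the IPC round-trip is exactly the performance cost discussed below.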



A few years ago, in a joint effort of the Computer Security and Industrial Cryptography research group of KU Leuven and Red Hat, we produced a software security module (softhsm) for the Linux kernel, whose purpose was to prevent a leak of server memory to an adversary from revealing its private keys. Unfortunately we failed to convince the kernel developers of its usefulness. That wasn't the first attempt at such a module. A user-space security module already existed, called LSM-PKCS11. Similarly to the one we proposed, it would provide access to the private key, but the operations would be performed in an isolated process (instead of the kernel). If such a module had been in place in popular TLS/SSL servers, there would have been no need to regenerate the servers' keys. So what can we do to use such a mechanism in existing software?

The previous projects have been dead for quite some time, but there are newer modules, like opendns's softhsm, which are active. Unfortunately softhsm is not a module that enforces any type of isolation. Thus a wrapper PKCS #11 module over softhsm (or any other software HSM) that enforces process isolation between the keys and the application using them would be a good starting point. GnuTLS and NSS provide support for using PKCS #11 private keys, and adding support for this module in apache's mod_gnutls or mod_nss would be trivial. OpenSSL would still need some support for PKCS #11 modules to use it.

Note, however, that such an approach would take care of the leakage of the server's private key, but would not prevent user data (e.g., user passwords) from being leaked. That could of course be handled by a similar, or even the same, module (though not over the PKCS #11 API).

So if the solution is that simple, why isn't it already deployed? Well, there is always a catch, and that catch, as I mentioned before, is performance. A simple PKCS #11 module that enforces process isolation would introduce overhead (due to IPC), and is almost certainly going to become a bottleneck. That could be unacceptable for many high-load sites, but on the other hand it could be a good driver to optimize the IPC code paths.


Mitigating an authentication issue

The question here is what could be done about the bugs found in GnuTLS and Apple's SSL implementation, which allowed certain certificates to always succeed in authentication. That is related to a PKI failure, and in fact the same defences required to mitigate a PKI failure can be used. A PKI failure is typically a CA compromise (e.g., the Diginotar incident).

One approach, along the same lines as above, is to avoid reliance on a single authentication method; that is, to use two-factor authentication. Instead of relying only on PKI, combine password (or shared-key) authentication with PKI over TLS. Unfortunately, that's not as simple as a password key exchange over TLS, since that is vulnerable to eavesdropping once the PKI is compromised. To achieve two-factor authentication with TLS, one can negotiate a session using TLS-PSK (or SRP), and renegotiate on top of it using the certificate and PKI. That way both factors are accounted for, and the compromise of either factor doesn't affect the other.

Of course, the above is a rather over-engineered authentication scenario; it requires significant changes to today's applications, and also imposes a significant penalty in performance and network communication, as performing the authentication twice requires twice the round-trips. A simpler approach, however, is to rely on trust on first use, i.e., SSH-style authentication on top of PKI. That is, use PKI to verify new keys, but follow the SSH approach with previously known keys. That way, if the PKI verification is compromised at some point, it only affects new sessions to unknown hosts. Moreover, for an attack to remain undetected the adversary is forced to operate a man-in-the-middle attack indefinitely; otherwise, a discrepancy in the server key will be detected on the first connection to the real server.
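The trust-on-first-use idea amounts to very little code. A minimal sketch, with an in-memory store standing in for a persisted known-hosts file, and fingerprints computed over the raw certificate:

```python
import hashlib

class TofuStore:
    """Trust-on-first-use pinning: remember a peer's certificate
    fingerprint the first time it is seen (after the PKI check), and
    flag any later mismatch. A real client would persist this store,
    like OpenSSH's ~/.ssh/known_hosts."""

    def __init__(self):
        self.known = {}   # host -> hex SHA-256 fingerprint

    def check(self, host, cert_der):
        fp = hashlib.sha256(cert_der).hexdigest()
        if host not in self.known:
            self.known[host] = fp      # first contact: pin it
            return "pinned"
        # A mismatch means either a legitimate key change or an
        # ongoing man-in-the-middle attack; either way, alert the user.
        return "ok" if self.known[host] == fp else "MISMATCH"

store = TofuStore()
print(store.check("example.com", b"cert-A"))   # prints "pinned"
print(store.check("example.com", b"cert-A"))   # prints "ok"
print(store.check("example.com", b"cert-B"))   # prints "MISMATCH"
```

A compromised CA can only affect the first contact with an unknown host; every later connection is checked against the pin, which is exactly the detection property argued for above.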


In conclusion, it seems that we can have software resilient to bugs, no matter how serious, but that software will come at a performance penalty, and would require more conservative designs. While there are applications that mainly target performance and cannot always benefit from the above ideas, there are several applications that can sacrifice a few extra milliseconds of processing for a design more resilient to attacks.

Sunday, November 17, 2013

Inside an SSL VPN protocol

Some time ago, when trying to provide a way to interconnect the routers of the company I was working at, I attempted to evaluate the various secure VPN solutions available as free software. As I was already familiar with cryptography and secure communications protocols, I initially tried to review their design. To my surprise, the most prominent secure VPN available at the time had its source code as its only documentation. My reverse engineering of the protocol showed that while SSL was claimed, it was only used as the key exchange method; the actual protocol transferring packets was a custom one. This didn't align with my taste, but it was finally included in the routers as there were not many alternatives at the time.

Years later, I was contacted by David Woodhouse, proposing changes to GnuTLS in order to support an early draft version of Datagram TLS (DTLS). Since we already had support for the final version of DTLS (i.e., 1.0), I couldn't understand the request. As it turned out, David was working on openconnect, a client for the CISCO AnyConnect SSL VPN protocol.

That intrigued me, as it was the first SSL VPN solution I had heard of that used Datagram TLS to transfer data. My interest increased as I learned more about openconnect, so much so that I even ended up writing the server counterpart of openconnect. Because the details of the protocol are still confined to David's head and mine, I'll attempt in this post to describe them at a higher level than source code.

Key exchange & Authentication 

The protocol is very simple in nature and is HTTP-based. Initially the client connects to the server over TLS (note that TLS runs only over TCP, something that I take as granted on the rest of this text). On that TLS session the server is authenticated using its certificate, and the client may optionally be authenticated using a certificate as well. After the TLS key exchange is complete, the client obtains an authentication page by issuing an HTTP "POST /" request containing the following.

<config-auth client="vpn" type="init">
    <version who="vpn">v5.01</version>
    <device-id>linux-64</device-id>
    <group-access>https://example.com</group-access>
</config-auth>
 
If the client did not present a certificate, or if additional information is required (e.g., a one-time password), the following takes place. The server replies with a request for the client's username that looks like:

<auth id="main">
    <message>Please enter your username</message>
    <form action="/auth" method="post">
        <input label="Username:" name="username" type="text" />
    </form>
</auth>


This effectively contains the message to be printed to the user, as well as the URL (/auth) where the string obtained from the user should be sent. The client subsequently replies with a POST containing its username.

<config-auth client="vpn" type="auth-reply">
    <version who="vpn">v5.01</version>
    <device-id>linux-64</device-id>
    <auth><username>test</username></auth>
</config-auth>

 
Once the username is received by the server, a similar conversation continues for the number of passwords required for the user. The message format remains essentially the same, so we skip this part.
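Since the exchange is plain XML over HTTP, handling it is straightforward with any XML library. A sketch of a client-side step, parsing the server's form above and building the auth-reply (the element and attribute names are taken from the excerpts above; everything else is illustrative):

```python
import xml.etree.ElementTree as ET

# The server's request, as shown earlier in this post.
server_request = """<auth id="main">
    <message>Please enter your username</message>
    <form action="/auth" method="post">
        <input label="Username:" name="username" type="text" />
    </form>
</auth>"""

req = ET.fromstring(server_request)
prompt = req.findtext("message")          # text to show the user
action = req.find("form").get("action")   # URL to POST the reply to

# Build the auth-reply the client sends back.
reply = ET.Element("config-auth", {"client": "vpn", "type": "auth-reply"})
ET.SubElement(reply, "version", {"who": "vpn"}).text = "v5.01"
auth = ET.SubElement(reply, "auth")
ET.SubElement(auth, "username").text = "test"
body = ET.tostring(reply, encoding="unicode")
```

The same parse-and-reply loop then repeats for each password prompt, with only the `<input>` fields changing.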

VPN tunnel establishment 

When authenticated, the client issues an HTTP CONNECT request. That effectively terminates the HTTP session and initiates a VPN tunnel over the existing TLS session. In short the client issues:

CONNECT /CSCOSSLC/tunnel HTTP/1.1
User-Agent: Open AnyConnect VPN Agent v5.01
X-CSTP-Version: 1
X-CSTP-MTU: 1280
X-CSTP-Address-Type: IPv6,IPv4
X-DTLS-Master-Secret: DAA8F66082E7661AE593 [truncated]
X-DTLS-CipherSuite: AES256-SHA:AES128-SHA:DES-CBC3-SHA

and the server replies with something that looks like the following.

HTTP/1.1 200 CONNECTED
X-CSTP-Version: 1
X-CSTP-DPD: 440
X-CSTP-Address: 192.168.1.191
X-CSTP-Netmask: 255.255.255.0
X-CSTP-DNS: 192.168.1.190
X-CSTP-Split-Include: 192.168.1.0/255.255.255.0
X-CSTP-Keepalive: 32400
X-CSTP-Rekey-Time: 115200
X-CSTP-Rekey-Method: new-tunnel
X-DTLS-Session-ID: 767a9ad8 [truncated]
X-DTLS-DPD: 440
X-DTLS-Port: 443
X-DTLS-Rekey-Time: 115200
X-DTLS-Keepalive: 32400
X-DTLS-CipherSuite: AES128-SHA
X-DTLS-MTU: 1214
X-CSTP-MTU: 1214

This completes the HTTP authentication phase of the protocol. At this point a VPN tunnel is established over TLS; the client takes the IP address present in the "X-CSTP-Address" header, and adds the "X-CSTP-Split-Include" routes to its routing table. The IP packets read from the local TUN device are sent via the tunnel to the peer with an 8-byte prefix, which allows distinguishing IP data from various VPN control packets such as keep-alive or dead-peer-detection.
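The 8-byte framing is trivial to implement. The layout below ("STF" magic, a version byte, a 16-bit payload length, a packet-type byte and a zero byte) follows my reading of the openconnect sources; treat the exact field values and type codes as illustrative rather than authoritative.

```python
import struct

# A few of the packet types carried over the TLS tunnel (illustrative).
AC_PKT_DATA, AC_PKT_DPD_OUT, AC_PKT_KEEPALIVE = 0x00, 0x03, 0x07

def cstp_pack(ptype, payload=b""):
    # 8-byte header: magic "STF" + version 1, big-endian length, type, zero.
    return struct.pack("!4sHBB", b"STF\x01", len(payload), ptype, 0) + payload

def cstp_unpack(frame):
    magic, length, ptype, _ = struct.unpack("!4sHBB", frame[:8])
    assert magic == b"STF\x01" and length == len(frame) - 8
    return ptype, frame[8:]

# An IPv4 packet (first byte 0x45) wrapped for the tunnel:
frame = cstp_pack(AC_PKT_DATA, b"\x45...ip packet...")
ptype, payload = cstp_unpack(frame)
```

Keep-alive and dead-peer-detection packets are simply frames with a different type byte and an empty payload.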

It is, however, well known that TCP over TCP is far from optimal, and for this reason the server offers the client the option of an additional tunnel over UDP and DTLS.


VPN tunnel over UDP

To initiate the Datagram TLS over UDP session, the client sends the "X-DTLS-Master-Secret" and "X-DTLS-CipherSuite" headers in its CONNECT request (see above). The former contains the key to be used as the pre-master key, in TLS terminology, and the latter contains a list of ciphersuites in the form read by OpenSSL. The server replies to these by accepting a ciphersuite and presenting it in its own "X-DTLS-CipherSuite" header, and adding headers such as "X-DTLS-Port" and "X-DTLS-Session-ID", among others less relevant to this description.

On receipt of that information, the client initiates a Datagram TLS session (using a draft version of DTLS 1.0) on the port indicated by the server. A good question at this point is how the server associates the new DTLS session request with this particular client. The hint here is that the client copies the value of the "X-DTLS-Session-ID" header into the Session ID field of its DTLS Client Hello. That session is in effect handled as a session resumption that uses the "X-DTLS-Master-Secret" as the pre-master secret, the previous session ID, and the ciphersuite negotiated in the "X-DTLS-CipherSuite" header (presumably that hackish approach exists because this solution pre-dates TLS with pre-shared keys).

That completes the DTLS negotiation over UDP, and the establishment of the second VPN tunnel. From this point on this tunnel is used as the primary one, and if for some reason it goes down, the traffic is redirected to the backup tunnel over TLS. One difference of the DTLS tunnel from the TLS one is that the VPN packet header is a single byte instead of 8. The smaller header is an important saving, since DTLS packets are restricted by the link MTU, which is further reduced by the headers of IP, UDP, and DTLS.
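The per-packet budget is easy to work out. A rough calculation with illustrative sizes (a 1500-byte ethernet link, the 13-byte DTLS 1.0 record header, and AES-CBC-SHA1 record overhead, since DTLS 1.0 uses an explicit per-record IV):

```python
# Every byte of header comes out of the tunneled packet's budget.
LINK_MTU = 1500
IP, UDP = 20, 8          # outer IP and UDP headers
DTLS_HDR = 13            # DTLS record header
IV, MAC = 16, 20         # AES-CBC explicit IV + HMAC-SHA1 tag
VPN_HDR_DTLS = 1         # single-byte packet-type prefix over DTLS
VPN_HDR_TLS = 8          # versus the 8-byte header on the TLS tunnel

budget = LINK_MTU - IP - UDP - DTLS_HDR - IV - MAC - VPN_HDR_DTLS
print(budget)            # upper bound on the tunneled packet, before CBC padding
```

Against that budget, saving 7 bytes of VPN header per packet is small but free, and every byte matters once CBC padding is added on top.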

Rekeying 

DTLS allows the transfer of up to 2^48 packets (or about 220 petabytes, assuming an average of 800 bytes per packet) in a single session. To allow for even larger transfers and to refresh the keys used, the protocol enforces re-keying by time, as indicated by the "X-DTLS-Rekey-Time" header. At the moment openconnect implements that by tearing down both the TLS and DTLS tunnels and reconnecting to the server.
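The back-of-the-envelope arithmetic behind that figure: the DTLS record sequence number is 48 bits wide, so a single session can carry at most 2^48 records.

```python
# 48-bit DTLS record sequence number => at most 2^48 records per session.
MAX_RECORDS = 2 ** 48
AVG_PACKET = 800                 # assumed average payload per record

total_bytes = MAX_RECORDS * AVG_PACKET
petabytes = total_bytes / 10 ** 15
print(f"{petabytes:.0f} PB")     # prints "225 PB"
```

So the sequence-number limit alone is generous; the time-based rekeying exists mainly to refresh the keys, not because the limit is likely to be hit.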

Reconnections

Reconnections in this protocol, e.g., because of the client switching networks and changing IP, are handled at the HTTP level. That is, the client accepts a cookie from the server and uses it on any subsequent connection.

Data transfer


In the VPN tunnel establishment we saw the negotiation of the ciphersuite used for the DTLS tunnel and the actual data transfer. Due to the protocol's restriction to pre-draft DTLS 1.0, the available ciphersuite options are the following three:
  • AES256-CBC-SHA
  • AES128-CBC-SHA
  • 3DES-CBC-SHA
3DES-CBC is a performance nightmare, so for any practical purpose AES128-CBC and AES256-CBC are used. However, all of these ciphersuites are vulnerable to padding oracle attacks, and they have considerable header overhead (they require a full-block random IV per packet, padding, and a rather long MAC of 20 bytes). These inefficiencies were the main incentive for the salsa20 proposal in TLS.
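The overhead claim is easy to quantify. A sketch comparing the per-record cost of the CBC ciphersuites with a Salsa20/UMAC-96 record on a 1400-byte packet; the sizes are illustrative (16-byte explicit IV, 20-byte HMAC-SHA1 tag, 1 to 16 bytes of TLS padding versus a lone 12-byte UMAC-96 tag, with the stream cipher's nonce derived from the record sequence number):

```python
BLOCK = 16   # AES block size

def cbc_sha1_overhead(payload_len):
    iv, mac = 16, 20
    # TLS pads plaintext+MAC up to a block boundary; the padding
    # (including its length byte) is always at least one byte.
    pad = BLOCK - ((payload_len + mac) % BLOCK)
    if pad == 0:
        pad = BLOCK
    return iv + mac + pad

def salsa20_umac96_overhead(payload_len):
    return 12   # just the UMAC-96 tag; no IV, no padding

print(cbc_sha1_overhead(1400), salsa20_umac96_overhead(1400))
```

For a datagram protocol squeezed under the link MTU, dropping from tens of bytes of overhead per packet to 12 is a real saving, quite apart from eliminating the padding oracle surface.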

Overview

Overall, the protocol looks hackish to someone reading it today at a high level. It is, however, the closest to a standards-based VPN protocol that is around. It has some weaknesses (it is vulnerable to padding oracle attacks) and limitations as well (for example, it cannot easily use any version other than the pre-draft DTLS), and they can easily be fixed, but let's leave that for a future post.

Thursday, May 16, 2013

Salsa20 and UMAC in TLS

Lately, while I was implementing and deploying an SSL VPN server, I realized that even for a peer-to-peer connection the resources consumed by encryption on the two ARM systems I used were quite excessive. These ARM processors have no instructions to speed up AES and SHA1, and were spending most of their resources encrypting and authenticating the exchanged packets.

What can be done in such a case? The SSL VPN server utilized DTLS, which runs over UDP and restricts the packet size to the path MTU (typically 1400 bytes if we want to avoid fragmentation and reassembly), and thus wastes quite some resources on packetization of long data. Since the packet size cannot be modified, we could instead try to improve the encryption and authentication speed. Unfortunately, using a more lightweight cipher available in TLS, such as RC4, is not an option, as it is not available in DTLS (while TLS and DTLS mostly share the same set of ciphersuites, some ciphers like RC4 cannot be used in DTLS due to protocol constraints). Overall, we cannot do much with the currently defined algorithms in DTLS; we need to move outside the TLS protocol box.

Some time ago there was an EU-sponsored competition on stream ciphers (which are typically characterized by their performance), and Salsa20, one of the winners, was recently added to nettle (the library GnuTLS uses) by Simon Josefsson, who conceived the idea of adding such a fast stream cipher to TLS. While modifying GnuTLS to take advantage of Salsa20, I also considered moving away from HMAC (the slow message authentication mechanism TLS uses) and using the UMAC construction, which provides a security proof and impressive performance. My initial attempt to port the UMAC reference code (which was not ideal code) motivated the author of nettle, Niels Moeller, to reimplement UMAC in a cleaner way. As a result, Salsa20 and UMAC are now included in nettle and are used by GnuTLS 3.2.0. The results are quite impressive.

Salsa20 with UMAC96 ciphersuites were 2-3 times faster than any AES variant used in TLS, and outperformed even RC4-SHA1, the fastest ciphersuite defined in the TLS protocol. The results, as seen on an Intel i3, are shown below (they are reproducible using gnutls-cli --benchmark-tls-ciphers). Note that SHA1 in a ciphersuite name means HMAC-SHA1, and Salsa20/12 is the reduced-round variant of Salsa20 that was among the eSTREAM competition winners.

Performance on 1400-byte packets

Ciphersuite          Mbyte/sec
SALSA20-UMAC96       107.82
SALSA20-SHA1          68.97
SALSA20/12-UMAC96    130.13
SALSA20/12-SHA1       77.01
AES-128-CBC-SHA1      44.70
AES-128-GCM           44.33
RC4-SHA1              61.14

The results for openconnect VPN performance between two PCs, connected over 100-Mbit ethernet, are as follows.

Performance of a VPN transfer over ethernet

Ciphersuite             Mbits/sec   CPU load (top)
None (plain transfer)   94          8%
SALSA20/12-UMAC96       89          57%
AES-128-CBC-SHA1        86          76%

While the throughput difference between SALSA20 and AES-128-CBC isn't impressive (AES throughput was already not too low), the difference in server CPU load is significant.

Would such ciphersuites also be useful to a wider set of applications than VPNs? I believe the answer is positive, and not only for performance reasons. This year new attacks were devised on the AES-128-CBC-SHA1 and RC4-SHA1 ciphersuites in TLS that cannot be easily worked around. For AES-128-CBC-SHA1 there are some hacks that reduce the impact of the known attacks, but they are hacks, not a solution. As such, TLS would benefit from a new set of ciphersuites that replace the old ones with known issues. Moreover, even if we considered RC4 a viable solution today (which it is not), the DTLS protocol cannot take advantage of it, and datagram applications such as VPNs need to rely on the much slower AES-128-GCM.

Seeing several advantages in this new list of ciphersuites, Simon Josefsson, Joachim Strombergson and I plan to propose to the IETF TLS Working Group the adoption of a set of Salsa20-based ciphersuites. We were asked by the WG chairs to present our work at the IETF 87 meeting in Berlin, so I plan to travel there to present our current Internet-Draft.

If you support defining these ciphersuites in TLS, we need your help. If you are an IETF participant, please join the TLS Working Group meeting and indicate your support. Also, if you have any feedback on the approach, or can suggest another field of work where this could be useful, please drop me a mail or leave a comment below mentioning your name and any affiliation.

Moreover, as it stands, a lightning trip to Berlin on these dates would cost at minimum 800 euros, including the IETF single-day registration. As this is not part of our day job, any contribution that would help partially cover those expenses is welcome.

Tuesday, March 26, 2013

The perils of LGPLv3

LGPLv3 is the latest version of the GNU Lesser General Public License. It follows the successful LGPLv2.1 license, and was released by the Free Software Foundation as a counterpart to its GNU General Public License version 3.

The goal of the GNU Lesser General Public Licenses is to provide software that can be used by both proprietary and free software. This goal has been served successfully so far by LGPLv2.1, and there is a multitude of libraries using that license. Now we have LGPLv3 as the latest version, and the question is: how successful is LGPLv3 at this goal?

In my opinion, not very. If we assume that its primary goal is to be usable by free software, then it blatantly fails that. LGPLv3 has serious issues when used with free software, and especially with the GNU GPL version 2. Projects under the GPLv2 license violate its terms if they use an LGPLv3 library (because LGPLv3 adds additional restrictions).

What does the FSF suggest in that case? It suggests upgrading GPLv2 projects to GPLv3. That's a viable solution, if you actually can and want to upgrade to GPLv3. At the GnuTLS project, after we switched to LGPLv3, we realized that in practice we had forced all of our GPLv2 (or later) users to distribute their binaries under GPLv3. Moreover, we also realized that several GPLv2-only projects (i.e., projects that did not have the "GPLv2 or later" clause in their license) could no longer use the library at all.

The same incompatibility issue exists with LGPLv2.1 projects that want to use an LGPLv3 library. They must be upgraded to LGPLv3.

In discussions within the FSF, Stallman had suggested using the dual license "LGPLv3 or GPLv2" for libraries to overcome these issues. Although it does not solve the issue with LGPLv2.1 libraries, that is not a bad suggestion, but it has a major drawback: a project under the dual "LGPLv3 or GPLv2" license can re-use code neither from other LGPLv3 projects nor from GPLv2 projects, creating a rather awkward situation for the project.

So it seems we have a "lesser" kind of license for use by libraries that dictates to free software authors the license they should release their projects under. I find that unacceptable for such a license. It seems to me that with this license, the FSF is effectively asking you not to use LGPLv3 for your libraries.

So, OK, LGPLv3 has issues with free software licenses... but how does it work with proprietary projects? Surprisingly, it works better than with free software licenses; it doesn't require them to change their license.

Nevertheless, there is a catch. My understanding of LGPLv3 (it is one of the hardest licenses to read, as it requires you to first read the GPLv3, remove some of its clauses, and then add some additional ones) is that it contains the anti-tivoization clause of GPLv3. That is, in a home appliance, if the creator of the appliance provides an upgrade mechanism, he has to provide a way to upgrade the library. That doesn't really make sense to me, neither as a consumer, nor as someone who potentially creates software for appliances.

As a consumer, why would I consider the ability to upgrade only the (say) libz library on my router a major advantage? You may say that I have additional privileges (more freedom) on my router. Perhaps, but my concern is that this option will hardly ever materialize. Will an appliance creator choose a "lesser" GPL library when this clause is present? Will he spend the additional time to create a special upgrade mechanism for a single library to satisfy the LGPLv3 license, or will he choose a non-LGPLv3 alternative? (Remember that the goal of the LGPL licenses, according to the FSF, is to use them in software where proprietary or free-of-charge alternatives exist.)


So overall, it seems to me that LGPLv3 has so many practical issues that its advantages (e.g., the patent clauses) become irrelevant, and I don't plan to use it as the license of my libraries. I'm back to the good old LGPLv2.1.

Tuesday, February 5, 2013

Time is money (in CBC ciphersuites)

While protocols are not always nicely written, deviating from them has a big disadvantage: you cannot blame someone else if there is a problem. It has a small advantage though; you avoid monoculture, and an attack that is applicable to the protocol may not be applicable to you. What happened in the following story is something in between.

But let's start from the beginning. A few months ago I received a paper by Nadhem Alfardan and Kenny Paterson about a new timing attack on TLS CBC ciphersuites. That paper landed in my low-priority queue, since I'm not that impressed by timing attacks; they are nicely done in the lab, but fall short in the field. Nevertheless, at some point this attack got my attention, because it claimed it could recover a few bits of the plaintext when using GnuTLS over ethernet.

Let's see the actual scenario, though. For the attack to work, the client must operate as follows: it connects to a server and sends some data (which will be encrypted); the attacker intercepts it and terminates the client's connection abnormally; the client then reconnects and resends the same data, again and again.

That is not the typical scenario of TLS, but it is not so unrealistic either (think of a script that does exactly that). The main idea of the attack is that the described attacker can distinguish between different padding values of TLS messages using the available timing information between the receipt of a message and the server's reply (which may just be the TCP tear-down messages).

How is that done against GnuTLS? The attacker takes the input ciphertext and forms a message of 20 AES blocks. He then takes the part of the message he's interested in and copies the matching block and its predecessor as the last two encrypted blocks (which contain the MAC and padding).

Then, on every different client connection, he XORs the last byte of the penultimate block with a byte (say Delta, to be consistent with the paper) of his choice. On every client reconnection he repeats that using a different Delta and waits for the server's response (a TLS server notifies the client of a decryption failure).
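Why does XORing the penultimate ciphertext block change the padding? In CBC, plaintext block i is D(C[i]) XOR C[i-1], so flipping a byte of C[i-1] flips the same byte of plaintext block i. A toy demonstration of that malleability, with a trivial XOR "cipher" standing in for AES (the property holds for any block cipher in CBC mode):

```python
KEY = bytes(range(16))

def E(b): return bytes(x ^ k for x, k in zip(b, KEY))
def D(b): return E(b)                      # XOR is its own inverse

def cbc_encrypt(blocks, iv):
    out, prev = [], iv
    for p in blocks:
        c = E(bytes(a ^ b for a, b in zip(p, prev)))
        out.append(c)
        prev = c
    return out

def cbc_decrypt(blocks, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(bytes(a ^ b for a, b in zip(D(c), prev)))
        prev = c
    return out

iv = bytes(16)
pt = [b"sixteen byte blk", b"pad-and-mac-here"]
ct = cbc_encrypt(pt, iv)

# XOR the last byte of the penultimate ciphertext block with Delta...
delta = 0x42
tampered = [ct[0][:-1] + bytes([ct[0][-1] ^ delta]), ct[1]]
pt2 = cbc_decrypt(tampered, iv)
# ...and the last byte of the FINAL plaintext block is XORed with Delta.
# (With a real block cipher the penultimate plaintext block would also be
# garbled, but that block holds MAC data the attacker doesn't care about.)
```

Since in TLS the last plaintext byte is the padding length, the attacker's Delta directly varies the pad value the server sees, which is what the timing measurements below exploit.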

In the paper they plot a diagram showing that certain values of Delta require different times to receive a reply. I re-implemented the attack for verification, and the measurements on the server can be seen in the following diagram. Note that the last plaintext byte is zero (also note that some attack results in the paper are due to a bug the authors discovered, which is now fixed; the discussion here refers to the fixed version of GnuTLS).
Figure 1. Median server timings for AES-CBC-SHA1, for a varying Delta, on GnuTLS 3.1.6 or earlier (view on the server)
It is clear that different values of Delta cause different response times. What we see in these timing results are the differences due to the sizes of the data being fed to the hash algorithm. In GnuTLS, the bigger the Delta, the larger the pad (remember that the last byte of plaintext was zero), and the less time is spent hashing data.

The small increase in processing time seen within every block is the correctness check of the pad (which takes more time as the pad increases). On the particular CPU I used, the differences per block seem to be around 200 nanoseconds, and the difference between the slowest and the fastest block is around 1 microsecond.

The question is: can an attacker notice such small differences over ethernet? In the paper the authors claim yes (and present some nice figures). I repeated the test, not over ethernet, but over unix domain sockets, i.e., with only the delays due to kernel processing added. Let's see the result.

Figure 2. Median server timings for AES-CBC-SHA1, for a varying Delta, on GnuTLS 3.1.6 or earlier (attacker view)

Although it is noisier, the pattern is still visible. Could we prevent that pattern from being visible? Let's first follow the RFC's padding removal precisely and check again.

Figure 3. Median server timings for AES-CBC-SHA1, for a varying Delta, on GnuTLS 3.1.6 with TLS conformant padding check applied (view on the server)

That's much better; the obvious patterns have disappeared. However, there is no happy end yet. In the same paper the authors present another attack on OpenSSL, which uses the above padding method. Surprisingly, that attack can recover more data than before (but at a slower pace). That is the full plaintext recovery attack of the paper, and it uses the fact that TLS 1.2 suggests assuming 0 bytes of padding when the padding check fails. Zero bytes of padding is an invalid value (padding in TLS is from 1 to 256 bytes) and thus cannot be set by a legitimate message.

This can be exploited, and invalid padding can then be distinguished from valid padding, by selecting the message size so that if 0 is used as the pad, the additional byte that will be hashed results in a full extra block being processed by the hash algorithm. That is, we would expect that if the correct pad is guessed, less processing occurs. Let's visualize it with an example.

Figure 4. Median server timings for AES-CBC-SHA1, for a varying Delta, on a TLS compliant implementation

Notice the lonely point at the bottom left. We can see that when Delta is zero the processing is about 200ns faster. For the selected message, the zero Delta resulted in a correct pad. How to fix that? That is pretty tricky.

In order to remove this pattern, the code doing the CBC pad removal has to be aware of the internal block size of the hash, as well as any internal padding used by the hash. In a typical layered implementation (or at least in GnuTLS) that isn't easy; the internal hash block size wasn't available, because one shouldn't need to know it. The paper suggests a fix that assumes a block size of 64, which is correct for all the HMAC algorithms in TLS, but that isn't future-proof (e.g., SHA3 doesn't have the same block size).
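To see why the hash block size matters, a minimal model of the cost of hashing a message, assuming plain Merkle-Damgard padding as used by SHA-1 (one 0x80 byte plus an 8-byte length field, rounded up to 64-byte blocks):

```python
# Number of SHA-1 compression-function calls needed for a message of
# a given length, under standard Merkle-Damgard padding.
def sha1_blocks(msg_len):
    return (msg_len + 1 + 8 + 63) // 64   # 0x80 byte + 64-bit length

# One extra byte hashed can cost a whole extra compression call:
assert sha1_blocks(55) == 1
assert sha1_blocks(56) == 2
```

A pad-removal routine that wants uniform timing therefore has to compensate in units of these 64-byte blocks, which is exactly the knowledge a cleanly layered TLS implementation doesn't have.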

So what to do here? The fix for GnuTLS is now similar to the hack described in the paper. The hash block size and knowledge of the internal padding are taken into account to achieve fairly uniform processing time (see below). Any differences are now at the level of tens of nanoseconds.
Figure 5. Median server timings for AES-CBC-SHA1, for a varying Delta, after the work-around is applied


Now we're very close to a happy end. What is important, though, is to prevent a similar attack from occurring next year. This isn't a new attack; even the TLS 1.1 protocol acknowledges this timing channel, but does nothing to solve it. TLS implementations now need 2-3 pages of code just to remove the CBC padding, and that makes it clear that the TLS spec is broken and needs to be updated.

Together with Alfredo Pironti we have proposed a new padding mechanism, one of whose targets is to eliminate this CBC padding issue (the other target is to allow arbitrary padding in any ciphersuite). The way the timing channels in CBC padding are eliminated is by treating the pad as part of the transferred data (i.e., it is properly authenticated). As such, the pad check code provides no side-channel information, because it operates only after the MAC is verified.

PS. Kudos go to Oscar Reparaz for long discussions on side channel attacks.

PS2. I should note that Nadhem and Kenny were very kind to notify me of the bugs and the attack. While this may seem like the obvious thing to do, it is not that common with other academic authors (I'm very tempted to place a link here).


Monday, October 8, 2012

Some thoughts on the DANE protocol

A while ago I wrote about why we need an alternative authentication method in TLS. Then I described SSH-style authentication and how it was implemented in GnuTLS. Another approach is the DANE protocol, which uses the hierarchic model of DNSSEC to provide an alternative authentication method in TLS.

DNSSEC uses a hierarchical model similar to a single root CA delegating CA responsibilities to each individual domain holder for its own domain. In DANE, a DNSSEC-enabled domain may sign entries for each TLS server in the domain. The entries contain information about the TLS server's certificate, such as its hash. That's a nice idea, and it can be used instead of, or in addition to, the existing commercial CA verification. Let's see two examples that demonstrate DANE. Suppose we have an HTTPS server called www.example.com and a CA.

A typical web client, after it successfully verifies www.example.com's certificate using the CA's certificate, will check the DANE entries. If they match, it is assured that a CA compromise does not affect its current connection. That is, in this scenario an attacker needs to compromise not only example.com's CA, but also the server's DNS domain service.

Another significant example is when there is no CA at all. The server www.example.com is a low-budget one and wants to avoid paying any commercial CA. If it has DNSSEC set up, it can simply advertise its key in the DNS, and clients can verify its certificate using the DNSSEC signature. That is, trust is moved from the commercial CAs to the DNS administrators; whether they can cope with that will be seen in time.

Even though the whole idea is attractive, the actual protocol is (IMO) quite bloated. In a recent post on the DANE mailing list I summarized several issues (note that I was wrong on the first). The most important, I believe, is the fact that DANE separates certificates into CA-signed certificates and self-signed certificates. That is, if you have a CA-signed certificate you mark its DNS entry as 1, while if you have a self-signed one you mark it as 3. The reasoning behind this is unclear, but its effect is that it is harder to move from the CA-signed to the non-CA-signed world or vice versa. For example, if you have a CA-signed certificate and it expires, the DANE entry automatically becomes invalid as well. This may be considered a good property by some (especially the CAs), but I see little point in it. It is also impossible for the server administrator to know whether a client trusts its CA, so this certificate distinction doesn't always make sense. A work-around is to always publish your certificate both as self-signed and as CA-signed, in two different DNS entries.
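In zone-file terms, the work-around amounts to publishing the same certificate hash under both certificate usages (the hash value below is a placeholder, and the record name assumes an HTTPS server on port 443):

```
_443._tcp.www.example.com. IN TLSA 1 0 1 <hex SHA-256 of the certificate>
_443._tcp.www.example.com. IN TLSA 3 0 1 <hex SHA-256 of the certificate>
```

A client that insists on CA verification matches the first entry, while one that accepts domain-issued certificates matches the second.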

Despite this and a few other shortcomings that worry me, this is the best protocol we have so far for dividing the trust in TLS certificate verification among more entities than the commercial CAs. For this reason, a companion library implementing DANE will be included in the upcoming GnuTLS 3.1.3 release.

Thursday, August 16, 2012

Using the Trusted Platform Module to protect your keys

There was a big hype when the Trusted Platform Module (TPM) was introduced into computers. Briefly, it is a co-processor in your PC that can perform calculations independently of the main processor. This has good and bad side-effects. In this post we focus on the good ones, namely that you can use it to perform cryptographic operations the same way as with a smart card. What does that mean? It simply means that you can have RSA keys in your TPM chip that you can use to sign and/or decrypt, but cannot extract. That way a compromised web server doesn't necessarily mean a compromised private key.

GnuTLS 3.1.0 (when compiled with libtrousers) adds support for keys stored in the TPM chip. This support is transparent, and such keys can be used similarly to keys stored in files. What is required is that TPM keys are specified using a URI of one of the following forms.

tpmkey:uuid=c0208efa-8fe3-431a-9e0b-b8923bb0cdc4;storage=system
tpmkey:file=/path/to/the/tpmkey


The first URI contains a UUID, which is an identifier of the key, and the storage area of the chip (the TPM allows for system and user keys). The second form is used for TPM keys that are stored outside the TPM storage area, i.e., in a file encrypted by the TPM.

Let's see how we can generate a TPM key, and use it for TLS authentication. We'll need to generate a key and the corresponding certificate. The following command generates a key which will be stored in the TPM's user section.

$ tpmtool --generate-rsa --bits 2048 --register --user


The output of the command is the key ID.

tpmkey:uuid=58ad734b-bde6-45c7-89d8-756a55ad1891;storage=user

So now that we have the ID of the key, let's extract the public key from it.

$ tpmtool --pubkey "tpmkey:uuid=58ad734b-bde6-45c7-89d8-756a55ad1891;storage=user" --outfile=pubkey.pem

And given the public key we can easily generate a certificate using the following command.
$ certtool --generate-certificate --outfile cert.pem --load-privkey "tpmkey:uuid=58ad734b-bde6-45c7-89d8-756a55ad1891;storage=user" --load-pubkey pubkey.pem --load-ca-certificate ca-cert.pem --load-ca-privkey ca-key.pem

The generated certificate can now be used with any program using the gnutls library, such as gnutls-cli to connect to a server. For example:
$ gnutls-cli --x509keyfile "tpmkey:uuid=58ad734b-bde6-45c7-89d8-756a55ad1891;storage=user" --x509certfile cert.pem -p 443 my_host.name

An easily noticed issue with TPM keys is that they are not mnemonic. There is only a UUID identifying the key, and no labels, which makes distinguishing between multiple keys a troublesome task. Nevertheless, TPM keys provide a cheap way to protect the keys used in your system.