Wednesday, April 18, 2012

A flaw in the smart card Kerberos (PKINIT) protocol

Reading security protocols is not always fun nor easy. Protocols like public key Kerberos are hard to read because they just define the packet format and expect the reader to assume a correct message sequence. I read it, nevertheless, because I was interested on the protocol's interaction with smart cards. If you are not aware of the protocol, public key Kerberos or PKINIT is the protocol used in Microsoft Active Directory and described in RFC4556.

The idea of the protocol is to extend the traditional Kerberos, that supports only symmetric ciphers, with digital signatures and public key encryption in order to support stock smart cards. A use-case is, for example, logging in a windows domain using the smart card. The protocol itself doesn't mention smart cards at all, probably because it was thought as a deployment issue. Nevertheless, it was believed to be a secure protocol and several published papers provided proofs of security for all operational modes of the protocol.

However, the protocol has an important flaw. A flaw that makes it insecure if used with smart cards. I wrote a detailed report on the flaw, but the main idea is that if one has access for few minutes to your smart card he can login using your credentials at any time in the future. You may think that an attack like that can be prevented by never lending your smart card to anyone, but how can you prove that no-one borrowed it for a while? Or if you believe the smart card PIN would protect from theft, how could you know that the reader you are inserting your card isn't tampered? And in a protocol with smart cards you'd expect the tampered reader not to be able to use the card after you retrieve it. This is not the case, making the protocol unsuitable for smart cards, its primary use-case.


What is the most interesting issue however, are the security proofs. The protocol was proven secure, but because the protocol never mentioned smart cards, researchers proved its security on a different setting than the actual use cases. So when reading a security proof, always check the assumptions, which are as important as the proof itself.

Few things went bad with the design of this protocol, none of which is actually technical. The protocol is hard to read, and in order to get an overview of it, you have to read the whole RFC. This is just bad. If you check figure 1 in the TLS RFC you get an overview of the protocol immediately. You might not know the actual contents of the messages but the sequence is apparent. This is not possible in the Kerberos protocol, and that discourages anyone who might want to understand the protocol using a high level description of it. Another flaw, is that the protocol doesn't mention smart cards, its primary use-case. Smart cards were treated as a deployment issue and readers of the RFC, would never know about it. The latter issue is occurring in many of the IETF protocols and the readers are expected to know where and how this protocol is used. As it was demonstrated by the security proofs on a different setting, this is not the case.

So, what can it be done to mitigate the flaw? Unfortunately without modifying the protocol, the only advice that can be given is something along the:
  • make sure you always possess the card;
  • make sure you never use a tampered smart card reader.
Which may seem pretty useless in a typical working environment. What makes the attack nasty, is that if an adversary tampers your reader, and you use your card with it now, the adversary can perform a transaction a year later when you'll have no clue on what happened and might be no evidence of the tampered reader.

Sunday, April 1, 2012

TLS in embedded systems

In some embedded systems space may often be a serious constraint. However, there are many such systems that contain several megabytes of flash either as an SD memory card, or as raw NAND, having no real space constraint. For those systems using a TLS implementation such as GnuTLS or OpenSSL would provide performance gains that are not possible with the smaller implementations that target small size. That is because both of the above implementations, unlike the constraint ones, support cryptodev-linux to take advantage of cryptographic accelerators, widely present in several constraint CPUs, and support elliptic curves to optimize performance when perfect forward secrecy is required.

I happened to have an old geode (x86 compatible) CPU which contained an AES accelerator, so here are some benchmarks created using the nxweb/GnuTLS and nginx/OpenSSL web servers and the httpress utility. The figure on the right shows the data transferred per second using AES in CBC mode, with the cryptographic accelerator compared to GnuTLS' and  OpenSSL's software implementations. We can clearly see that download speed almost doubles on a big file transfer when using the accelerator.


The figure on the left shows a comparison of the various key exchange methods in this platform using GnuTLS and OpenSSL. The benchmark measures HTTPS transactions per second and the keys and parameters used are the same for both implementations. The key sizes are selected of equivalent security levels (1776 bits in RSA and DH are equivalent to 192 bits in ECDH according to ECRYPT II recommendations). We can see that the elliptic curve version of Diffie Hellman (ECDHE-RSA) allows 25% more transactions in both implementations comparing to the Diffie-Hellman on a prime field (DHE-RSA). The plain RSA key exchange remains the fastest, at the cost of sacrificing perfect forward secrecy.

As a side-note it is nice to see that at the security level of 192 bits GnuTLS outperforms OpenSSL on this processor. The trend continues on higher security levels for the RSA and DHE-RSA methods but the ECDHE-RSA method is interesting since even though OpenSSL has a more efficient elliptic curve implementation. GnuTLS' usage of nettle and GMP (which provide a faster RSA implementation) compensates, and their performance is almost identical.

Overall, in the few embedded systems that I've worked on, space was not a crucial limiting factor, and it was mainly this work that drove me into updating cryptodev for linux. In those systems the space cost occurred due to the usage of a larger library was compensated by (1) the off-loading to the crypto processor of operations that would otherwise load the CPU and (2) the reduce in processing time due to elliptic curves.
However this balance is system specific and what was important for my needs would not cover everyone elses, so it is important to weigh the advantages and disadvantages of cryptographic implementations on your system alone.

Sunday, March 18, 2012

Google summer of code

This year GnuTLS participates in the Google summer of code under the GNU project umbrella. If you are a student willing to spend this summer coding, check our ideas.

Saturday, February 18, 2012

The need for SSH-like authentication in TLS

After the Diginotar CA compromise it is apparent that verifying web sites using only a trusted certificate authority (CA) is not sufficient. Currently a web site's certificate is verified against the CA that issued it and  checked for revocation using the OCSP server the CA set up. If the CA is trusted by the user, this process protects against man-in-the-middle attacks when visiting such a web-site, and also against leakage of the web-sites private key (e.g. via OCSP as long as the leakage is reported to the CA). This is an automatic process that does not require the user to be involved, but it comes at a cost. There is a single channel for verification, the CA.

The certificate based web-site verification tries to convert the trust we have in a merchant to the digital world. That is currently done by having "few" authorities that provide certificates to the merchants and based on those certificates we should base our decisions. However, trust in trade requires more than that. For example wouldn't it raise suspicions, a laptop from a merchant who approached you in a parking lot and provided you with a legitimate looking brand card? Would his business or credentials card be the only thing to check? Those heuristics are unfortunately not currently available in the digital world.

If it wasn't for the Chrome browser that implemented a preliminary version of key pinning, the Diginotar compromise might not have been uncovered. In the maliciously-issued certificates, the automatic verification procedure was not reporting any problems. It was the change in public key that triggered the key pinning warning and eventually to the Diginotar issue becoming public. Thus having an additional verification channel to standard PKI proved to be a success.

Key pinning, as of the 01 draft, is mostly server driven. That is, the user has very little control on trusting a particular server key if the server operator doesn't ask to. Another approach is the SSH programs' typical authentication method. The trust on first use. That describes the concept where the public key of the peer is not verified, or verified out-of-bound, but subsequent connections to the same peer require the public key to remain the same. That approach has the advantage that doesn't depend on the server operator setting up anything, but also the disadvantage that the user will be bugged every time the server changes its public key.

In any case having such methods to complement standard certificate verification and revocation checks, provides an additional layer of security. With such a layer, a CA compromise would not be enough for a large-scale man-in-the-middle attack since changes in the public keys would be detected and users would be warned. Such warnings might not be entirely successful in preventing all users from continuing but would raise suspicions for the legitimacy of the server, which might be enough.

For that, we implemented in GnuTLS a framework to be used either by applications that want to use key pinning, or a trust on first use (TOFU) authentication. That consists of three helper functions that store and verify the public keys. They can be seen in the GnuTLS manual. The included program gnutls-cli has also been modified to support a hybrid authentication that includes certificate authentication, TOFU and OCSP if the --tofu and --ocsp arguments are specified. The main idea of its operation is the idea discussed here and is shown in this example code at the manual.



Sunday, January 1, 2012

Do we need elliptic curve point compression?

GnuTLS has recently added support for elliptic curves (ECDSA and Elliptic curve Diffie-Hellman). Elliptic curves are an improvement on public key technologies, mostly in efficiency because they require operations with much smaller integers for equivalent security parameters to RSA or DH. For example a 1248-bit RSA key corresponds to an 160-bit ECDSA key, saving both in space required to store parameters and in time spent for calculations.

However elliptic curve technology is considered a patent minefield and quite some attempts have been made to clarify the status, e.g. IETF made an attempt with RFC6090 to describe only the fundamental algorithms on which all patents are known to have expired. The GnuTLS implementation of ECC is based on this document.

One of the interesting "improvements" that is not included in the fundamental algorithms, and is still patented is point compression. To describe it we need a small (but not detailed) introduction to elliptic curves. An elliptic curve (at least for the needs of TLS protocol) is a set of points (x,y) that satisfy the following relation modulo a large prime p.

y2 = x3 + ax + b

This set of points has some properties, i.e., a group law allowing point addition, scalar multiplication etc. Without getting into details those points need to be exchanged during a protocol run and given that the points usually belong to a "large" prime field (e.g. 256-bits), reducing the exchanged bytes is an improvement. Point compression achieves that goal, so let's see how.

Say we have a point (x0,y0) satisfying y02 = x03 + ax0 + b. If x0 is known, then the previous relation becomes a second degree polynomial, that has two solutions.


y1,2 = ±√ x3 + ax + b 


And surprisingly that's all about point compression. Instead of sending the tuple (x0,y0), only x0 and a bit identifying the sign of y is sent. The receiver only needs to calculate y1,2 and then select the appropriate using the sign bit. Of course calculating square roots modulo a prime isn't as simple as in our example, but the idea is similar.

So is this worthwhile? Are implementations like GnuTLS that don't infringe this patent in a disadvantage? While it is a clever idea, it doesn't really add much value in real-world protocols. A 256-bit curve has points of around 32 bytes. Thus the bytes saved per TLS handshake are 64 bytes overall. Such handshake typically contains data of several kilobytes, and saving few bytes isn't going to make a dramatic difference. It wouldn't be a top priority addition even if it wasn't a patented method.

Thursday, December 15, 2011

Self-modifying code using GCC

One of my research topics last year was self-modifying code mainly for obfuscation. Having seen how self-modification is implemented for a variety of programs, I could say that most existing techniques implement the self-modification in assembly, or in the most high level case, in C using inline assembly.

I'm not a good assembly programmer so I always try to move things to "high-level" C. Unfortunately I haven't managed to implement self modification with standard C, but can be done by using an interesting GCC extension. That is the label to pointer or '&&' extension. This provides the address of a C label as a pointer. Using that we can implement self modification without the use of assembler. The idea is demonstrated in the code snippet below.

One of its problems is that if the labels are not used in the code they are removed by the GCC optimizer (even with -O0) and the address to pointer just returns a dummy value. For that the labels have to be used in dummy code. The better the optimizer in GCC becomes the harder the work around. In this test we use the value of argc (any other external value would do) to ensure the labels stay put.

The code was inspired by a self-modifying code example using inline assembly, but unfortunately I can no longer find it in order to provide proper references.

/* 
 * A self-modifying code snippet that uses C
 * and the GCC label to pointer (&&) extension.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>

int main(int argc, char **argv)
{
 int (*my_printf) (const char *format, ...);
 void (*my_exit) (int);
 void *page =
     (void *) ((unsigned long) (&&checkpoint) &
        ~(getpagesize() - 1));

 /* mark the code section we are going to overwrite
  * as writable.
  */
 mprotect(page, getpagesize(), PROT_READ | PROT_WRITE | PROT_EXEC);

 /* Use the labels to avoid having GCC
  * optimize them out */
 switch (argc) {
   case 33:
     goto checkpoint;
   case 44:
     goto newcode;
   case 55:
     goto newcode_end;
   default:
     break;
 }

 /* Replace code in checkpoint with code from
  * newcode.
  */
 memcpy(&&checkpoint, &&newcode, &&newcode_end - &&newcode);

checkpoint:
 printf("Good morning!\n");
 return 1;

newcode:
 my_printf = &printf;
 (*(my_printf)) ("Good evening\n");

 my_exit = &exit;
 (*(my_exit)) (0);

newcode_end:
 return 2;
}

Dedicated to the anonymous referee who insisted into adding this example to our article.

Monday, December 12, 2011

Generating Diffie-Hellman parameters

Starting with gnutls 3.0 the Diffie-Hellman parameter generation has been changed. That was mandated by the move from libgcrypt to nettle. Nettle didn't support Diffie-Hellman parameter generation, so I had to implement it within gnutls. For that I generate a prime of the form p=2wq+1, where w and q are also primes and q has the size in bits of the security parameter (could be 160,256 etc. bits, based on the size of p). Then I search for a generator of the q subgroup using an algorithm which typically gives a large generator --few bits less than the size of p.

This method has the advantage that a server when selecting a private key value x, instead of selecting 0 < x < p-1, it can select a smaller x within the multiplicative subgroup of order q, i.e., 0 < x < q-1. The security level of x is that of the security parameter, which in gnutls is calculated as in ECRYPT recommendations. However until now we never wrote the size of the security parameter in the Diffie-Hellman parameters file, so it was impossible for the server to guess the order of q, since the PKCS #3 file only contained the generator and the prime. However PKCS #3 has a  privateValueLength field exactly for this purpose, but it is not used by gnutls or any other implementation I'm aware of.

DHParameter ::= SEQUENCE {
  prime INTEGER, -- p
  base INTEGER, -- g
  privateValueLength INTEGER OPTIONAL 
}

By populating and using it, the performance improvement was quite impressive. The following table demonstrates that.

Prime
length
Private key
length
Transactions/sec
with DHE-RSA
12481248 122.75
1248160 189.91

So starting from 3.0.9 gnutls' generated parameters for Diffie-Hellman should perform better. However one question that I had to answer, is what is more important, keeping x small as we do, or having a small generator? Libgcrypt and openssl generate parameters in a way that the generator is kept small, e.g. g=5 or g=2. For that I generated 1248 bit parameters using openssl which happened to have g=2.

Prime
length
Generator
length
Private key
length
Transactions/sec
with DHE-RSA
12481246 bits1248 122.75
12481246 bits160 189.91
12482 bits1248 125.94

So it seems keeping the generator small doesn't really have an impact to performance comparing to using a smaller but still secure subgroup.