Sunday, January 1, 2012

Do we need elliptic curve point compression?

GnuTLS has recently added support for elliptic curves (ECDSA and Elliptic curve Diffie-Hellman). Elliptic curves are an improvement on public key technologies, mostly in efficiency because they require operations with much smaller integers for equivalent security parameters to RSA or DH. For example a 1248-bit RSA key corresponds to an 160-bit ECDSA key, saving both in space required to store parameters and in time spent for calculations.

However elliptic curve technology is considered a patent minefield and quite some attempts have been made to clarify the status, e.g. IETF made an attempt with RFC6090 to describe only the fundamental algorithms on which all patents are known to have expired. The GnuTLS implementation of ECC is based on this document.

One of the interesting "improvements" that is not included in the fundamental algorithms, and is still patented is point compression. To describe it we need a small (but not detailed) introduction to elliptic curves. An elliptic curve (at least for the needs of TLS protocol) is a set of points (x,y) that satisfy the following relation modulo a large prime p.

y2 = x3 + ax + b

This set of points has some properties, i.e., a group law allowing point addition, scalar multiplication etc. Without getting into details those points need to be exchanged during a protocol run and given that the points usually belong to a "large" prime field (e.g. 256-bits), reducing the exchanged bytes is an improvement. Point compression achieves that goal, so let's see how.

Say we have a point (x0,y0) satisfying y02 = x03 + ax0 + b. If x0 is known, then the previous relation becomes a second degree polynomial, that has two solutions.


y1,2 = ±√ x3 + ax + b 


And surprisingly that's all about point compression. Instead of sending the tuple (x0,y0), only x0 and a bit identifying the sign of y is sent. The receiver only needs to calculate y1,2 and then select the appropriate using the sign bit. Of course calculating square roots modulo a prime isn't as simple as in our example, but the idea is similar.

So is this worthwhile? Are implementations like GnuTLS that don't infringe this patent in a disadvantage? While it is a clever idea, it doesn't really add much value in real-world protocols. A 256-bit curve has points of around 32 bytes. Thus the bytes saved per TLS handshake are 64 bytes overall. Such handshake typically contains data of several kilobytes, and saving few bytes isn't going to make a dramatic difference. It wouldn't be a top priority addition even if it wasn't a patented method.

Thursday, December 15, 2011

Self-modifying code using GCC

One of my research topics last year was self-modifying code mainly for obfuscation. Having seen how self-modification is implemented for a variety of programs, I could say that most existing techniques implement the self-modification in assembly, or in the most high level case, in C using inline assembly.

I'm not a good assembly programmer so I always try to move things to "high-level" C. Unfortunately I haven't managed to implement self modification with standard C, but can be done by using an interesting GCC extension. That is the label to pointer or '&&' extension. This provides the address of a C label as a pointer. Using that we can implement self modification without the use of assembler. The idea is demonstrated in the code snippet below.

One of its problems is that if the labels are not used in the code they are removed by the GCC optimizer (even with -O0) and the address to pointer just returns a dummy value. For that the labels have to be used in dummy code. The better the optimizer in GCC becomes the harder the work around. In this test we use the value of argc (any other external value would do) to ensure the labels stay put.

The code was inspired by a self-modifying code example using inline assembly, but unfortunately I can no longer find it in order to provide proper references.

/* 
 * A self-modifying code snippet that uses C
 * and the GCC label to pointer (&&) extension.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>

int main(int argc, char **argv)
{
 int (*my_printf) (const char *format, ...);
 void (*my_exit) (int);
 void *page =
     (void *) ((unsigned long) (&&checkpoint) &
        ~(getpagesize() - 1));

 /* mark the code section we are going to overwrite
  * as writable.
  */
 mprotect(page, getpagesize(), PROT_READ | PROT_WRITE | PROT_EXEC);

 /* Use the labels to avoid having GCC
  * optimize them out */
 switch (argc) {
   case 33:
     goto checkpoint;
   case 44:
     goto newcode;
   case 55:
     goto newcode_end;
   default:
     break;
 }

 /* Replace code in checkpoint with code from
  * newcode.
  */
 memcpy(&&checkpoint, &&newcode, &&newcode_end - &&newcode);

checkpoint:
 printf("Good morning!\n");
 return 1;

newcode:
 my_printf = &printf;
 (*(my_printf)) ("Good evening\n");

 my_exit = &exit;
 (*(my_exit)) (0);

newcode_end:
 return 2;
}

Dedicated to the anonymous referee who insisted into adding this example to our article.

Monday, December 12, 2011

Generating Diffie-Hellman parameters

Starting with gnutls 3.0 the Diffie-Hellman parameter generation has been changed. That was mandated by the move from libgcrypt to nettle. Nettle didn't support Diffie-Hellman parameter generation, so I had to implement it within gnutls. For that I generate a prime of the form p=2wq+1, where w and q are also primes and q has the size in bits of the security parameter (could be 160,256 etc. bits, based on the size of p). Then I search for a generator of the q subgroup using an algorithm which typically gives a large generator --few bits less than the size of p.

This method has the advantage that a server when selecting a private key value x, instead of selecting 0 < x < p-1, it can select a smaller x within the multiplicative subgroup of order q, i.e., 0 < x < q-1. The security level of x is that of the security parameter, which in gnutls is calculated as in ECRYPT recommendations. However until now we never wrote the size of the security parameter in the Diffie-Hellman parameters file, so it was impossible for the server to guess the order of q, since the PKCS #3 file only contained the generator and the prime. However PKCS #3 has a  privateValueLength field exactly for this purpose, but it is not used by gnutls or any other implementation I'm aware of.

DHParameter ::= SEQUENCE {
  prime INTEGER, -- p
  base INTEGER, -- g
  privateValueLength INTEGER OPTIONAL 
}

By populating and using it, the performance improvement was quite impressive. The following table demonstrates that.

Prime
length
Private key
length
Transactions/sec
with DHE-RSA
12481248 122.75
1248160 189.91

So starting from 3.0.9 gnutls' generated parameters for Diffie-Hellman should perform better. However one question that I had to answer, is what is more important, keeping x small as we do, or having a small generator? Libgcrypt and openssl generate parameters in a way that the generator is kept small, e.g. g=5 or g=2. For that I generated 1248 bit parameters using openssl which happened to have g=2.

Prime
length
Generator
length
Private key
length
Transactions/sec
with DHE-RSA
12481246 bits1248 122.75
12481246 bits160 189.91
12482 bits1248 125.94

So it seems keeping the generator small doesn't really have an impact to performance comparing to using a smaller but still secure subgroup.

Thursday, December 8, 2011

The price to pay for perfect-forward secrecy

[EDIT: This article was written in 2011 and reflects the facts at the time; fortunately, since then the state of the Internet has improved, and PFS ciphersuites have become the norm]

Ok, the question seems to be what is perfect forward secrecy? Perfect forward secrecy (PFS) is a property of a cryptographic protocol, such as SSL (or better TLS). That property ensures that once the cryptographic keys of a participant are leaked the secrecy of past sessions remains. For example if the amazon.com private key is leaked your previous transactions with this site remain secure. That is a very interesting protocol property, but is actually only effective if the web site wouldn't store any of the sensitive transaction data. In this post we ignore any issues due to storage and focus on the protocol property. So does TLS have this property? That is a yes and a no.

TLS provides the option to use ciphersuites offering perfect forward secrecy. Those are the ciphersuites that use ephemeral Diffie-Hellman and elliptic curve Diffie-Hellman authentication. However those come at a cost very few web sites are willing to pay for. They involve expensive calculations that put some burden on the server and delay the transaction for few milliseconds. Why would a server bother to protect your data anyway? And that seems to be the current attitude. Most servers totally prohibit the usage of ciphersuites that offer PFS. Try for example to connect to amazon.com using:
$ ./gnutls-cli www.amazon.com --priority NORMAL:-KX-ALL:+DHE-RSA:+ECDHE-RSA
Resolving 'www.amazon.com'...
Connecting to '72.21.214.128:443'...
*** Fatal error: A TLS fatal alert has been received.
*** Received alert [40]: Handshake failed

So let's try to evaluate the cost of PFS versus the plain RSA ciphersuites that do not offer PFS, using a simple approach initially. For that we use Diffie-Hellman group parameters of 1024 bits, a 192-bit elliptic curve and a 1024-bit RSA key and a modified gnutls-cli for its benchmark-tls option.


Key exchangeParametersTransactions/sec
DHE-RSA1024-bit RSA key,
1024-bit DH parameters
345.53
ECDHE-RSA1024-bit RSA key,
192-bit ECDH parameters
604.92
ECDHE-ECDSA192-bit ECDSA key,
192-bit ECDH parameters
595.84
RSA1024-bit RSA key994.59

In this test the ciphersuites DHE-RSA, ECDHE-RSA and ECDHE-ECDSA provide perfect forward secrecy, and use a signed ephemeral (Diffie-Hellman) key exchange. The signature in the key exchange is an RSA or ECDSA one, determined by the ciphersuite and the certificate. The plain RSA ciphersuite doesn't offer PFS and utilizes RSA's encryption capability (instead of signing). We can see that the non-PFS ciphersuite is a clear winner with 3x factor from DHE_RSA ciphersuite. The elliptic curve equivalents of DHE also fail to reach plain RSA's performance.

However, our security levels are not consistent. An elliptic curve of 192-bits provides 96-bits of security  according to ECRYPT (don't confuse the bits of an algorithm such as RSA with the bits of the security level, a security level of 96-bits is a good level of security for today's standards). That means it is equivalent to roughly 1776-bits RSA and DH parameters. So let's do the same experiment with more consistent parameters.


Key exchangeParametersTransactions/sec
DHE-RSA1776-bit RSA key,
1776-bit DH parameters
98.26
ECDHE-RSA1776-bit RSA key,
192-bit ECDH parameters
352.41
ECDHE-ECDSA192-bit ECDSA key,
192-bit ECDH parameters
595.84
RSA1776-bit RSA key460.08

So it is obvious that increasing the security level to 96-bits degrades performance for all ciphersuites (except ECDHE-ECDSA which already offered that level of security). The Diffie-Hellman key exchange is very slow over a 1776-bit group, and in addition it includes an RSA signature of 1776-bits. The Elliptic curve equivalent (ECDHE-RSA) is much more efficient, even though it also includes a 1776-bit RSA signature. The clear winner though is the ECDHE-ECDSA variant which uses an ECDSA public key and outperforms even plain RSA.

Nevertheless, if we restrict to RSA keys that are supported by almost every implementation, the winner is plain RSA, with a small but not insignificant difference from the second.

So although I'd suggest ciphersuites offering PFS for almost everything, in practice, when ECDSA keys are not option, they degrade performance, and this should not be underestimated. When it is desired to have PFS, then its shortcomings have to be evaluated against the benefits.

On the other hand if a decent security level is required (such as the 96-bit one used in our example), we see that the switch from plain RSA to ECDHE-ECDSA provides both efficiency and forward secrecy.

So given the fact that any web server may at some point be compromised, my suggestion would be, that if you value the exchanged data in the transactions, only PFS ciphersuites should be used. Otherwise, using the plain RSA ciphersuite may suit better.

PS. Note that protecting the server's private key on the disk using PIN or passwords is futile as the key resides unencrypted in the server's memory.

Thursday, November 24, 2011

Enhancing privacy in comments

FHEO or "For human eyes only" is a Firefox browser plugin with the goal of enhancing privacy in comments made in social networks or forums. The idea is to counter  privacy issues seen mainly in social networks, by using (distorted) images instead of plain text. The two issues it mainly defends against:
  1. Expiration of comments;
  2. Prevention of comments being indexed by web crawlers.
Most social networks do not allow deleting old posts or comments, and if they do it is not an automatic process. Moreover those comments can be easily  gathered by a crawler such as google and be used to associate persons based on written comments, political ideas etc.

This is something my colleague Andreas Pashalidis didn't like and proposed a master thesis on a firefox plugin that would counter those issues. This thesis was completed by Xavi Ferrer and BeƱat Bermejo under our supervision and resulted to a firefox plugin prototype. Later the plugin and distortions have been enhanced by Andreas (and to a tiny degree me), and now a fully functional plugin is available under the Apache License 2.0.

We'd appreciate your comments and ideas.

Tuesday, September 27, 2011

Ovrimos was a greek company making an RDBMS. They tried to compete with giants like Oracle and microsoft SQL, but failed. Having the guts to try was enough for me. They are part of history right now and I'm pretty proud I was a tiny part of it. The history of the company is nicely described by one of the lead engineers at that time.

Wednesday, September 7, 2011

The problem of cryptographic algorithms & cryptographic accelerators

There are many ways one could use cryptography in GNU/Linux. There are cryptographic libraries such as the Java crypto API, Botan, OpenSSL, GnuTLS and nettle,  that provide access to crypto algorithms. The Linux kernel has also its own cryptographic API to be accessed by the in-kernel IPSec implementation. They all offer a variety of algorithms and each one has its pros and cons. Some implementations might be faster, some slower but such variation is expected.

New generation CPUs contain instruction sets for cryptographic operations, such are VIA's PADLOCK and Intel's AES-NI. Other embedded systems provide a cryptographic accelerator allowing offloading of cryptography. An interesting question is how do the above cryptographic libraries take advantage of the above optimizations?

Cryptographic operations based on the PADLOCK or AES-NI instruction sets require libraries to re-write their AES (for example) implementation using the above instruction sets. They both have special requirements for data and key alignment requiring special care on implementation. Moreover since those instruction sets are not available in all CPUs of the same family, run-time CPU detection has to occur which will enable the optimized versions.

The cryptographic accelerators case is more complicated. Cryptographic libraries cannot access the accelerators without a driver. Until recently there was no generic driver to access them in the Linux kernel. Today we have two. The AF_ALG interface, which is included in the Linux kernel since 2.6.38 and the cryptodev-linux interface which is an interface compatible with OpenBSD's /dev/crypto. Both allow access to the hardware accelerated algorithms included in the Linux kernel. Moreover, in CPUs having the PADLOCK or AES-NI instruction set the Linux kernel includes and uses optimized versions of the crypto algorithms. This allows the kernel interface to be used as a generic interface to access cryptographic accelerators and optimized instruction sets.

However, performance-wise this is not optimal.  The kernel interface requires switches from user-space to kernel-space which has a significant performance cost. Thus the usage of the kernel services should be avoided when possible (i.e., in the AES-NI and padlock instruction sets).

So there is no universal interface to take advantage of hardware-accelerated cryptography. But is there one needed? Why have a universal interface to access hardware acceleration when there isn't any to access cryptography?

I believe a universal interface to access hardware accelerated cryptography is needed because not having it involves a massive duplication of work, that is architecture and system specific. Such an interface would allow the crypto libraries to stay architecture or system agnostic and focus on the bigger picture rather than coping with architecture quirks, hardware bug work-arounds etc. Moreover, cryptographic libraries compete on the interface, protocols etc. Libraries competing on hardware support although typical for the 80's and 90's shouldn't be the case in a  modern operating system.

How can this be solved? I believe a system-wide library to access accelerated crypto would be the ideal. One that would use the kernel provided cryptography in case of a hardware accelerator, or the relevant CPU-specific instructions if detected. Each cryptographic library could then use it, to provide accelerated cryptography transparently to all applications.