Monday, June 15, 2015

Software Isolation in Linux

Starting from the assumption that software will always have bugs, we need a way to isolate and neutralize the effects of those bugs. One approach is to isolate components of the software such that a breach in one component doesn't compromise another. In software engineering that approach became popular with the principle of privilege separation used by the OpenSSH server, and it is the application of an old idea to a new field. Isolation for the protection of assets is widespread in several other aspects of human activity; the most prominent example throughout history is quarantine, which isolates suspected carriers of epidemic diseases to protect the rest of the population. In this post, we briefly overview the tools available in the Linux kernel to offer isolation between software components. It is based on my FOSDEM 2015 presentation on software isolation in the security devroom, slightly enhanced.

Note that this text is intended for developers and software engineers looking for methods to isolate components in their software.

An important aspect of such an analysis is to clarify the threats it targets. In this text I describe methods which protect against code injection attacks; code injection provides the attacker with a very strong tool to execute code, read arbitrary memory, and more.

In a typical Linux kernel based system, the tools available for such protection are the following.
  1. fork() + setuid() + exec()
  2. chroot()
  3. seccomp()
  4. prctl()
  5. SELinux
  6. Namespaces

The first allows for memory isolation by using different processes, ensuring that a forked process has no access to the parent's memory address space. The second allows a process to restrict itself to a part of the file system available in the operating system. These two are available in almost all POSIX systems, and are the oldest tools available, dating back to the first UNIX releases. The focus of this post is on methods 3-6, which are fairly recent and Linux kernel specific.
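For completeness, the classic combination of the first two looks roughly like the sketch below; the directory and the uid/gid values are placeholders for illustration, not a recommendation.

    #include <sys/types.h>
    #include <unistd.h>

    /* classic privilege drop: confine the child to an (empty) directory
     * and switch to an unprivileged account; values are placeholders */
    pid_t pid = fork();
    if (pid == 0) {
            if (chroot("/var/empty") != 0 || chdir("/") != 0)
                    _exit(1);
            if (setgid(65534) != 0 || setuid(65534) != 0)
                    _exit(1);
            /* ... exec() or continue as the isolated worker ... */
    }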

Before proceeding, let's take a brief look at what a process is in a UNIX-like operating system. It is some code in memory which is scheduled to run for a time slot on the CPU. It can access the virtual memory assigned to it, make calculations using the CPU's user-mode instructions, and that's pretty much all. To do anything else, e.g., get additional memory, access files, or read/write the memory of other processes, it has to use the system calls provided by the operating system.

Let's now proceed and describe the available isolation methods.

Seccomp

After that introduction to processes, seccomp comes as the natural method to list first, since it is essentially a filter for the system calls available to a process. For example, a process can install a filter that allows read() and write() but nothing else. After such a filter is applied, any code injected into that process will not be able to execute any other system calls, reducing the attack impact to the allowed calls. In our particular example with read() and write(), only the data written and read by that process will be affected.

The simplest way to access seccomp is via the libseccomp library, which has a quite intuitive API. An example using that library, which creates a whitelist of three system calls, is shown below.

    
    #include <assert.h>
    #include <errno.h>
    #include <seccomp.h>
    #include <sys/ioctl.h>    /* SIOCGIFMTU */

    scmp_filter_ctx ctx;

    /* any system call not explicitly allowed will fail with EPERM */
    ctx = seccomp_init(SCMP_ACT_ERRNO(EPERM));
    assert(ctx != NULL);
    /* allow read() and write() unconditionally */
    assert(seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0) == 0);
    assert(seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0) == 0);
    /* allow ioctl() only when its second argument equals SIOCGIFMTU */
    assert(seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(ioctl), 1,
                            SCMP_A1(SCMP_CMP_EQ, SIOCGIFMTU)) == 0);
    assert(seccomp_load(ctx) == 0);

The example above installs a filter which allows the read(), write() and ioctl() system calls. The latter is only allowed if the second argument of ioctl() is SIOCGIFMTU. The first line of the filter setup instructs seccomp to return -1 and set errno to EPERM when a system call outside this filter is made.
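To illustrate the effect of that policy, a blocked call simply fails with EPERM rather than terminating the process; a minimal, hypothetical check after the filter above is loaded could look as follows.

    #include <errno.h>
    #include <sys/socket.h>

    /* socket() is not on the whitelist above, so after seccomp_load()
     * it is expected to fail with EPERM instead of killing the process */
    if (socket(AF_INET, SOCK_STREAM, 0) == -1 && errno == EPERM) {
            /* the filter rejected the call */
    }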

The drawback of seccomp is the tedious process a developer has to go through in order to figure out the system calls used in his application. Manual inspection of the code is required to discover the system calls, as well as inspection of run traces using the 'strace' tool. Manual inspection will allow making a rough list which provides a starting point, but it will not be entirely accurate. The issue is that calls to libc functions may not correspond to the expected system call. For example, a call to exit() often results in an exit_group() system call.

The trace obtained using 'strace' will help clarify, restrict or extend the initial list. Note, however, that traces alone may miss the system calls used in error-condition handling, and different versions of libc may use different system calls for the same function. For example, the libc call select() uses the select() system call on the x86-64 architecture, but _newselect() on x86.


The performance cost of seccomp is the cost of executing the filter, and in most cases it is a fixed cost per system call. In the openconnect VPN server I estimated the impact of seccomp on a worker process at a 2% slowdown in transfer speed (from 624 to 607 Mbps). The measured server worker process executes read/send and recv/write in a tight loop.

Prctl

The PR_SET_DUMPABLE flag of the prctl() system call protects a process from other processes accessing its memory. That is, it will prevent processes with the same privileges as the protected one from reading its memory via ptrace(). A usage example is shown below.

    #include <sys/prctl.h>

    prctl(PR_SET_DUMPABLE, 0);

While this approach doesn't protect against code injection in general, it may prove a useful tool with a low performance cost in several setups.

SELinux

SELinux is an operating system mechanism to prevent processes from accessing various "objects" of the OS. The objects can be files, other processes, pipes, network interfaces, etc. It may be used to enhance seccomp with more fine-grained access control. For example, one may set up a rule with seccomp to only allow read(), and enhance that rule with SELinux so that read() only accepts certain file descriptors.

On the other hand, a software engineer may not be able to rely too much on SELinux to provide isolation, because it is often not within the developer's control. It is typically used as an administrative tool, and the administrator may decide to turn it off, set it to non-enforcing mode, etc.

The way for a process to transition to a different SELinux ruleset is via exec() or via the setcon() call, and its cost is perceived to be high; however, I have no performance measurements for software relying on it.
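For completeness, a minimal sketch of a setcon() transition using libselinux is shown below; the target context string is made up for illustration, and the call will only succeed if the loaded policy permits the transition.

    #include <selinux/selinux.h>

    /* switch the process to a more confined domain; the context string
     * here is hypothetical and must exist in the loaded policy */
    if (is_selinux_enabled() > 0 &&
        setcon("system_u:system_r:my_worker_t:s0") != 0) {
            /* transition refused by policy, or not permitted */
    }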

The drawbacks of this approach are the centralized nature of the system policy, meaning that individual applications can only apply the existing system policy, not update it; the obscure language (m4) a system policy needs to be written in; and the fact that any policy written will not be portable across different Linux based systems.

Linux Namespaces

One of the most recent additions to the Linux kernel is the Namespaces feature. It allows "virtualizing" certain Linux kernel subsystems per process, and the result is often referred to as containers. It is available via the clone() and unshare() system calls, and is documented in the unshare(2) and clone(2) man pages. Let's see some examples of the subsystems that can be isolated (a brief unshare() sketch follows the list below).

  • NEWPID: Prevents processes from "seeing" and accessing process IDs (PIDs) outside their namespace. That is, the first isolated process will believe it has PID 1 and will see only the processes it has forked.
  • NEWIPC: Prevents processes from accessing the main IPC subsystem (shared memory segments, message queues, etc.). The processes will have access to their own IPC subsystem.
  • NEWNS: Provides filesystem isolation, acting as a feature-rich chroot(). It allows, for example, creating isolated mount points which exist only within a process.
  • NEWNET: Isolates processes from the main network subsystem. That is, it provides them with a separate networking stack, device interfaces, routing tables, etc.
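For an already running process, individual subsystems can also be detached with unshare(); a minimal sketch, assuming the process holds CAP_SYS_ADMIN, is the following.

    #define _GNU_SOURCE
    #include <sched.h>

    /* detach the calling process from the host's network and IPC
     * subsystems; it keeps running, but with fresh, empty namespaces */
    if (unshare(CLONE_NEWNET|CLONE_NEWIPC) != 0)
            return -1;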
Let's now see an example of an isolated process being created in its own PID namespace. The following code operates as fork() would.

    #define _GNU_SOURCE
    #include <sched.h>        /* CLONE_NEWPID */
    #include <signal.h>       /* SIGCHLD */
    #include <sys/syscall.h>
    #include <unistd.h>

    #if defined(__i386__) || defined(__arm__) || defined(__x86_64__) || defined(__mips__)
        long ret;
        int flags = SIGCHLD|CLONE_NEWPID;
        /* a raw clone() with a NULL child stack behaves like fork() */
        ret = syscall(SYS_clone, flags, 0, 0, 0);
        /* in the child, raw getpid() returns 1 inside the new namespace */
        if (ret == 0 && syscall(SYS_getpid) != 1)
                return -1;
        return ret;
    #endif

This approach, of course, has a performance penalty in the time needed to create a new process. In my experiments with the openconnect VPN server, creating a process with the NEWPID, NEWNET and NEWIPC flags took roughly 10 times longer than a plain fork().

Note, however, that the isolation subsystems available in clone() are by default reversible using the setns() system call. To ensure that these subsystems remain isolated even after code injection, seccomp must be used to eliminate calls to setns() (many thanks to the FOSDEM participant who brought that to my attention).
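With libseccomp a single rule is enough for that; the whitelist filter shown earlier already rejects setns(), but even under an otherwise permissive filter it can be denied explicitly, as in the sketch below (assuming a context created with seccomp_init(SCMP_ACT_ALLOW)).

    /* explicitly reject setns() so the namespaces cannot be re-joined */
    assert(seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(setns), 0) == 0);
    assert(seccomp_load(ctx) == 0);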

Furthermore, the Namespaces approach follows the principle of blacklisting: a developer can isolate a process from certain subsystems, but not from every one available in the system. That is, one may enable all the isolation subsystems available in the clone() call, and then realize that the kernel keyring is still available to the isolated process. That is because no such isolation mechanism has been implemented for the kernel keyring so far.

Conclusions


In the following table I attempt to summarize the protection offerings of each of the described methods.


                         Prevent killing    Prevent access to          Prevent access to    Prevent exploitation of an
                         other processes    other processes' memory    shared memory        unused system call bug
Seccomp                  True               True                       True                 True
prctl(SET_DUMPABLE)      False              True                       False                False
SELinux                  True               True                       True                 False
Namespaces               True               True                       True                 False

In my opinion, seccomp seems to be the best option to consider as an isolation mechanism when designing new software. Together with Linux Namespaces it can prevent access to other processes, shared memory and the filesystem, but the main distinguisher I see is the fact that it can restrict access to unused system calls.

That is an important point, given the number of system calls available in a modern kernel. Not only does it reduce the overall attack surface by limiting them, but it also denies access to functionality that was never intended to be exposed; see the setns() and kernel keyring issues above.

Wednesday, December 3, 2014

A quick overview of GnuTLS development in 2014

2014 was a very interesting year in the development of GnuTLS. On the development side, this year we have incorporated patches with fixes or enhanced functionality from more than 25 people according to openhub, and the main focus was moving GnuTLS 3.3.x from next to a stable release. That version had quite a number of new features, the most prominent being:
  • Initialization on a library constructor,
  • The introduction of verification profiles, i.e., enforce rules not only on the session parameters such as ciphersuites and protocol version, but also on the peer's certificate (e.g., only accept a session which provides a 2048 bit certificate),
  • The addition of support for DNS name constraints in certificates, i.e., enforce rules that restrict an intermediate CA to issue certificates only for a specific DNS domain,
  • The addition of support for trust modules using PKCS #11,
  • Better integration with PKCS #11 smart cards and Hardware Security Modules (HSMs); I believe we currently have one of the better, if not the best, support and integration with smart cards and HSMs among crypto libraries. That we mostly owe to the support of PKCS #11 URLs, which allowed converting high level APIs that accepted files into APIs that work with HSMs and smart cards transparently, without changing them,
  • Support for the algorithms required by FIPS 140-2.
Most of these are described in the original announcement mail, some others were completed and added gradually.

A long-time requested feature that was implemented was the removal of the need for explicit library initialization. That relocated the previously explicit call of gnutls_global_init() to a library constructor, and eliminated a cumbersome requirement of previous versions of GnuTLS. As a result it removed the need for locks and coordination in a large application which may use multiple libraries that depend on GnuTLS. However, it had an unexpected side-effect. Because initialization was moved to a constructor, which is executed before an application enters main(), server applications which closed all their file descriptors on startup also closed the file descriptor held by GnuTLS for reading /dev/urandom. That's a pretty nasty bug, and it was noticed because in CUPS the descriptor which replaced the /dev/urandom one was a non-readable file descriptor. It was solved by re-opening the descriptor on the first call of gnutls_global_init(), but the issue demonstrates the advantage and simplicity of the system call approach for the random generator, i.e., getentropy() or getrandom(). In fact, adding support for getentropy() reduced the relevant code to 8 lines from the 75 lines of the file-based approach.
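For comparison, the system-call approach needs little more than the following sketch; note that getentropy() is limited to 256 bytes per call, so larger requests would have to loop.

    #include <unistd.h>

    unsigned char seed[32];

    /* no file descriptor to open, protect, or re-open after a close */
    if (getentropy(seed, sizeof(seed)) != 0)
            return -1;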

Another significant addition, at least for the client side, is the support for trust modules, available using PKCS #11. Trust modules, like p11-kit trust, allow clients to perform server certificate authentication using a method very similar to what the NSS library uses. That is, a system-wide database of certificates, such as the p11-kit trust module, can be used to query the trusted CAs and intermediate anchors. These anchors may have private attached extensions, e.g., restricting the name constraints for a CA, or restricting the scope of a CA to code signing, etc., in addition to, or in place of, the scope set by the CA itself. That has very important implications for system-wide CA management. CAs can be limited in scope, or per application, and can even be restricted to a few DNS top-level domains using name constraints. Most importantly, the database can be shared across libraries (in fact in Fedora 21 the trust database is shared between GnuTLS and NSS), significantly simplifying administration, even though the tools to manage it are primitive for the moment. That was a long-time vision of Stef Walter, who develops p11-kit, and I'm pretty happy that the GnuTLS part of it was completed quite well.

This was also the first year I attempted to document and publicize the goals for the next GnuTLS release, which is 3.4.0 and will be released around next March. They can be seen in our gitorious wiki pages. Hopefully all the points will be delivered, although some of the few remaining ones rely on a new release of the nettle crypto library, and on an update of c-ares to support DNSSEC.

Dependency-wise, GnuTLS is moving to a tighter integration with nettle, which provides one of the fastest ECC implementations out there; I find it unfortunate, though, that there seems to be no intention of collaboration between the two GNU crypto libraries, nettle and libgcrypt, meaning that optimizations in libgcrypt stay in libgcrypt and the same goes for nettle. GNU Libtasn1 is seeking a co-maintainer (or even better, someone to rethink and redesign the project, I'd add), so feel free to step up if you're up to the job. As it is now, I'm taking care of any fixes needed for that project.

On the other hand, we also had quite serious security vulnerabilities fixed that year. These varied from certificate verification issues to a heap overflow. A quick categorization of the serious issues fixed this year follows.
  • Issues discovered with manual auditing (GNUTLS-SA-2014-1, GNUTLS-SA-2014-2). These were certificate verification issues and were discovered as part of manual auditing of the code. Thanks to Suman Jana, and also to Red Hat which provided me the time for the audit.
  • Issues discovered by Fuzzers (GNUTLS-SA-2014-3, GNUTLS-SA-2014-5):
    • Codenomicon provided a copy of their test suite, valid for a few months, and that helped discover the first bug above,
    • Sean Buford of Google reported to me that using AFL he identified a heap corruption issue in the encoding of ECC parameters; I have not used AFL but it seems quite an impressive piece of software,
  • Other/Protocol issues (GNUTLS-SA-2014-4), i.e., POODLE. Although, that is not really a GnuTLS or TLS protocol issue, POODLE takes advantage of the downgrade dance, used in several applications.
Even though the sample is small, and only accounts for the issues with an immediate security impact, I believe this view of the issues shows the importance of fuzzers today. In a project of 180k lines of code, it is not feasible to rely fully on manual auditing, except for some crucial parts of it. While no security vulnerabilities were discovered via static analysis tools like clang or coverity, we had quite a few other fixes thanks to these tools.

So overall, 2014 was quite a productive year, both in the number of new features added and in the number of bugs fixed. New techniques were used to verify the source code, and new test cases were added to our test suite. What is encouraging is that more and more people test, report bugs and provide patches on GnuTLS. A big thank you to all of the people who contributed.

Wednesday, October 15, 2014

What about POODLE?

Yesterday POODLE was announced, a fancy-named new attack on the SSL 3.0 protocol, which relies on applications using a non-standard fallback mechanism, typically found in browsers. The attack takes advantage of
  • a vulnerability in the CBC mode of SSL 3.0 which has been known for a decade
  • a non-standard fallback mechanism (often called the downgrade dance)
So the novel and crucial part of the attack is the exploitation of the non-standard fallback mechanism. What is that, you may ask? I'll try to explain it in the next paragraph. Note that in the following paragraphs I'll use the term SSL protocol to cover TLS as well, since TLS is simply a newer version of SSL.

The SSL protocol has a version negotiation mechanism that doesn't allow a fallback to SSL 3.0 when both client and server support a newer variant (e.g., TLS 1.1). That mechanism detects modifications by man-in-the-middle attackers, and with it alone the POODLE attack would have been thwarted. However, a limited set of clients perform a custom protocol fallback, the downgrade dance, which is straightforward but insecure. That set of clients seems to consist mostly of browsers, which, in order to negotiate an acceptable TLS version, follow something along these lines:
  1. Connect using the highest SSL version (e.g., TLS 1.2)
  2. If that fails, set the version to TLS 1.1 and reconnect
  3. ...
  4. until there are no options left and SSL 3.0 is used.
That's a non-standard way to negotiate TLS and, as the POODLE attack demonstrates, it is insecure. Any attacker can interrupt the first connection and make it look like a failure, forcing a fallback to a weaker protocol. The good news is that this construct is mostly used by browsers, and few other applications should be affected.

Why do browsers use this construct then? In their defence, there have been serious bugs in the standard SSL and TLS protocol negotiation of widespread software. For example, when TLS 1.2 came out, we realized that our TLS 1.2-enabled client in GnuTLS couldn't connect to a large part of the internet. A few large sites would refuse to talk to the GnuTLS client because it advertised TLS 1.2 as its highest supported protocol. The bug was in the servers, which closed the connection when they encountered a protocol newer than their own, instead of negotiating their highest supported version (in accordance with the TLS protocol). It took a few years before TLS 1.2 was enabled by default in GnuTLS, and even then we had a hard time convincing our users who encountered connection failures that it was a server bug. The truth is that users don't care whose bug it is; they will simply use software that just works.

A long time has passed since then (TLS 1.2 was published in 2008), and today almost all public servers follow the TLS protocol negotiation. So this may be the time for browsers to get rid of that relic of the past. Unfortunately, that isn't the case. The IETF TLS working group is now trying to standardize counter-measures for the browser negotiation trickery. Even though I have become more of a pragmatist since 2008, I believe that forcing counter-measures into every TLS implementation just because there used to be (or may still be) broken servers on the Internet not only prolongs the life of an insecure out-of-protocol work-around, but also creates waste. That is, it turns TLS protocol implementations into a code dump filled with hacks and work-arounds, just because of a few broken implementations. As Florian Weimer puts it, all applications pay a tax of extra code, potentially introducing new bugs, and, even more scary, potentially introducing more compatibility issues, just because some servers on the Internet have chosen not to follow the protocol.

Are there, however, any counter-measures that one can use to avoid the attack without introducing an additional fallback signalling mechanism? As previously mentioned, if you are using the SSL protocol the recommended way, no workaround is needed; you are safe. If for any reason you want to use the insecure non-standard protocol negotiation, make sure that no insecure protocols like SSL 3.0 are in the negotiated set, or, if disabling SSL 3.0 isn't an option, ensure that it is only allowed when negotiated as a fallback (e.g., offer TLS 1.0 + SSL 3.0, and only then accept SSL 3.0).
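For applications built on GnuTLS, removing SSL 3.0 from the offered set amounts to a one-line priority string change; a minimal sketch, assuming an already allocated session, is shown below ("NORMAL" stands for the library's default set).

    /* remove SSL 3.0 from the set of protocols offered by this session */
    gnutls_priority_set_direct(session, "NORMAL:-VERS-SSL3.0", NULL);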

In any case, that attack has provided the incentive to remove SSL 3.0 from public servers on the Internet. Given that, and its known vulnerabilities, it will no longer be included by default in the upcoming GnuTLS 3.4.0.

[Last update 2014-10-20]

PS. There are some recommendations to work around the issue by using RC4 instead of a block cipher in SSL 3.0. That would defeat the current attack, but it closes one door by opening another; RC4 is a broken cipher and there are known attacks which recover plaintext from it.

Thursday, April 17, 2014

software has bugs... now what?

The recent bugs uncovered in TLS/SSL implementations were received in the blogosphere with a quest for the perfectly secure implementation, one that has no bugs. That is the modern quest for perpetual motion. Nevertheless, very few were bothered by the fact that the application's only line of security defence was a handful of TLS/SSL implementations. We design software in a way that makes openssl or gnutls the Maginot line of security, and when they fail, they fail catastrophically.

So the question that I find more interesting is: can we live with libraries that have bugs? Can we design resilient software that will operate despite serious bugs, and will provide us with the necessary time for a fix? In other words, could an application design have mitigated or neutralized the catastrophic bugs we saw? Let's look at each case separately.


Mitigating attacks that expose memory (heartbleed)

The heartbleed attack allows an attacker to obtain a random portion of memory (there is a nice illustration in xkcd). In that attack all the data held within a server process are at risk, including user data and cryptographic keys. Could an attack of this scale have been avoided?

One approach is to avoid putting all of our eggs in one basket; that's a long-time human practice, which is also used in software design. OpenSSH's privilege separation and the isolation of private keys using smart cards or software security modules are two prominent examples. The defence in that design is that the unprivileged process memory contains only data related to the current user's session, and no privileged information such as passwords or the server's private key. Could we have a similar design for an SSL server? We already have a similar design for an SSL VPN server, in which revealing a worker process's memory couldn't possibly reveal the server's private key. These designs come at a cost though; that cost is performance, as they need to rely on slow Inter-Process Communication (IPC) for basic functionality. Nevertheless, it is interesting to see whether we can re-use the good elements of these designs in existing servers.



A few years ago, in a joint effort of the Computer Security and Industrial Cryptography research group of KU Leuven and Red Hat, we produced a software security module (softhsm) for the Linux kernel, whose purpose was to prevent a server memory leak from revealing the server's private keys to an adversary. Unfortunately we failed to convince the kernel developers of its usefulness. That wasn't the first attempt at such a module. A user-space security module already existed, called LSM-PKCS11. Similarly to the one we proposed, it would provide access to the private key, but the operations would be performed in an isolated process (instead of the kernel). If such a module were in place in popular TLS/SSL servers, there would be no need to regenerate the servers' certificates after a heartbleed-type of attack. So what can we do to use such a mechanism in existing software?

The previous projects have been dead for quite some time, but there are newer modules, like OpenDNSSEC's SoftHSM, which are active. Unfortunately SoftHSM is not a module that enforces any type of isolation. Thus a wrapper PKCS #11 module over SoftHSM (or any other software HSM) that enforces process isolation between the keys and the application using them would be a good starting point. GnuTLS and NSS provide support for using PKCS #11 private keys, and adding support for this module in apache's mod_gnutls or mod_nss would be trivial. OpenSSL would still need some support for PKCS #11 modules to use it.
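To give an idea of what using such a module looks like from the application side, GnuTLS accepts PKCS #11 URLs in place of file names; the sketch below uses made-up token and object names, and the actual URLs depend on the module's configuration.

    #include <gnutls/gnutls.h>

    gnutls_certificate_credentials_t cred;

    gnutls_certificate_allocate_credentials(&cred);
    /* the private key never leaves the module; only a URL is given.
     * Both URLs below are hypothetical. */
    gnutls_certificate_set_x509_key_file(cred,
            "pkcs11:token=my-softhsm;object=server-cert",
            "pkcs11:token=my-softhsm;object=server-key",
            GNUTLS_X509_FMT_PEM);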

Note, however, that such an approach would take care of the leakage of the server's private key, but would not prevent user data (e.g., user passwords) from being leaked. That could of course be handled by a similar isolation approach in the web server, or even by the same module (though not over the PKCS #11 API).

So if the solution is that simple, why isn't it already deployed? Well, there is always a catch; and that catch, as I mentioned before, is performance. A simple PKCS #11 module that enforces process isolation would introduce overhead (due to IPC), and is most certainly going to become a bottleneck. That could be unacceptable for many high-load sites, but on the other hand it could be a good driver to optimize the IPC code paths.


Mitigating an authentication issue

The question here is what could be done about the bugs found in GnuTLS and Apple's SSL implementation, which allowed certain certificates to always pass authentication. That is related to a PKI failure, and in fact the same defences required to mitigate a PKI failure can be used. A PKI failure is typically a CA compromise (e.g., the Diginotar issue).

One approach is again along the same lines as above: avoid reliance on a single authentication method. That is, use two-factor authentication. Instead of relying only on PKI, combine password (or shared-key) authentication with PKI over TLS. Unfortunately, that's not as simple as a password key exchange over TLS, since that is vulnerable to eavesdropping once the PKI is compromised. To achieve two-factor authentication with TLS, one can simply negotiate a session using TLS-PSK (or SRP), and renegotiate on top of it using certificates and PKI. That way both factors are accounted for, and the compromise of either factor doesn't affect the other.

Of course, the above is a quite over-engineered authentication scenario, requires significant changes to today's applications, and also imposes a significant penalty. That is a penalty in performance and network communication, as the double authentication now requires twice the round-trips. However, a simpler approach is to rely on trust on first use, or simply SSH-style authentication on top of PKI. That is, use PKI to verify new keys, but follow the SSH approach for previously known keys. That way, if the PKI verification is compromised at some point, it would affect only new sessions to unknown hosts. Moreover, for an attack to remain undetected the adversary is forced to operate a man-in-the-middle attack indefinitely, or, if we have been under attack, a discrepancy in the server key will be detected on the first connection to the real server.
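As an illustration, GnuTLS already ships helpers for such a trust-on-first-use scheme; a rough sketch, using a hypothetical host name and run after the usual PKI verification of the handshake, could look like this.

    unsigned int ncerts;
    const gnutls_datum_t *certs = gnutls_certificate_get_peers(session, &ncerts);

    /* compare the peer's key with the one recorded on earlier connections */
    int ret = gnutls_verify_stored_pubkey(NULL, NULL, "www.example.com", "https",
                                          GNUTLS_CRT_X509, &certs[0], 0);
    if (ret == GNUTLS_E_NO_CERTIFICATE_FOUND) {
            /* first contact: record the key for future connections */
            gnutls_store_pubkey(NULL, NULL, "www.example.com", "https",
                                GNUTLS_CRT_X509, &certs[0], 0, 0);
    } else if (ret < 0) {
            /* the key changed since last time: treat it as an attack */
    }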


In conclusion, it seems that we can have software resilient to bugs, no matter how serious, but that software will come at a performance penalty and will require more conservative designs. While there are applications that mainly target performance and cannot always benefit from the above ideas, there are several applications that can sacrifice a few extra milliseconds of processing for a design more resilient to attacks.

Sunday, November 17, 2013

Inside an SSL VPN protocol

Some time ago, when trying to provide a way to interconnect the routers of the company I was working for, I attempted to evaluate the various secure VPN solutions available as free software. As I was already familiar with cryptography and secure communications protocols, I initially tried to review their design. To my surprise, the most prominent secure VPN available at the time had its source code as its only documentation. My reverse engineering of the protocol showed that while SSL was claimed, it was only used as the key exchange method. The actual protocol transferring packets was a custom one. This didn't align with my taste, but it was finally included in the routers as there were not many alternatives at the time.

Years after that, I was contacted by David Woodhouse, proposing changes to GnuTLS in order to support an early draft version of Datagram TLS (DTLS). Since we already had support for the final version of DTLS (i.e., 1.0), I couldn't understand the request. As it turned out, David was working on openconnect, a client for the CISCO AnyConnect SSL VPN protocol.

That intrigued me, as it was the first SSL VPN solution I had heard of that used Datagram TLS to transfer data. My interest increased as I learned more about openconnect, so much that I even ended up writing the server counterpart of openconnect. Because the details of the protocol are still confined to David's head and mine, I'll attempt in this post to describe them at a higher level than source code.

Key exchange & Authentication 

The protocol is very simple in nature and is HTTP-based. Initially the client connects to the server over TLS (note that TLS runs only over TCP, something that I take for granted in the rest of this text). On that TLS session the server is authenticated using its certificate, and the client may optionally be authenticated using a certificate as well. After the TLS key exchange is complete, the client obtains an authentication page by issuing an HTTP "POST /" request containing the following.

<config-auth client="vpn" type="init">
    <version who="vpn">v5.01</version>
    <device-id>linux-64</device-id>
    <group-access>https://example.com</group-access>
</config-auth>
 
If the client did not present a certificate, or if additional information is required (e.g., a one-time password), the following takes place. The server replies with a request for the client's username that looks like:

<auth id="main">
    <message>Please enter your username</message>
    <form action="/auth" method="post">
        <input label="Username:" name="username" type="text" />
    </form>
</auth>


That effectively contains the message to be printed to the user, as well as the URL (/auth) to which the string obtained from the user should be sent. The client subsequently replies with a POST containing its username.

<config-auth client="vpn" type="auth-reply">
    <version who="vpn">v5.01</version>
    <device-id>linux-64</device-id>
    <auth><username>test</username></auth>
</config-auth>

 
Once the username is received by the server, a similar conversation continues for the number of passwords required of the user. The message format remains essentially the same, so we skip this part.

VPN tunnel establishment 

Once authenticated, the client issues an HTTP CONNECT request. That effectively terminates the HTTP session and initiates a VPN tunnel over the existing TLS session. In short, the client issues:

CONNECT /CSCOSSLC/tunnel HTTP/1.1
User-Agent: Open AnyConnect VPN Agent v5.01
X-CSTP-Version: 1
X-CSTP-MTU: 1280
X-CSTP-Address-Type: IPv6,IPv4
X-DTLS-Master-Secret: DAA8F66082E7661AE593 [truncated]
X-DTLS-CipherSuite: AES256-SHA:AES128-SHA:DES-CBC3-SHA


and the server replies with something that looks like the following.

HTTP/1.1 200 CONNECTED
X-CSTP-Version: 1
X-CSTP-DPD: 440
X-CSTP-Address: 192.168.1.191
X-CSTP-Netmask: 255.255.255.0
X-CSTP-DNS: 192.168.1.190
X-CSTP-Split-Include: 192.168.1.0/255.255.255.0
X-CSTP-Keepalive: 32400
X-CSTP-Rekey-Time: 115200
X-CSTP-Rekey-Method: new-tunnel
X-DTLS-Session-ID: 767a9ad8 [truncated]
X-DTLS-DPD: 440
X-DTLS-Port: 443
X-DTLS-Rekey-Time: 115200
X-DTLS-Keepalive: 32400
X-DTLS-CipherSuite: AES128-SHA
X-DTLS-MTU: 1214
X-CSTP-MTU: 1214

This completes the HTTP authentication phase of the protocol. At this point a VPN tunnel is established over TLS; the client takes the IP address present in the "X-CSTP-Address" header, and adds the "X-CSTP-Split-Include" routes to its routing table. The IP packets read from the local TUN device are sent via the tunnel to the peer with an 8-byte prefix that allows distinguishing IP data from various VPN packets such as keep-alive or dead-peer-detection.

It is, however, well known that TCP over TCP is far from optimal, and for this reason the server offers the client the option of an additional tunnel over UDP and DTLS.


VPN tunnel over UDP

To initiate the Datagram TLS over UDP session, the client sends the "X-DTLS-Master-Secret" and "X-DTLS-CipherSuite" headers in its CONNECT request (see above). The former contains the key to be used as the pre-master key, in TLS terminology, and the latter contains a list of ciphersuites as read by OpenSSL. The server replies to these by accepting a ciphersuite and presenting it in its own "X-DTLS-CipherSuite" header, and adding headers such as "X-DTLS-Port" and "X-DTLS-Session-ID", among others less relevant for this description.

On receipt of that information the client initiates a Datagram TLS session (using a draft version of DTLS 1.0) on the port indicated by the server. A good question at this point is how the server associates the new DTLS session request with this particular client. The hint here is that the client copies the value of the "X-DTLS-Session-ID" header into the Session ID field of its DTLS Client Hello. That session is in effect handled as a session resumption that uses the "X-DTLS-Master-Secret" as the pre-master secret, the previous session ID, and the ciphersuite negotiated in the "X-DTLS-CipherSuite" header (presumably that hackish approach exists because this solution pre-dates TLS with pre-shared keys).
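In GnuTLS terms, an openconnect-style client could prime such a resumed DTLS session roughly as in the sketch below; here GNUTLS_DTLS0_9 denotes the pre-draft DTLS version, 'master' and 'sess_id' are assumed gnutls_datum_t values filled from the X-DTLS-Master-Secret and X-DTLS-Session-ID headers, the AES128-SHA parameters are assumed to be the negotiated ones, and error handling is omitted.

    /* prime the session so the handshake proceeds as a resumption */
    gnutls_session_set_premaster(session, GNUTLS_CLIENT, GNUTLS_DTLS0_9,
                                 GNUTLS_KX_RSA, GNUTLS_CIPHER_AES_128_CBC,
                                 GNUTLS_MAC_SHA1, GNUTLS_COMP_NULL,
                                 &master, &sess_id);
    gnutls_handshake(session);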

That completes the DTLS negotiation over UDP, and the establishment of the second VPN tunnel. From this point on this tunnel is used as the primary one, and if for some reason it goes down, the traffic is redirected to the backup tunnel over TLS. A difference between the DTLS tunnel and the TLS one is that the VPN packet header is a single byte instead of 8. The smaller header is an important saving, since DTLS packets are restricted by the link MTU, which is further reduced by the headers of IP, UDP and DTLS.

Rekeying 

DTLS allows the transfer of up to 2^48 packets (or 220 petabytes, assuming an average of 800 bytes per packet) in a single session. To allow for even larger transfers, and to refresh the keys used, the protocol enforces re-keying by time, as indicated by "X-DTLS-Rekey-Time". At the moment openconnect implements that by tearing down both the TLS and DTLS tunnels and reconnecting to the server.

Reconnections

Reconnections in this protocol, e.g., because of the client switching networks and changing IP, are handled at the HTTP level. That is, the client accepts a cookie from the server and uses it on any subsequent connection.

Data transfer


In the VPN tunnel establishment section we showed the negotiation of the ciphersuite used for the DTLS tunnel and the actual data transfer. Due to the protocol's restriction to pre-DTLS 1.0, the available ciphersuite options are the following three:
  • AES256-CBC-SHA
  • AES128-CBC-SHA
  • 3DES-CBC-SHA
3DES-CBC is a performance nightmare, so for any practical purpose AES128-CBC and AES256-CBC are being used. However, all of these ciphersuites are vulnerable to padding oracle attacks, and have considerable header overhead (they require a full-block random IV per packet plus padding, and have a quite long MAC of 20 bytes). These inefficiencies were the main incentive for the salsa20 proposal in TLS.
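As a rough back-of-the-envelope illustration for AES-128-CBC with HMAC-SHA1, assuming the DTLS 1.0 record format with an explicit per-record IV (the exact figure varies with the padding of each packet):

    DTLS record header:            13 bytes
    explicit CBC IV (AES block):   16 bytes
    HMAC-SHA1 tag:                 20 bytes
    CBC padding:                 1-16 bytes
    total per packet:        about 50-65 bytes, plus the 1-byte VPN framing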

Overview

Overall, the protocol looks hackish to someone reading it today at a high level. It is, however, the closest to a standards-based VPN protocol that is around. It has some weaknesses (it is vulnerable to padding oracle attacks) and limitations as well (for example, it cannot easily use any other version than the pre-draft DTLS), and they can easily be fixed, but let's leave that for a future post.

Thursday, May 16, 2013

Salsa20 and UMAC in TLS

Lately, while I was implementing and deploying an SSL VPN server, I realized that even for peer-to-peer connections the resources taken for encryption on the two ARM systems I used were quite excessive. These ARM processors do not have instructions to speed up AES and SHA1, and were spending most of their resources encrypting and authenticating the exchanged packets.

What can be done in such a case? The SSL VPN server utilized DTLS, which runs over UDP and restricts the packet size to the path MTU (typically 1400 bytes if we want to avoid fragmentation and reassembly), and thus wastes quite some resources on packetization of long data. Since the packet size cannot be modified, we could instead improve the encryption and authentication speed. Unfortunately, using a more lightweight cipher available in TLS, such as RC4, is not an option, as it is not available in DTLS (while TLS and DTLS mostly share the same set of ciphersuites, some ciphers like RC4 cannot be used in DTLS due to its constraints). Overall, we cannot do much with the currently defined algorithms in DTLS; we need to move outside the TLS protocol box.

Some time ago there was an EU-sponsored competition on stream ciphers (which are typically characterized by their performance), and Salsa20, one of the winners, was recently added to nettle (the library GnuTLS uses) by Simon Josefsson, who conceived the idea of such a fast stream cipher being added to TLS. While modifying GnuTLS to take advantage of Salsa20, I also considered moving away from HMAC (the slow message authentication mechanism TLS uses) and using the UMAC construction, which provides a security proof and impressive performance. My initial attempt to port the UMAC reference code (which was not ideal code) motivated the author of nettle, Niels Moeller, to reimplement UMAC in a cleaner way. As a result, Salsa20 with UMAC is now included in nettle and is used by GnuTLS 3.2.0. The results are quite impressive.

Salsa20 with UMAC96 ciphersuites were 2-3 times faster than any AES variant used in TLS, and outperformed even RC4-SHA1, the fastest ciphersuite defined in the TLS protocol. The results as seen on an Intel i3 are shown below (they are reproducible using gnutls-cli --benchmark-tls-ciphers). Note that SHA1 in the ciphersuite names means HMAC-SHA1, and Salsa20/12 is the variant of Salsa20 that was among the eSTREAM competition winners.

Performance on 1400-byte packets
Ciphersuite          Mbyte/sec
SALSA20-UMAC96          107.82
SALSA20-SHA1             68.97
SALSA20/12-UMAC96       130.13
SALSA20/12-SHA1          77.01
AES-128-CBC-SHA1         44.70
AES-128-GCM              44.33
RC4-SHA1                 61.14

The results of the openconnect VPN performance on two PCs, connected over 100-Mbit ethernet, are as follows.

Performance of a VPN transfer over ethernet
Ciphersuite              Mbits/sec    CPU load (top)
None (plain transfer)           94                8%
SALSA20/12-UMAC96               89               57%
AES-128-CBC-SHA1                86               76%

While the performance difference between SALSA20 and AES-128-CBC isn't impressive (AES wasn't too slow to begin with), the difference in the load of the server CPU is significant.

Would such ciphersuites also be useful to a wider set of applications than VPNs? I believe the answer is positive, and not only for performance reasons. This year new attacks were devised on the AES-128-CBC-SHA1 and RC4-SHA1 ciphersuites in TLS that cannot be easily worked around. For AES-128-CBC-SHA1 there are some hacks that reduce the impact of the known attacks, but they are hacks, not a solution. As such, TLS will benefit from a new set of ciphersuites that replace the old ones with known issues. Moreover, even if we consider RC4 a viable solution today (which it is not), the DTLS protocol cannot take advantage of it, and datagram applications such as VPNs need to rely on the much slower AES-128-GCM.

So we see several advantages in this new list of ciphersuites, and for that reason, together with Simon Josefsson and Joachim Strombergson, we plan to propose to the IETF TLS Working Group the adoption of a set of Salsa20-based ciphersuites. We were asked by the WG chairs to present our work at the IETF 87 meeting in Berlin, so I plan to travel to Berlin to present our current Internet-Draft.

If you support defining these ciphersuites in TLS, we need your help. If you are an IETF participant, please join the TLS Working Group meeting and indicate your support. Also, if you have any feedback on the approach, or can suggest another area where this could be useful, please drop me a mail or leave a comment below mentioning your name and any affiliation.

Moreover, as it stands, a lightning trip to Berlin on these dates would cost at minimum 800 euros, including the IETF single-day registration. As this is not part of our day job, any contribution that would help partially cover those expenses is welcome.

Tuesday, March 26, 2013

The perils of LGPLv3

LGPLv3 is the latest version of the GNU Lesser General Public License. It follows the successful LGPLv2.1 license, and was released by the Free Software Foundation as a counterpart to its GNU General Public License version 3.

The goal of the GNU Lesser General Public Licenses is to provide software that can be used by both proprietary and free software. This goal has been handled successfully so far by LGPLv2.1, and there is a multitude of libraries using that license. Now we have LGPLv3 as the latest version, and the question is how successful LGPLv3 is at this goal.

In my opinion, not very. If we assume that its primary goal is to be used by free software, then it blatantly fails that. LGPLv3 has serious issues when used with free software, and especially with the GNU GPL version 2. Projects under the GPLv2 license violate its terms if they use an LGPLv3 library (because LGPLv3 adds additional restrictions).

What does the FSF suggest in that case? It suggests upgrading GPLv2 projects to GPLv3. That's a viable solution, if you actually can and want to upgrade to GPLv3. At the GnuTLS project, after we switched to LGPLv3, we realized that in practice we had forced all of our GPLv2 (or later) users to distribute their binaries under GPLv3. Moreover, we also realized that several GPLv2-only projects (i.e., projects that did not have the GPLv2-or-later clause in their license) could no longer use the library at all.

The same incompatibility issue exists with LGPLv2.1 projects that want to use an LGPLv3 library. They must be upgraded to LGPLv3.

In discussions within the FSF, Stallman had suggested using the dual license LGPLv3-or-GPLv2 for libraries to overcome these issues. That, although it does not solve the LGPLv2.1 issue, is not a bad suggestion, but it has a major drawback. Projects under the dual LGPLv3-or-GPLv2 license cannot re-use code from other LGPLv3 projects nor from GPLv2 projects, creating a rather awkward situation for the project.

So it seems we have a "lesser" kind of license for use by libraries that dictates to free software project authors the license under which they should release their projects. I find that unacceptable for such a license. It seems to me that with this license, the FSF is just asking you not to use LGPLv3 for your libraries.

So ok, LGPLv3 has issues with free software licenses... but how does it work with proprietary projects? Surprisingly it works better than with free software licenses; it doesn't require them to change their license.

Nevertheless, there is a catch. My understanding of LGPLv3 (it is one of the hardest licenses to read, as it requires you to first read the GPLv3, remove some of its clauses, and then add some additional ones) is that it contains the anti-tivoization clause of GPLv3. That is, in a home appliance, if the creator of the appliance has an upgrade mechanism, he has to provide a way to upgrade the library. That doesn't really make sense to me, neither as a consumer nor as someone who potentially creates software for appliances.

As a consumer, why would I consider the ability to upgrade only the (say) libz library on my router a major advantage? You may say that I have additional privileges (more freedom) on my router. That may be, but my concern is that this option will hardly ever materialize. Will an appliance creator choose a "lesser" GPL library when this clause is present? Will he spend the additional time to create a special upgrade mechanism for a single library to satisfy the LGPLv3 license, or will he choose a non-LGPLv3 alternative? (Remember that the goal of the LGPL licenses, according to the FSF, is to use them in software where proprietary or free-of-charge alternatives exist.)


So overall, it seems to me that LGPLv3 has so many practical issues that its advantages (e.g., the patent clauses) end up seeming irrelevant, and I don't plan to use it as the license of my libraries. I'm back to the good old LGPLv2.1.