Monday, April 3, 2017

The mess with internationalized domain names

While internationalized domain names (DNS names) are not common in the English speaking world, they exist and their use was standardized by IETF's IDNA standards. I first found out the existence of that possibility while reading the IETF's best practices for domain name verification. As english is not my mother tongue I was particularly interested on the topic, and wanted to make sure that GnuTLS would handle such domains correctly both for storing such domains, and verifying them. That proved not to be an easy task. The following text summarizes my brief understanding of the issues in the field (disclaimer: I am far from an expert in software internationalization topics).

How does IDNA work?

To make a long story short, the IDNA protocols are based on a simple principle. They translate domain names typed with unicode characters (UTF-8 or otherwise), to a US-ASCII (English text) representation which becomes the actual domain name. For example the greek name "ένα.gr" would translate to "xn--ixai9a.gr". On Linux systems one can find Simon Josefsson's idn and idn2 tools (more on that below), which can be used to translate from an internationalized string to IDNA format. For example:

    $ echo "ενα.gr"|idn
    xn--mxahy.gr

 

What are the issues with IDNA?

Although there are simple to use libraries (see Libidn) to access IDNA functionality, there is a catch. In 2010, IETF updated the IDNA standards with a new set of standards called IDNA2008, which were "mostly compatible" with the original standard (called IDNA2003). Mostly compatible meant that the majority of strings mapped to the same US-ASCII equivalent, though some didn't. They mapped to a different string. That affected many languages, including the Greek language mappins, and the following table displays the IDNA2003 and IDNA2008 mappings of few "problematic" Greek domain names.

non-English string IDNA2003 IDNA2008
νίκος.gr xn--kxawhku.gr xn--kxawhkp.gr
νίκοσ.gr xn--kxawhku.gr xn--kxawhku.gr
NΊΚΟΣ.gr xn--kxawhku.gr (undefined)

In the above table, we can see the differences in mappings for three strings. All of the above strings can be considered to be equal in the greek language, as the third is the capitalized version of the first, and the second is the "dumb" lower case equivalent of the last.

The problematic character is 'σ' which in Modern Greek is switched with 'ς' when it is present at the end of word. As both characters are considered to be identical in the language, they are both capitalized to the same character 'Σ' (Sigma).

There are two changes in IDNA2008 standard which affect the examples above. The first, is the treatment of the 'ς' and 'σ' characters as different, causing the discrepancy between the mappings in the first and second examples. The second is that IDNA2008 is defined only for a specific set of characters, and there is no pre-processing phase, which causes the undefined state of the third string, that contains capital letters. These changes, create a discrepancy between expectations formed by observing the behavior of domains consisting of US-ASCII strings and the actual reality with Internationalized scripts. Similar cases exist in other languages (e.g., with the treatment of the 'ß' character in German).

Even though some work-arounds of the protocol may seem obvious or intuitive to implement, such as lower-casing characters prior to converting to IDNA format, lower-casing doesn't make sense in all languages. This is the reason that the capitalized version (NΊΚΟΣ.gr) of the first string on the table, is undefined in IDNA2008.

You can verify the mappings I presented above with the idn2 application, which is IDNA2008-compliant. For example:

    echo "NΊΚΟΣ.gr"|idn2
    idn2: lookup: string contains a disallowed character

 

Is there any solution?

To address these issues, a different standards body --the Unicode consortium-- addressed the issue with the Unicode Technical Standard #46 (UTS#46 or TR#46). That standard was published in 2016 to clarify few aspects of IDNA2008 and propose a compatible with IDNA2003 behavior.

UTS#46 proposes two modes of IDNA2008, the transitional, which results to problematic characters being mapped to their IDNA2003 equivalents and the non-transitional mode, which is identical to the original IDNA2008 standard. In addition it requires the internationalized input to be pre-processed with the CaseFold algorithm which allows handling upper-case domain names such as "ΝΊΚΟΣ.gr" under IDNA2008.

 

Switching to IDNA2008

Unfortunately even with UTS#46, we are left with two IDNA2008 variants. The transitional which is IDNA2003 compatible and the non-transitional which is IDNA2008 incompatible. Some NICs and registrars have already switched to IDNA2008 non-transitional, but not all software has followed up.

A problem is, that UTS#46 does not define a period for the use of transitional encodings, something that makes their intended use questionable. Nevertheless, as the end-goal is to switch to the non-transitional IDNA2008, it still makes it practical to switch to it, by clarifying several undefined parts of the original protocol (e.g., adds a pre-processing phase). As a result, few browsers (e.g., Firefox) have already switched to it. It is also possible for software based on libidn, which only supports IDNA2003, to switch. The libidn2 2.0.0 release includes a libidn compatible APIs making it possible to switch to IDNA2008 (transitional or not).

 

Should I do the switch?

There are few important aspects of the IDNA2008 (non-transitional) domain names, which have to be taken into account prior to switching. As we saw above, the semantics of entering a domain in upper case, and expecting it to be translated to the proper web-site address wouldn't work for internationalized domain names. If one enters the domain "ΝΊΚΟΣ.gr", it would translate to the domain xn--kxawhku.gr (i.e., "νίκοσ.gr"), which is a misspelled version of the correct in Greek language "νίκος.gr".

Moreover, as few software has switched to IDNA2008 non-transitional processing of domain names, there is always the discrepancy between the IDNA2003 mapping and the IDNA2008 mapping. That is, a domain owner would have to be prepared to register both the IDNA2003 version of the name and the IDNA2008 version of it, to ensure all users are properly redirected to his intended site. This is apparent on the following real domains.
  • http://fass.de
  • http://faß.de
If you are a German speaker you most likely consider them equivalent, as the 'ß' character is often expanded to 'ss'. That is how IDNA2003 treated that character, however, that's not how IDNA2008 treats it. If you use the Chrome browser which at the moment uses IDNA2003 (or more precisely IDNA2008 transitional), both of these URIs you will be re-directed to the same web-site, fass.de. However, if you use Firefox, which uses IDNA2008, you will be re-directed to two different web sites. The first being the fass.de and the second the xn--fa-hia.de.

That discrepancy was treated as a security issue by the curl and wget projects and was assigned CVE-2016-8625. Both projects switched to non-transitional IDNA2008.

 

What about certificates, can they address the issue above?

Unfortunately the above situation, cannot be fixed with X.509 certificates and in fact such a situation undermines the trust in them. The operation of X.509 certificates for web site authentication, is based on the uniqueness of domain names. In english language we can be sure that a domain name, whether entered in upper or lower case will be mapped to unique web-site. With internationalized names that's no longer the case.

What is unique in internationalized names is the final output domain, e.g., xn--kxawhku.gr, which for authentication purposes is meaningless as it is, so we have to rely on software to do the reverse mapping for us, on the right place. If the software we use uses different mapping rules than the rules applied by the registrar of the domain, users are left helpless as in the case above.

 

What to do now?

Although at this point, we know that IDNA2008 has quite some peculiarities which will be problematic in the future, we have no better option available. IDNA2003 cannot support new unicode standards and is already obsolete, so biting the bullet, and moving to non-transitional IDNA2008 seems like the right way to go. It is better to have a single and a little problematic standard, rather than have two active standards for domain name mapping.

Tuesday, March 21, 2017

Improving by simplifying the GnuTLS PRNG

One of the most unwanted baggages for crypto implementations written prior to this decade is the (pseudo-)random generator, or simply PRNG. Speaking for GnuTLS, the random generator was written at a time where devices like /dev/urandom did not come by default on widely used operating systems, and even if they did, they were not universally available, e.g., devices would not be present, the Entropy Gathering Daemon (EGD) was something that was actually used in practice, and was common for software libraries like libgcrypt to include code to gather entropy on a system by running arbitrary command line tools.

That resulted in an internal random generator which had to rely on whatever was provided by the operating system and the administrator, and that, in several cases was insufficient to seed a cryptographic PRNG. As such, an advanced PRNG was selected, based on Yarrow, which kept a global per-process state, and was aggressively gathering information, including high precision timestamps and process/thread statistics, to enhance a potentially untrusted pool formed from the system random generator or EGD. That, also meant locks for multi-threaded processes to access the global state, and thus a performance bottleneck, since a call to the PRNG is required even for the simplest of crypto operations.

Today, however, things have changed in operating systems. While Linux used to be a pioneer with /dev/urandom, now all operating systems provide a reliable PRNG, even though there are still no standardized APIs.
  • Linux provides /dev/urandom, getrandom(), getentropy()
  • Windows provides CryptGenRandom()
  • *BSD provides /dev/urandom, getentropy()
  • MacOSX provides /dev/urandom, getentropy()
  • Solaris: /dev/urandom, getentropy(), getrandom().
On the list above, I ignore the /dev/random interface which has concerning properties, such as indefinite response time (see my previous post for limitations on the Linux interfaces).

Some of the interfaces above are provided as system calls, some others as libc calls, and others as file system devices, but for the application writer, that shouldn't make significant difference. These devices or system calls, provide access to a system PRNG, which is in short doing what was GnuTLS doing manually previously, mixing various inputs from the system, in a level and way that a userspace library like GnuTLS could never do, as the kernel has direct access to available hardware and interrupts.

Given the above, a question that I've been asking myself lately, is whether there is any reason to continue shipping something advanced such as a Yarrow-based PRNG in GnuTLS? Why not switch to simple PRNG, seeded only by the system device? That would not only provide simplicity in the implementation, but also reduce the performance and memory cost of complex constructions like Yarrow. In turn, switching to something simple with low memory requirements would allow having a separate PRNG per-thread, further eliminating the bottleneck of a global per-process PRNG.

The current PRNG


To provide some context on GnuTLS' PRNG, it is made available through the following function all:
 int gnutls_rnd(gnutls_rnd_level_t level, void *data, size_t len);
That takes as input an indicative level, which can be NONCE for generating nonces, RANDOM for session keys, or KEY for long term keys. The function outputs random data in the provided buffer.

There was (a partial) attempt in GnuTLS 3.3.0 to improve performance, by introducing a Salsa20-based PRNG for generating nonces, while keeping Yarrow for generating keys. This change, although it provided the expected performance improvement for the generation of nonces, it still kept global state, and thus still imposed a bottleneck for multi-threaded processes. At the same time, it offered no improvement on the memory consumption (in fact it was increased slightly by a Salsa20 instance - around 64 bytes).

For the yet-unreleased 3.6.0, we took that enhancement several steps further, ensuing the elimination of the locking bottleneck for multi-threaded processes. It was a result of a relatively large patch set, improving the state of the internal PRNG, and rewriting it, to the following layout.

The new PRNG


The Yarrow and Salsa20 PRNGs were replaced by two independent PRNGs based on the CHACHA stream cipher. One PRNG is intended to be used for the NONCE level (which we'll refer to it as the nonce PRNG) and the other for KEY and RANDOM levels (the key PRNG). That reduces the memory requirements by eliminating the heavyweight Yarrow, and at the same time allows better use of the CPU caches, by employing a cipher that is potentially utilized by the TLS protocol, due to the CHACHA-POLY1305 ciphersuite.

To make the state lock-free, these two generators keep their state per thread by taking advantage of thread local data. That imposes a small memory penalty per-thread --two instances of CHACHA occupy roughly 128-bytes--, albeit, it eliminates the bottleneck of locks to access the random generator in a process.

Seeding the PRNG

The PRNGs used by GnuTLS are created and seeded on the first call to gnutls_rnd(). This behavior is a side-effect of a fix for getrandom() blocking in early boot in Linux, but it fits well with the new PRNG design. Only threads which utilize the PRNG calls will allocate memory for it, and carry out any seeding.

For threads that utilize the generator, the initial seeding involves calling the system PRNG, i.e., getrandom() in Linux kernel, to initialize the CHACHA instances. The PRNG is later re-seeded; the time of the re-seed depends both on time elapsed and the amount of bytes generated. At the moment of writing, the nonce PRNG will be re-seeded when 16MB of is generated, or 4 hours of operation, whichever is first. The key PRNG will re-seed using the operating system's PRNG, after 2MB of data are generated, or after 2 hours of operation.

As a side note, that re-seed based on time was initially a major concern of mine, as it was crucial for a call to random generator to be efficient, without utilizing system calls, i.e., imposing a switch to kernel mode. However, in glibc calls like time() and gettimeofday() are implemented with vdso something that transforms a system call like time(), to a memory access, hence do not introduce any significant performance penalty.

The data limits imposed to PRNG outputs are not entirely arbitrary. They allow several thousands of TLS sessions, prior to re-seeding, to avoid re-introducing a bottleneck on busy servers, this time being the system calls to operating system's PRNG.

Defense against common PRNG attacks

There are multiple attacks against a PRNG, which typically require a powerful adversary with access to the process state (i.e., memory). There are also attacks on which the adversary controls part of the input/seed to PRNG, but we axiomatically assume a trusted Operating System, trusted not only in the sense of not being backdoored, but also in the sense of doing its PRNG job well.

I'll not go through all the details of attacks (see here for a more detailed description), but the most prominent of these attacks and applicable to our PRNG are state-compromise attacks. That is, the attacker obtains somehow the state of the PRNG --think of a heartbleed-type of attack which results to the PRNG state being exposed--, and uses that exposed state to figure out past, and predict future outputs.
Given the amount of damage a heartbleed-type of attack can do, protecting against the PRNG state compromise attacks remind this pertinent XKCD strip. Nevertheless, there is merit to protecting against these attacks, as it is no longer unimaginable to have scenarios where the memory of the PRNG is exposed.

 

Preventing backtracking

This attack assumes that the attacker obtained access to the PRNG state at a given time, and would need to recover a number of bytes generated in the past. In this construct, both the  nonce and key PRNGs re-seed based on time, and data, after which recovery is not possible. As such an attacker is constrained to access data within the time or data window of the applicable generator.

Furthermore, generation of long-term keys (that is, the generator under the KEY level), ensures that such backtracking is impossible. That is, in addition to any re-seed previously described, the key generator will re-key itself with a fresh key generated from its own stream after each operation.

Preventing permanent compromise

That, is in a way the opposite of the previous attack. The attacker, still obtains access to the PRNG state at a given time, and would like to recover to recover all data generated in the future. In a design like this, we would like to limit the number of future bytes that can be recovered.

Again, the time and data windows of the PRNGs restrict the adversary's access within them. An attacker will have to obtain constant or periodic access to the PRNG state, to be able to efficiently attack the system.


Final remarks

The design of the new GnuTLS PRNG is quite similar to the arc4random implementation on the OpenBSD system. The latter despite its name, is also based on the CHACHA cipher. Few details differ, however. The GnuTLS PRNG enforces a refresh of the PRNG based on elapsed time, in addition to output data, does re-key only for when a requests for data at the KEY level, and strives for low memory footprint as it utilizes a separate generator per process thread.

 

Another thing to note, is that the fact that the gnutls_rnd() call allows for an advisory level to be specified, provides the internal implementation quite some flexibility. That is, the given level, although advisory, allows for optimizations to be enabled for levels that are not intended for secrecy. That is, apply different data and time limits on nonce and key generator, and thus increasing performance when possible. The cost of such a compromise for performance, is a larger window of exposure when the PRNG's state is compromised.


The generator described, will be made available in the next major release of GnuTLS, although the details may change.


Sunday, November 13, 2016

Using the Nitrokey HSM with GnuTLS applications

The Nitrokey HSM is an open hardware security module, in the form of a smart card token, which is used to isolate a server's private key from the application. That is, if you have an HTTPS server, such a hardware security module will prevent an attacker which temporarily obtained privileged access on the server (e.g., via an exploit like heartbleed), from copying the server's private key, allowing for impersonating it. See my previous post for a more elaborate discussion on that defense mechanism.

The rest of this post will explain how you can initialize this token and utilize it from GnuTLS applications, and in the process explain more about smart card and HSM usage in applications. For the official (and more advanced) Nitrokey setup instructions and tips you can see this OpenSC page, another interesting guide is here.

HSMs and smart cards

 

Nitrokey HSM is something between a smart card and an HSM. However, there is no real distinction between smart cards and Hardware Security Module from a software perspective. Hardware-wise one expects better (in terms of cost to defeat) tamper-resistance on HSMs, and at the same time sufficient performance for server loads. An HSM module is typically installed on PCI slots, USB, while smart cards are mainly USB or via a card reader.

On the software-side both smart cards and HSMs are accessed the same way, over the PKCS#11 API. That is an API which abstracts keys from operations, i.e., the API doesn't require direct access to the private key data to complete the operation. Most crypto libraries today support this API directly as GnuTLS and NSS do, or via an external module like OpenSSL (i.e., via engine_pkcs11).

Each HSM or smart card, comes with a "driver", i.e., a PKCS#11 module, which one had to specify on legacy applications. On modern systems, which have p11-kit, the available drivers are registered with p11-kit and applications can obtain and utilize them on run-time (see below for more information). For Nitrokey the OpenSC driver is being used, a driver for almost every other smart card that is supported on Linux.

If you are familiar with old applications, you would have noticed that objects were referred to as "slot1_1", which meant the first object on the first slot of the driver, or "1:1", and several other obscure methods depending on the application. The "slots" notion is an internal to PKCS#11, which is inherently unstable (re-inserting may change the slot number assignment), thus these methods to refer to objects cannot accommodate easily for multiple cards, or for referring to an object within a specific card if multiple are present, nor to easily utilize cards which are under the different drivers. More recent applications support PKCS#11 URIs, a method to identify tokens, and objects within the token which is unique system-wide; the URI looks like:

pkcs11:token=SmartCard-HSM;object=my-ecc-key;type=private
For GnuTLS applications, only PKCS#11 URIs can be used to refer to objects.


Driver setup and token discovery

 

On a typical Linux system which runs the pcscd server, and has opensc and p11-kit properly installed the following command should list the nitrokey token once inserted.
    $ p11tool --list-tokens

One of the entries printed should be something like the following.
Token 5:
    URL: pkcs11:model=PKCS%2315%20emulated;manufacturer=www.CardContact.de;serial=DENK0100424;token=SmartCard-HSM20%28UserPIN%29
    Type: Hardware token
    Manufacturer: www.CardContact.de
    Model: PKCS#15 emulated
    Serial: DENK0100424
    Module: /usr/lib64/pkcs11/pkcs11/opensc-pkcs11.so

The above information contains the identifying PKCS#11 URI of the token as well as information about the manufacturer and the driver library used. The PKCS#11 URI is a standardized unique identifier of tokens and objects stored within a token. If you do not see that information, verify that you have all of pcsc-lite, pcsc-lite-ccid, opensc, gnutls and p11-kit installed. If that's the case, you will need to register the opensc token to make it known to p11-kit manually (modern distributions take care of this step). This can be done with the following commands as administrator.
    # mkdir -p /etc/pkcs11/modules
    # echo "module: /usr/lib64/pkcs11/opensc-pkcs11.so" >/etc/pkcs11/modules/opensc.conf

It is implied that the your system's libdir for PKCS#11 drivers  should be used instead of the "/usr/lib64/pkcs11" path used above. Alternatively, one could append the --provider parameter on the p11tool command, to explicitly specify the driver, as in the following example. For the rest of this text we assume a properly configured p11-kit and omit the --provider parameter.
    $ p11tool --provider /usr/lib64/pkcs11/opensc-pkcs11.so --list-tokens

Token initialization

 

An HSM token prior to usage needs to be initialized, and be provided two PINs. One PIN is for operations requiring administrative (security officer in PKCS#11 jargon) access, and the second (the user PIN ) is for normal token usage. To initialize use the following command, with the PKCS#11 URL listed by the 'p11tool --list-tokens' command; in the following text we will use $URL to refer to that.
    $ p11tool --initialize "$URL"

Alternatively, when the driver supplied supports a single card, the URL can be specified as "pkcs11:" as shown below.
    $ p11tool --provider  /usr/lib64/pkcs11/opensc-pkcs11.so --initialize "pkcs11:"

The initialization commands above will ask to setup the security officer's PIN, which for nitrokey HSM is by default "3537363231383830". At the initialization process, the user PIN will also be asked. The user PIN is PIN which must be provided by applications and users, in order to use the card. Note that the command above (prior to GnuTLS 3.5.6) will ask for the administrator's PIN twice, once for initialization and once for setting the user PIN.

Key and certificate generation


It is possible to either copy an existing key on the card, or generate a key in it, a key which cannot be extracted. To generate an elliptic curve (ECDSA) key use the following command.
    $ p11tool --label "my-key" --login --generate-ecc "pkcs11:token=SmartCard-HSM20%28UserPIN%29"

The above command will generate an ECDSA key which will be identified by the name set by the label. That key can be then by fully identified by the PKCS#11 URL "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-key;type=private". If the command was successful, the command above will list two objects, the private key and the public key.
    $ p11tool --login --list-all "pkcs11:token=SmartCard-HSM20%28UserPIN%29"

Note that both objects share the same ID but have different type. As this key cannot be extracted from the token, we need to utilize the following commands to generate a Certificate Signing Request (CSR).

    $ certtool --generate-request --load-privkey "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-key;type=private" --outfile cert.csr
After providing the required information to certtool, it will generate a certificate request on cert.csr file. Alternatively, to generate a self-signed certificate, one can replace the '--generate-request' parameter with the '--generate-self-signed'.

The above generated certificate signining request, will allow to get a real certificate to use for the key stored in the token. That can be generated either with letsencrypt or a local PKI. As the details vary, I'm skipping this step, and I'm assuming a certificate is generated somehow.

After the certificate is made available, one can write it in the token. That step is not strictly required, but in several scenarios it simplifies key/cert management by storing them at the same token. One can store the certificate, using the following command.
    $ p11tool --login --write --load-certificate cert.pem --label my-cert --id "PUBKEY-ID" "pkcs11:token=SmartCard-HSM20%28UserPIN%29"
Note that specifying the PUBKEY-ID is not required, but it is generally recommended for certificate objects to match the ID of the public key object listed previously with the --list-all command. If the IDs do not match some (non-GnuTLS) applications may fail to utilize the key. The certificate stored in the token will have the PKCS#11 URL "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-cert;type=cert".


Testing the generated keys


Now that both the key and the certificate are present in the token, one can utilize their PKCS#11 URL in any GnuTLS application in place of filenames. That is if the application is asking for a certificate file, enter "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-cert;type=cert", and for private key "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-key;type=private".

The following example will run a test HTTPS server using the keys above.

    $ gnutls-serv --port 4443 --http --x509certfile "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-cert;type=cert" --x509keyfile "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-key;type=private;pin-value=1234"
That will setup a server which answers on port 4443 and will utilize the certificate and key on the token to perform TLS authentication. Note that the command above, demonstrates the use of the "pin-value" URI element. That element, specifies the object PIN on command line allowing for non-interactive token access.

Applicability and performance


While the performance of this HSM will most likely not allow you to utilize it in busy servers, it may be a sufficient solution for a private server, VPN, a testing environment or demo. On client side, it can certainly provide a sufficient solution to protect the client assigned private keys. The advantage a smart card provides to OTP, is the fact that it is simpler to provision remotely, with the certificate request method shown above. That can be automated, at least in theory, when a protocol implementation of SCEP is around. In practice, SCEP is well established in the proprietary world, but it is hard to find free software applications taking advantage of it.

Converting your application to use PKCS#11


A typical application written to use GnuTLS as TLS back-end library should be able to use smart cards and HSM tokens out of the box. The only requirement is for the applications to use the high-level file loading functions, which can load files or PKCS#11 URIs when provided. The only new requirement is for the application to obtain the PIN required for accessing the token, that can be done interactively using the PIN callbacks, or via the PKCS#11 URI "pin-value" element. For source examples, I'll refer you to GnuTLS documentation.
Some indicative applications which I'm aware they can use tokens via PKCS#11 URIs transparently, and can be used for testing, are mod_gnutls, lighttpd2, and openconnect.



Tuesday, October 25, 2016

A brief look at the Linux-kernel random generator interfaces

Most modern operating systems provide a cryptographic pseudo-random number generator (CPRNG), as part of their OS kernel, intended to be used by applications involving cryptographic operations. Linux is no exception in that, and in fact it was the first operating system that actually introduced a CPRNG into the kernel. However, there is much mystery around these interfaces. The manual page is quite unclear on its suggestions, while there is a web-site dedicated to debunking myths about these interfaces, which on a first read contradicts the manual page.

In this post, triggered by my recent attempt to understand the situation and update the Linux manual page, I'll make a brief overview of these interfaces. Note that, this post will not get into the insights of a cryptographic pseudo-random generator (CPRNG); for that, consider reading this article. I will go through these interfaces, intentionally staying on the high-level, without considering internal details, and discuss their usefulness for an application or library that requires access to such a CPRNG.

  • /dev/random: a file which if read from, will output data from the kernel CPRNG. Reading from this file blocks once the kernel (using some a little arbitrary metric) believes not enough random events have been accumulated since the last use (I know that this is not entirely accurate, but the description is sufficient for this post).
  • /dev/urandom: a file which if read from, will provide data from the kernel CPRNG. Reading from /dev/urandom will never block.
  • getrandom(): A system call which provides random data from the kernel CPRNG. It will block only when the CPRNG is not yet initialized.

A software engineer who would like to seed a PRNG or generate random encryption keys, and reads the manual page random(4) carefully, he will most likely be tempted to use /dev/random, as it is described as "suitable for uses that need very high quality randomness such as ... key generation". In practice /dev/random cannot be relied on, because it requires large amounts of random events to be accumulated in order to provide few bytes of random data to running processes. Using it for key generation (e.g, for ssh keys during first boot) is most likely going to convert the first boot process to a coin flip; heads and system is up, tails and the system is left hanging waiting for random events. This (old) issue with a mail service process hanging for more than 20 minutes prior to doing any action, illustrates the impact of this device to real-world applications which need to generate fresh keys on startup.

On the other hand, the device /dev/urandom provides access to the same random generator, but will never block, nor apply any restrictions to the amount of new random events that must be read in order to provide any output. That is quite natural given that modern random generators when initially seeded can provide enormous amounts of output prior to being considered broken (in an informational-theory sense). So should we use only /dev/urandom today?

There is a catch. Unfortunately /dev/urandom has a quite serious flaw. If used early on the boot process when the random number generator of the kernel is not fully initialized, it will still output data. How random are the output data is system-specific, and in modern platforms, which provide specialized CPU instructions to provide random data, that is less of an issue. However, the situation where ssh keys are generated prior to the kernel pool being initialized, can be observed in virtual machines which have not been given access to the host's random generator.

Another, though not as significant, issue is the fact that both of these interfaces require a file descriptor to operate. That, on a first view, may not seem like a flaw. In that case consider the following scenarios:
  • The application calls chroot() prior to initializing the crypto library; the chroot environment doesn't contain any of /dev/*random.
  • To avoid the issue above, the crypto library opens /dev/urandom on an library constructor and stores the descriptor for later use. The application closes all open file descriptors on startup.
Both are real-world scenarios observed over the years of developing the GnuTLS library. The latter scenario is of particular concern since, if the application opens few files, the crypto library may never realize that the /dev/urandom file descriptor has been closed and replaced by another file. That may result to reading from an arbitrary file to obtain randomness. Even though one can introduce checks to detect such case, that is a particularly hard issue to spot, and requires inefficient and complex code to address.

That's where the system call getrandom() fits. Its operation is very similar to /dev/urandom, that is, it provides non-blocking access to kernel CPRNG. In addition, it requires no file descriptor, and will also block prior to the kernel random generator being initialized. Given that it addresses, the issues of /dev/urandom identified above, that seems indeed like the interface that should be used by modern libraries and applications. In fact, if you use new versions of libgcrypt and GnuTLS today, they take advantage of this API (though that change wasn't exactly a walk in the park).

On the other hand, getrandom() is still a low-level interface, and may not be suitable to be used directly by applications expecting a safe high-level interface. If one carefully reads its manual page, he will notice that the API may return less data than the requested (if interrupted by signal), and today this system call is not even wrapped by glibc. That means that can be used only via the syscall() interface. An illustration of (safe) usage of this system call, is given below.

#include <sys/syscall.h>
#include <errno.h>
#define getrandom(dst,s,flags) syscall(SYS_getrandom, (void*)dst, (size_t)s, (unsigned int)flags)

static int safe_getrandom(void *buf, size_t buflen, unsigned int flags)
{
  ssize_t left = buflen;
  ssize_t ret;
  uint8_t *p = buf;
  while (left > 0) {
   ret = getrandom(p, left, flags);
   if (ret == -1) {
    if (errno != EINTR)
     return ret;
   }
   if (ret > 0) {
    left -= ret;
    p += ret;
   }
  }
  return buflen;
}

The previous example code assumes that the Linux kernel supports this system call. For portable code which may run on kernels without it, a fallback to /dev/urandom should also be included.

From the above, it is apparent that using the Linux-kernel provided interfaces to access the kernel CPRNG, is not easy. The old (/dev/*random) interfaces APIs are difficult to use correctly, and while the getrandom() call eliminates several of their issues, it is not straightforward to use, and is not available in Linux kernels prior to 3.17. Hence, if applications require access to a CPRNG, my recommendation would be to avoid using the kernel interfaces directly, and use any APIs provided by their crypto library of choice. That way the complexity of system-discovery and any other peculiarities of these interfaces will be hidden. Some hints and tips are shown in the Fedora defensive coding guide (which may be a bit out-of-date but still a good source of information).

Thursday, June 2, 2016

Restricting the scope of CA certificates

The granting of an intermediate CA certificate to a surveillance firm generated quite some fuss. Setting theories aside, the main reason behind that outcry, is the fact that any intermediate CA certificate trusted by the browsers has unlimited powers to certify any web site on the Internet. Servers can protect themselves against an arbitrary CA generating a valid certificate for their web site, using certificate pinning, but there is very little end-users can do. In practice, end-users either trust the whole bundled CA list in their browser/system or not.

An option for end-users is to utilize trust on first use, but that is not a widespread practice, and few software, besides for SSH, support it. A way for me as a user to defend against a believed to be rogue CA, is by disabling or removing that CA from my trusted bundle. But what if I trust that CA for a particular web site or domain, but not for the whole Internet?

On this post I'll try to provide more information on some lesser documented aspects of p11-kit, which provide additional control over the CA certificate bundle in a system. That is, I'll explain how we can do better than disabling CAs, and how we can restrict CAs to particular domains. The following instructions are limited to Fedora 22+ which has deployed a shared trust database for certificates based on p11-kit. This database, is not only an archive of trusted certificates, but also provides the option to attach additional attributes to CA certificates in the form of PKIX extensions. These extensions are called stapled extensions in p11-kit jargon and they override any extensions available in the trust certificates. That, allows to enforce additional restrictions to the purpose and scope of a certificate.

I'll attempt to demonstrate this feature using an example. Let's consider the case where your employer's IT department provided you with a CA certificate to trust for communications within the company. Let's also assume that the company's internal domain is called "example.com". In that scenario as a user I'd like to restrict the provided CA certificate to example.com domain to prevent anyone with access to the corporate private key from being able to hijack any connection outside the company scope. This is not only out of paranoia against a potential corporate big-brother but also to keep a good security practice and avoid having master keys. A stolen corporate CA key which is trusted for everything under the sun provides a potential attacker not only with access to company's internal communication, but also with access to Internet communication of any corporate user.

How would we install such certificate in a way that it is restricted only to example.com? Assuming that the CA certificate is provided at the example.com-root.pem file, the following command will add the company's certificate to the trusted list.
$ sudo trust anchor example.com-root.pem

That will create a file in /etc/pki/ca-trust/source containing the CA certificate (for more information on adding and removing CA certificates in Fedora see the update-ca-trust manpage).

If we edit this file we will see something like the following.
[p11-kit-object-v1]
trusted: true
x-distrusted: false
private: false
certificate-category: authority
-----BEGIN CERTIFICATE-----
MIIDsDCCAxmgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBnTELMAkGA1UEBhMCVVMx
...
-----END CERTIFICATE-----

This contains the certificate of the CA as well as various basic flags set to it.
How can we now attach a stapled extension to it?

We need to add another object in that database containing the extension. But let's see the process step by step. First we need to extract the certificate's public key because that's how p11-kit identifies existing objects. A command to achieve that is the following:
$ certool --pubkey-info --infile example.com-root.pem --outfile example.com-pubkey.pem

 The output file will contain a public key in PEM format (identifiable by the "-----BEGIN PUBLIC KEY-----" header). We now edit the p11-kit file in  /etc/pki/ca-trust/source containing our certificate and append the following.
[p11-kit-object-v1]
class: x-certificate-extension
label: "Example.com CA restriction"
object-id: 2.5.29.30
value: "%30%1a%06%03%55%1d%1e%04%13%30%11%a0%0f%30%0d%82%0b%65%78%61%6d%70%6c%65%2e%63%6f%6d"
-----BEGIN PUBLIC KEY-----
...
-----END PUBLIC KEY-----

Where the public key part is copied from the example.com-pubkey.pem file.

This added object, is a stapled extension containing a PKIX name constraints extension which allows this CA to be used for certificates under the "example.com" domain. If you attempt to connect to a host with a certificate of this CA you will get the following error:
$ gnutls-cli www.no-example.com
...
Status: The certificate is NOT trusted. The certificate chain violates the signer's constraints.
*** PKI verification of server certificate failed...

Note that, although NSS and openssl applications check some extensions (such as key purpose) from this trust database, they do not consider the name constraints extension. This may change in the future, but currently only GnuTLS applications under Fedora will honor this extension. The reason it works under Fedora distribution is because GnuTLS is compiled using the --with-default-trust-store-pkcs11="pkcs11:" configuration option which makes it use the p11-kit trust DB directly.

A question at this point, after seeing the p11-kit object format, is how can we generate the "value" listed above containing the desired constraints? The value contains a DER encoded certificate extension which corresponds to the object identifier "object-id" field. In this case the object-id field contains the object identifier for NameConstraints extension (2.5.29.30).

Unfortunately there are no available tools to generate this value, that I'm aware of. I created a sample application which will generate a valid name constraints value to be set above. The tool can be found at this github repository.

After you compile, run:
$ ./nconstraints mydomain.com myotherdomain.com
%30%30%06%03%55%1d%1e%04%29%30%27%a0%25%30%0e%82%0c%6d%79%64%6f%6d%61%69%6e%2e%63%6f%6d%30%13%82%11%6d%79%6f%74%68%65%72%64%6f%6d%61%69%6e%2e%63%6f%6d

and as you see, this command will provide the required string.

Happy hacking!

Monday, May 9, 2016

An overview of the new features in GnuTLS 3.5.0

Few minutes ago I've released GnuTLS 3.5.0. This is the stable-next branch of GnuTLS which will replace the stable GnuTLS 3.4.x branch within a year. It is fully backwards compatible and comes with several new features, the most prominent I'll summarize later on this post.

However, before going on the features let me describe the current trends in the applied cryptography field, to provide an idea of the issues considered during development, and to give context for the included changes. For the really impatient to see the features, jump to last section.

Non-NIST algorithms

After the dual EC-DRBG NIST fiasco, requests to avoid relying blindly on NIST approved/standardized algorithms for the Internet infrastructure, became louder and louder (for the new in the crypto field NIST is USA's National Institute of Standards and Technology). Even though NIST with its standardizations has certainly aided the Internet security technologies, the Snowden revelations that NSA via NIST had pushed for the backdoor-able by design EC-DRBG random generator (RNG), and required standard compliant applications to include the backdoor,  made a general distrust apparent in organizations like IETF or IRTF. Furthermore, more public scrutiny of NSA's contributions followed. You can see a nice example on the reactions to NSA's attempt to standardize extensions to the TLS protocol which will expose more state of the generator; an improvement which would have enabled the EC-DRBG backdoor to operate in a more efficient way under TLS.

Given the above, several proposals were made to no longer rely on NIST's recommendations for elliptic curve cryptography or otherwise. That is, both their elliptic curve parameters as well as their standardized random generators, etc. The first to propose alternative curves to IETF was the German BSI which proposed the brainpool curves. Despite their more open design, they didn't receive much attention by implementers. The most current proposal to replace the NIST curves comes from the Crypto Forum Research Group (CFRG) and proposes two curves, curve25519 and curve448. These, in addition to being not-proposed-by-NIST, can have a very fast implementation in modern systems and can be implemented to operate in constant time, something that is of significant value for digital signatures generated by servers. These curves are being considered for addition in the next iteration of the TLS elliptic curve document for key exchange, and there is also a proposal to use them for PKIX/X.509 certificate digital signatures under the EdDSA signature scheme.

For the non-NIST symmetric cipher replacements the story is a bit different. The NIST-standardized AES algorithm is still believed to be a very strong cipher, and will stay with us for quite long time especially since it is already part of the x86-64 CPU instruction set. However, for CPUs that do not have that instruction set, encryption performance is not particularly impressing. That, when combined with the common in TLS GCM authenticated-encryption construction which cannot be easily optimized without a specific (e.g., PCLMUL) instruction set being present, put certain systems on a disadvantage. Prior to RC4 being known to be completely broken, this was the cipher to switch your TLS connection to, for such systems. However, after RC4 became a cipher to display on a museum, a new choice was needed. I have written about the need for it back in 2013, and it seems today we are very close to having Chacha20 with Poly1305 as authenticator being standardized for use in the TLS protocol. That is an authenticated-encryption construction defined in RFC7539, a construction that can outperform both RC4 and AES on hardware without any cipher-specific acceleration.


Note that in non-NIST reliance trend, in GnuTLS we attempt to stay neutral and decide our next steps on case by case basis. Not everything coming or being standardized from NIST is bad, and algorithms that are not standardized by NIST do not become automatically more secure. Things like EC-DRBG for example were never part of GnuTLS not because we disliked NIST, but because this design didn't make sense for a random generator at all.

No more CBC

TLS from its first incarnation used a flawed CBC construction which led to several flaws over its years, the most prominent being the Lucky13 attack. The protocol in its subsequent updates (TLS 1.1 or 1.2) never fixed these flaws, and instead required the implementers to have a constant time TLS CBC pad decoding, a task which proved to be notoriously hard, as indicated by the latest OpenSSL issue. While there have been claims that some implementations are better than others, this is not the case here. The simple task of reading the CBC padding bytes, which would have been 2-3 lines of code normally, requires tens of lines of code (if not more), and even more extensive testing for correctness/time invariance.  The more code a protocol requires, the more mistakes (remember that better engineers make less mistakes, but they still make mistakes). The protocol is at fault, and the path taken in the next revision of TLS, being 1.3, is to completely drop the CBC mode. In RFC7366 there is a proposal to continue using the CBC ciphersuites in a correct way which would pose no future issues, but the TLS 1.3 revision team though it is better to move away completely from something that has caused so many issues historically.

In addition to that, it seems that all the CBC ciphersuites were banned from being used under HTTP/2.0 even when used under TLS 1.2. That would mean that applications talking HTTP/2.0 would have to disable such ciphersuites or they may fail to interoperate. This move effectively obsoletes the RFC7366 fix for CBC (already implemented in GnuTLS).


Cryptographically-speaking the CBC mode is a perfectly valid mode when used as prescribed. Unfortunately when TLS 1.0 was being designed, the prescription was not widely available or known, and as such it got into to protocol following the "wrong" prescription. As it is now and with all the bad PR around it, it seems we are going to say goodbye to it quite soon.

For the emotionally tied with CBC mode like me (after all it's a nice mode to implement), I'd like to note that it will still live with us under the CCM ciphersuites but on a different role. It now serves as a MAC algorithm. For the non-crypto geeks, CCM is an authenticated encryption mode using counter mode for encryption and CBC-MAC for authentication; it is widely used in the IoT, an acronym for the 'Internet of Things' buzzwords, which typically refers to small embedded devices.




Latency reduction


One of the TLS 1.3 design goals, according to its charter, is to "reduce handshake latency, ..., aiming for one roundtrip for a full handshake and one or zero roundtrip for repeated handshakes". That effort was initiated and tested widely by Google and the TLS false start protocol, which reduced the handshake latency to a single roundtrip, and further demonstrated its benefits with the QUIC protocol. The latter is an attempt to provide multiple connections over a stream into userspace, avoiding any kernel interaction for de-multiplexing or congestion avoidance into user-space. It comes with its own secure communications protocol, which integrates security into the transport layer, as opposed to running a generic secure communications protocol over a transport layer. That's a certainly interesting approach and it remains to be seen whether we will be hearing more of it.

While I was initially a skeptic for modifications to existing cryptographic protocols to achieve low latency, after all such modifications reduce the security guarantees (see this paper for a theoretical attack which can benefit from false-start), the requirement for secure communications with low latency is there to stay. Even though the strive to reduce latency for HTTP communication may not be convincing for everyone, one cannot but imagine a future where high latency scenarios like this are the norm, and low-roundtrip secure communications protocols are required.

Post-quantum cryptography

Since several years it is known that a potential quantum computer can break cryptographic algorithms like RSA or Diffie Hellman as well as the elliptic curve cryptosystems. It was unknown whether a quantum computer at the size where it could break existing cryptosystems could ever exist, however research conducted the last few years provides indications that this is a possibility. NIST hosted a conference on the topic last year, where NSA expressed their plan to prepare for a post-quantum computer future. That is, they no longer believe that elliptic curve cryptography, i.e., their SuiteB profile, is a solution which will be applicable long-term. That is, because due to their use of short keys, the elliptic curve algorithms require a smaller quantum computer to be broken, rather than their finite field counterparts (RSA and Diffie-Hellman). Ironically, it is easier to protect the classical finite field algorithms from quantum computers by increasing the key size (e.g., to 8k or 16k keys) than their more modern counterparts based on elliptic curves.

Other than the approach of increasing the key sizes, today we don't have much tools (i.e., algorithms) to protect key exchanges or digital signatures against a quantum computer future. By the time a quantum computer of 256-qubits or larger roughly 384 qubits is available all today's communication which is based on elliptic curves will be potentially be made available to the owner of such a system. Note, that this future is not expected soon; however, no-one would dare to make a prediction for that. Note also, that the existing systems of D-WAVE are not known to be capable of breaking the current cryptosystems.

Neither IETF or any other standardizing body has any out of the box solution. The most recent development is a NIST competition for quantum computer resistant algorithms, which is certainly a good starting point. It is also a challenge for NIST as it will have to overcome the bad publicity due to the EC-DRBG issue and reclaim its position in technology standardization and driver. Whether they will be successful on that goal, or whether we are going to have new quantum-computer resistant algorithms at all, it remains to be seen.



Finally: the GnuTLS 3.5.0 new features

In case you managed to read all of the above, only few paragraphs are left. Let me summarize the list of prominent changes.

  • SHA3 as a certificate signature algorithm. The SHA3 algorithm on all its variations (256-512) was standardized by FIPS 202 publication in August 2015. However, until very recently there were no code points (or more specifically object identifiers) for using it on certificates. Now that they are finally available, we have modified GnuTLS to include support for generating, and verifying certificates with SHA3. For SHA3 support in the TLS protocol either as a signature algorithm or a MAC algorithm we will have to wait further for code points and ciphersuites being available and standardized. Note also, that since GnuTLS 3.5.0 is the first implementation supporting SHA3 on PKIX certificates there have not been any interoperability tests with the generated certificates.
  • X25519 (formerly curve25519) for ephemeral EC diffie-hellman key exchange. One quite long-time expected feature we wanted to introduce in GnuTLS is support for alternative to the standardized by NIST elliptic curves. We selected curve25519 originally described in draft-ietf-tls-curve25519-01 and currently in the document which revises the elliptic curve support in TLS draft-ietf-tls-rfc4492bis-07. The latter document --which most likely means the curve will be widely implemented in TLS-- and the advantages of X25519 in terms of performance are the main reasons of selecting it. Note however, that X25519 is a peculiar curve for implementations designed around the NIST curves. That curve cannot be used with ECDSA signatures, although it can be used with a similar algorithm called EdDSA. We don't include EdDSA support for certificates or for the TLS protocol in GnuTLS 3.5.0 as the specification for it has not settled down. We plan to include it in a later 3.5.x release. For curve448 we would have to wait until its specification for digital signatures is settled and is available in the nettle crypto library.
  • TLS false start. The TLS 1.2 protocol as well as its earlier versions required a full round-trip time of 2 for its handshake process. Several applications require reduced latency on the first packet and so the False start modification of TLS was defined. The modification allows the client to start transmitting at the time the encryption keys are known to him, but prior to verifying the keys with the server. That reduces the protocol to a single round-trip at the cost of putting the initially transmitted messages of the client at risk. The risk is that any modification of the handshake process by an active attacker will not be detected by the client, something that can lead the client to negotiate weaker security parameters than expected, and so lead to a possible decryption of the initial messages. To prevent that GnuTLS 3.5.0 will not enable false start even if requested when it detects a weak ciphersuite or weak Diffie-Hellman parameters.  The false start functionality can be requested by applications using a flag to gnutls_init().
  • New APIs to access the Shawe-Taylor-based provable RSA and DSA parameter generation. While enhancing GnuTLS 3.3.x for Red Hat in order to pass the FIPS140-2 certification, we introduced provable RSA and DSA key generation based on the Shawe-Taylor algorithm, following the FIPS 186-4 recommendations. That algorithm allows generating parameters for the RSA and DSA algorithms from a seed that are provably prime (i.e., no probabilistic primality tests are included). In practice this allows an auditor to verify that the keys and any parameters (e.g., DH) present on a system are generated using a predefined and repeatable process. This code was enabled only when GnuTLS was compiled to enable FIPS140-2 mode, and when the system was put in FIPS140-2 compliance mode. In GnuTLS 3.5.0 this functionality is made available unconditionally from the certtool utility, and a key or DH parameters will be generated using these algorithms when the --provable parameter is specified. That required to modify the storage format for RSA and DSA keys to include the seed, and thus for compatibility purposes this tool will output both old and new formats to allow the use of these parameters from earlier GnuTLS versions and other software.
  • Prevent the change of identity on rehandshakes by default.  The TLS rehandshake protocol is typically used for three reasons, (a) rekey on long standing connections, (b) re-authentication and (c) connection upgrade. The rekey use-case is self-explanatory so my focus will be on the latter two. Connection upgrade is when connecting with no client authentication and rehandshaking to provide a client certificate, while re-authentication is when connecting with an identity A, and switching to identity B mid-connection. With that change in GnuTLS the latter use case (re-authentication) is prohibited unless the application has explicitly requested it. The reason is that the majority of applications using GnuTLS are not prepared to handle a connection identity change in the middle of a connection something that depending on the application protocol may lead to issues. Imagine the case where a client authenticates to access a resource, but just before accessing it, the client switches to another identity by presenting another certificate. It is unknown whether applications using GnuTLS are prepared for such changes, and thus we considered important to protect applications by default by requiring applications that utilize re-authentication to explicitly specify it via a flag to gnutls_init(). This change does not affect applications using rehandshake for rekey or connection upgrade.

That concludes my list of the most notable changes, even though this release includes several other ones, including a stricter protocol adherence in corner cases and a significant enhancement of the included test suite. Even though I am the primary contributor, this is a release containing code contributed by more than 20 people which I'd like to personally thank for their contributions.

If you are interested in following our development or helping out, I invite you on our mailing list as well as to our gitlab pages.


Monday, February 8, 2016

Why do we need SSL VPNs today?

One question that has been bothering me for quite a while, is why do we need SSL VPNs? There is an IETF standardized VPN type, IPSec, and given that, why do SSL VPNs still get deployed? Why not just switch everything to IPSec? Moreover, another important question is, since we have IPSec since around 1998, why IPSec hasn't took over the whole market of VPNs? Note that, I'll be using the term SSL even though today it has been replaced by Transport Layer Security (TLS) because the former is widely used to describe this type of VPNs.

These are valid questions, but depending on who you ask you are very likely to get a different answer. I'll try to answer from an SSL VPN developer standpoint.


In the VPN world there are two main types of deployment, the 'site-to-site' and the 'remote access' types. To put it simply, the first is about securing lines between two offices, and the latter is about securing the connection between your remote users and the office. The former type may rely on some minimal PKI deployment or pre-shared keys, but the latter requires integration with some user database, credentials, as well as settings which may be applied individually for each user. In addition the 'remote access' type is often associated with accounting such as keeping track how long a user is connected, how much data has been transferred and so on. That may remind you the kind of accounting used in ppp and dial-up connections, and indeed the same radius-based accounting methods are being used for that purpose.

Both of the 'site-to-site' and 'remote access' setups can be handled by either SSL or IPSec VPNs. However, there are some facts that make some VPNs more suitable for one purpose than the other. In particular, it is believed that SSL VPNs are more suitable for the 'remote access' type of VPNs, while IPSec is unquestionably the solution one would deploy on site-to-site connections. In the next paragraphs I focus on the SSL VPNs and try to list their competitive advantage for the purpose of 'remote access'.
  1. Application level. In SSL VPNs the software is at the application level, and that  means that it can provided by the software distributor, or even by the administrator of the server. These VPN applications can be customized for the particular service the user connects to (e.g., include logos, or adjust to the environment the user is used to, or even integrate VPN connectivity with an application). For example the 12vpn.net VPN provider customizes the openconnect-gui application (which is free software) to provide it with a pre-loaded list of the servers they offer to their customers. Several other proprietary solutions use a similar practice, and the server provides the software for the end users.
  2. Custom interfaces for authentication. The fact that (most) SSL VPNs run over HTTPS, it provides them with an inherent feature of having complete control over the authentication interface they can display to users. For example in Openconnect VPN we provide the client with XML forms that the user is presented and must fill in, in order to authenticate. That usually covers typical password authentication, one time passwords, group selections, and so on. Other SSL VPN solutions use entirely free form HTML authentication and often only require a browser to log to the network. Others integrate certificate issuing on the first user connection using SCEP, and so on.
  3. Enforcing a security policy. Another reason (which I don't quite like or endorse - but happens quite often) is that the VPN client applications, enforce a particular company-wide security policy; e.g., ensure that anti-virus software is running and up to date, prior to connecting to the company LAN. This often is implemented with server provided executables being run by the clients, but that is also a double-edged sword as a VPN server compromise will allow for a compromise of all the clients. In fact the bypass of this "feature" was one of the driving reasons behind the openconnect client.
  4. Server side user restrictions. On the server-side the available freedom is comparable with the client side. Because SSL VPNs are on the application layer protocol, they are more flexible in what the connecting client can be restricted to. For example, in openconnect VPN server invidual users, or groups of them can be set into a specific kernel cgroup, i.e., limiting their available CPU time, or can be restricted to a fixed bandwidth in a much more easy way than in any IPSec server.
  5. Reliability, i.e., operation over any network. In my opinion, the major reason of existance of SSL VPN applications and servers is that they can operate under any environment. You can be restricted by firewalls, broken networks which block ESP or UDP packets and still be able to connect to your network. That is, because the HTTPS protocol which they rely on, cannot be blocked without having a major part of the Internet go down. That's not something to overlook; a VPN service which works most of the times but not always because the user is within some misconfigured network is unreliable. Reliability is something you need when you want to communicate with colleagues when being on the field, and that's the real problem SSL VPN solve (and the main reason companies and IT administrators usually pay extra to have these features enabled). Furthermore, solutions like Openconnect VPN utilize a combination of HTTPS (TCP) and UDP when available to provide the best possible user experience. It utilizes Datagram TLS over UDP when it detects that this is allowed by network policy (and thus avoiding the TCP over TCP tunneling issues), and falls back to tunneling over HTTPS when the establishment of the DTLS channel is not possible.

That doesn't of course mean that IPSec VPNs are obsolete or not needed for remote access. We are far from that. IPSec VPNs are very well suited for site-to-site links --which are typically on networks under the full control of the deployer-- and are cross platform (if we ignore the IKEv1 vs IKEv2 issues), in the sense that you are very likely to find native servers and clients offered by the operating system. In addition, they possess a significant advantage; because they are integrated with the operating system's IP stack, they utilize the kernel for encryption which removes the need for userspace to kernel space switches. That allows them to serve high bandwidths and spend less CPU time. A kernel side TLS stack, would of course provide SSL VPNs a similar advantage but currently that is work in progress.

As a bottom line, you should chose the best tool for the job at hand based on your requirements and network limitations. I made the case for SSL VPNs, and provided the reasons of why I believe they are still widely deployed and why they'll continue to. If I have already convinced you for the need for SSL VPNs, and you are an administrator working with VPN deployments I'd like to refer you to my FOSDEM 2016 talk about the OpenConnect (SSL) VPN server, on which I describe the reasons I believe it provides a significant advantage over any existing solutions in Linux systems.