Monday, April 3, 2017

The mess with internationalized domain names

While internationalized domain names (DNS names) are not common in the English speaking world, they exist and their use was standardized by IETF's IDNA standards. I first found out the existence of that possibility while reading the IETF's best practices for domain name verification. As english is not my mother tongue I was particularly interested on the topic, and wanted to make sure that GnuTLS would handle such domains correctly both for storing such domains, and verifying them. That proved not to be an easy task. The following text summarizes my brief understanding of the issues in the field (disclaimer: I am far from an expert in software internationalization topics).

How does IDNA work?

To make a long story short, the IDNA protocols are based on a simple principle. They translate domain names typed with unicode characters (UTF-8 or otherwise), to a US-ASCII (English text) representation which becomes the actual domain name. For example the greek name "ένα.gr" would translate to "xn--ixai9a.gr". On Linux systems one can find Simon Josefsson's idn and idn2 tools (more on that below), which can be used to translate from an internationalized string to IDNA format. For example:

    $ echo "ενα.gr"|idn
    xn--mxahy.gr

 

What are the issues with IDNA?

Although there are simple to use libraries (see Libidn) to access IDNA functionality, there is a catch. In 2010, IETF updated the IDNA standards with a new set of standards called IDNA2008, which were "mostly compatible" with the original standard (called IDNA2003). Mostly compatible meant that the majority of strings mapped to the same US-ASCII equivalent, though some didn't. They mapped to a different string. That affected many languages, including the Greek language mappins, and the following table displays the IDNA2003 and IDNA2008 mappings of few "problematic" Greek domain names.

non-English string IDNA2003 IDNA2008
νίκος.gr xn--kxawhku.gr xn--kxawhkp.gr
νίκοσ.gr xn--kxawhku.gr xn--kxawhku.gr
NΊΚΟΣ.gr xn--kxawhku.gr (undefined)

In the above table, we can see the differences in mappings for three strings. All of the above strings can be considered to be equal in the greek language, as the third is the capitalized version of the first, and the second is the "dumb" lower case equivalent of the last.

The problematic character is 'σ' which in Modern Greek is switches to 'ς' when it is present at the end of word. As both characters are considered to be identical in the language, they are both capitalized to the same character 'Σ' (Sigma).

There are two changes in IDNA2008 standard which affect the examples above. The first, is the treatment of the 'ς' and 'σ' characters as different, causing the discrepancy between the mappings in the first and second examples. The second is that IDNA2008 is defined only for a specific set of characters, and there is no pre-processing phase, which causes the undefined state of the third string, that contains capital letters. These changes, create a discrepancy between expectations formed by observing the behavior of domains consisting of US-ASCII strings and the actual reality with Internationalized scripts. Similar cases exist in other languages (e.g., with the treatment of the 'ß' character in German).

Even though some work-arounds of the protocol may seem obvious or intuitive to implement, such as lower-casing characters prior to converting to IDNA format, lower-casing doesn't make sense in all languages. This is the reason that the capitalized version (NΊΚΟΣ.gr) of the first string on the table, is undefined in IDNA2008.

You can verify the mappings I presented above with the idn2 application, which is IDNA2008-compliant. For example:

    echo "NΊΚΟΣ.gr"|idn2
    idn2: lookup: string contains a disallowed character

 

Is there any solution?

To address these issues, a different standards body --the Unicode consortium-- addressed the issue with the Unicode Technical Standard #46 (UTS#46 or TR#46). That standard was published in 2016 to clarify few aspects of IDNA2008 and propose a compatible with IDNA2003 behavior.

UTS#46 proposes two modes of IDNA2008, the transitional, which results to problematic characters being mapped to their IDNA2003 equivalents and non-transitional mode, which is identical to the original IDNA2008 standard. In addition it requires the internationalized input to be pre-processed with the CaseFold algorithm which allows handling domain names such as "ΝΊΚΟΣ.gr" under IDNA2008.

 

Switching to IDNA2008

Unfortunately even with UTS#46, we are left with two IDNA2008 variants. The transitional which is IDNA2003 compatible and the non-transitional which is IDNA2008 incompatible. Some NICs and registrars have already switched to IDNA2008 non-transitional, but not all software has followed up.

A problem is that UTS#46 does not define a period for the use of transitional encodings, something that makes their intended use questionable. Nevertheless, as the end-goal is to switch to the non-transitional IDNA2008, it still makes it practical to switch to it by clarifying several undefined parts of the original protocol (e.g., adds pre-processing phase). As a result, few browsers (e.g., Firefox) have already switched to it. It is also possible for software based on libidn, which only supports IDNA2003, to switch. The libidn2 2.0.0 release includes libidn compatible APIs making it possible to switch to IDNA2008 (transitional or not).

 

Should I do the switch?

There are few important aspects of the IDNA2008 (non-transitional) domain names, which have to be taken into account prior to switching. As we saw above, the semantics of entering a domain in upper case, and expecting it to be translated to the proper web-site address wouldn't work for internationalized domain names. If one enters the domain "ΝΊΚΟΣ.gr", it would translate to the domain xn--kxawhku.gr (i.e., "νίκοσ.gr"), which is a misspelled version of the correct in Greek language "νίκος.gr".

Moreover, as few software has switched to IDNA2008 non-transitional processing of domain names, there is always the discrepancy between the IDNA2003 mapping and the IDNA2008 mapping. That is, a domain owner would have to be prepared to register both the IDNA2003 version of the name and the IDNA2008 version of it, to ensure all users are properly redirected to his intended site. This is apparent on the following real domains.
  • http://fass.de
  • http://faß.de
If you are a German speaker you most likely consider them equivalent, as the 'ß' character is often expanded to 'ss'. That is how IDNA2003 treated that character, however, that's not how IDNA2008 treats it. If you now use the Chrome browser which uses IDNA2003 (or more precisely IDNA2008 transitional), both of these URIs you will be re-directed to the same web-site, fass.de. However, if you use Firefox, which uses IDNA2008, you will be re-directed to two different web sites. The first being the fass.de and the second the xn--fa-hia.de.

That discrepancy was treated as a security issue by the curl and wget projects and was assigned CVE-2016-8625. Both projects switched to non-transitional IDNA2008.

 

What about certificates, can they address the issue above?

Unfortunately the above situation, cannot be fixed with X.509 certificates and in fact such a situation undermines the trust in them. The operation of X.509 certificates for web site authentication, is based on the uniqueness of domain names. In english language we can be sure that a domain name, whether entered in upper or lower case will be mapped to unique web-site. With internationalized names that's no longer the case.

What is unique in internationalized names is the final output domain, e.g., xn--kxawhku.gr, which for authentication purposes is meaningless as it is, so we have to rely on software to do the reverse mapping for us, on the right place. If the software we use uses different mapping rules than the rules applied by the registrar of the domain, users are left helpless as in the case above.

 

What to do now?

Although at this point, we know that IDNA2008 has quite some peculiarities which will be problematic at the future, we have no better option available. IDNA2003 cannot support new unicode standards and is already obsolete, so biting the bullet, and moving to non-transitional IDNA2008 seems like the right way to go. It is better to have a single and a little problematic standard, rather than have two active standards for domain name mapping.

Tuesday, March 21, 2017

Improving by simplifying the GnuTLS PRNG

One of the most unwanted baggages for crypto implementations written prior to this decade is the (pseudo-)random generator, or simply PRNG. Speaking for GnuTLS, the random generator was written at a time where devices like /dev/urandom did not come by default on widely used operating systems, and even if they did, they were not universally available, e.g., devices would not be present, the Entropy Gathering Daemon (EGD) was something that was actually used in practice, and was common for software libraries like libgcrypt to include code to gather entropy on a system by running arbitrary command line tools.

That resulted in an internal random generator which had to rely on whatever was provided by the operating system and the administrator, and that, in several cases was insufficient to seed a cryptographic PRNG. As such, an advanced PRNG was selected, based on Yarrow, which kept a global per-process state, and was aggressively gathering information, including high precision timestamps and process/thread statistics, to enhance a potentially untrusted pool formed from the system random generator or EGD. That, also meant locks for multi-threaded processes to access the global state, and thus a performance bottleneck, since a call to the PRNG is required even for the simplest of crypto operations.

Today, however, things have changed in operating systems. While Linux used to be a pioneer with /dev/urandom, now all operating systems provide a reliable PRNG, even though there are still no standardized APIs.
  • Linux provides /dev/urandom, getrandom(), getentropy()
  • Windows provides CryptGenRandom()
  • *BSD provides /dev/urandom, getentropy()
  • MacOSX provides /dev/urandom, getentropy()
  • Solaris: /dev/urandom, getentropy(), getrandom().
On the list above, I ignore the /dev/random interface which has concerning properties, such as indefinite response time (see my previous post for limitations on the Linux interfaces).

Some of the interfaces above are provided as system calls, some others as libc calls, and others as file system devices, but for the application writer, that shouldn't make significant difference. These devices or system calls, provide access to a system PRNG, which is in short doing what was GnuTLS doing manually previously, mixing various inputs from the system, in a level and way that a userspace library like GnuTLS could never do, as the kernel has direct access to available hardware and interrupts.

Given the above, a question that I've been asking myself lately, is whether there is any reason to continue shipping something advanced such as a Yarrow-based PRNG in GnuTLS? Why not switch to simple PRNG, seeded only by the system device? That would not only provide simplicity in the implementation, but also reduce the performance and memory cost of complex constructions like Yarrow. In turn, switching to something simple with low memory requirements would allow having a separate PRNG per-thread, further eliminating the bottleneck of a global per-process PRNG.

The current PRNG


To provide some context on GnuTLS' PRNG, it is made available through the following function all:
 int gnutls_rnd(gnutls_rnd_level_t level, void *data, size_t len);
That takes as input an indicative level, which can be NONCE for generating nonces, RANDOM for session keys, or KEY for long term keys. The function outputs random data in the provided buffer.

There was (a partial) attempt in GnuTLS 3.3.0 to improve performance, by introducing a Salsa20-based PRNG for generating nonces, while keeping Yarrow for generating keys. This change, although it provided the expected performance improvement for the generation of nonces, it still kept global state, and thus still imposed a bottleneck for multi-threaded processes. At the same time, it offered no improvement on the memory consumption (in fact it was increased slightly by a Salsa20 instance - around 64 bytes).

For the yet-unreleased 3.6.0, we took that enhancement several steps further, ensuing the elimination of the locking bottleneck for multi-threaded processes. It was a result of a relatively large patch set, improving the state of the internal PRNG, and rewriting it, to the following layout.

The new PRNG


The Yarrow and Salsa20 PRNGs were replaced by two independent PRNGs based on the CHACHA stream cipher. One PRNG is intended to be used for the NONCE level (which we'll refer to it as the nonce PRNG) and the other for KEY and RANDOM levels (the key PRNG). That reduces the memory requirements by eliminating the heavyweight Yarrow, and at the same time allows better use of the CPU caches, by employing a cipher that is potentially utilized by the TLS protocol, due to the CHACHA-POLY1305 ciphersuite.

To make the state lock-free, these two generators keep their state per thread by taking advantage of thread local data. That imposes a small memory penalty per-thread --two instances of CHACHA occupy roughly 128-bytes--, albeit, it eliminates the bottleneck of locks to access the random generator in a process.

Seeding the PRNG

The PRNGs used by GnuTLS are created and seeded on the first call to gnutls_rnd(). This behavior is a side-effect of a fix for getrandom() blocking in early boot in Linux, but it fits well with the new PRNG design. Only threads which utilize the PRNG calls will allocate memory for it, and carry out any seeding.

For threads that utilize the generator, the initial seeding involves calling the system PRNG, i.e., getrandom() in Linux kernel, to initialize the CHACHA instances. The PRNG is later re-seeded; the time of the re-seed depends both on time elapsed and the amount of bytes generated. At the moment of writing, the nonce PRNG will be re-seeded when 16MB of is generated, or 4 hours of operation, whichever is first. The key PRNG will re-seed using the operating system's PRNG, after 2MB of data are generated, or after 2 hours of operation.

As a side note, that re-seed based on time was initially a major concern of mine, as it was crucial for a call to random generator to be efficient, without utilizing system calls, i.e., imposing a switch to kernel mode. However, in glibc calls like time() and gettimeofday() are implemented with vdso something that transforms a system call like time(), to a memory access, hence do not introduce any significant performance penalty.

The data limits imposed to PRNG outputs are not entirely arbitrary. They allow several thousands of TLS sessions, prior to re-seeding, to avoid re-introducing a bottleneck on busy servers, this time being the system calls to operating system's PRNG.

Defense against common PRNG attacks

There are multiple attacks against a PRNG, which typically require a powerful adversary with access to the process state (i.e., memory). There are also attacks on which the adversary controls part of the input/seed to PRNG, but we axiomatically assume a trusted Operating System, trusted not only in the sense of not being backdoored, but also in the sense of doing its PRNG job well.

I'll not go through all the details of attacks (see here for a more detailed description), but the most prominent of these attacks and applicable to our PRNG are state-compromise attacks. That is, the attacker obtains somehow the state of the PRNG --think of a heartbleed-type of attack which results to the PRNG state being exposed--, and uses that exposed state to figure out past, and predict future outputs.
Given the amount of damage a heartbleed-type of attack can do, protecting against the PRNG state compromise attacks remind this pertinent XKCD strip. Nevertheless, there is merit to protecting against these attacks, as it is no longer unimaginable to have scenarios where the memory of the PRNG is exposed.

 

Preventing backtracking

This attack assumes that the attacker obtained access to the PRNG state at a given time, and would need to recover a number of bytes generated in the past. In this construct, both the  nonce and key PRNGs re-seed based on time, and data, after which recovery is not possible. As such an attacker is constrained to access data within the time or data window of the applicable generator.

Furthermore, generation of long-term keys (that is, the generator under the KEY level), ensures that such backtracking is impossible. That is, in addition to any re-seed previously described, the key generator will re-key itself with a fresh key generated from its own stream after each operation.

Preventing permanent compromise

That, is in a way the opposite of the previous attack. The attacker, still obtains access to the PRNG state at a given time, and would like to recover to recover all data generated in the future. In a design like this, we would like to limit the number of future bytes that can be recovered.

Again, the time and data windows of the PRNGs restrict the adversary's access within them. An attacker will have to obtain constant or periodic access to the PRNG state, to be able to efficiently attack the system.


Final remarks

The design of the new GnuTLS PRNG is quite similar to the arc4random implementation on the OpenBSD system. The latter despite its name, is also based on the CHACHA cipher. Few details differ, however. The GnuTLS PRNG enforces a refresh of the PRNG based on elapsed time, in addition to output data, does re-key only for when a requests for data at the KEY level, and strives for low memory footprint as it utilizes a separate generator per process thread.

 

Another thing to note, is that the fact that the gnutls_rnd() call allows for an advisory level to be specified, provides the internal implementation quite some flexibility. That is, the given level, although advisory, allows for optimizations to be enabled for levels that are not intended for secrecy. That is, apply different data and time limits on nonce and key generator, and thus increasing performance when possible. The cost of such a compromise for performance, is a larger window of exposure when the PRNG's state is compromised.


The generator described, will be made available in the next major release of GnuTLS, although the details may change.