nmav's Blog

Friday, May 4, 2018

GnuTLS and TLS 1.3

[last update 2019-05-25]

GnuTLS already contains support for the ~~latest TLS 1.3 draft (draft-ietf-tls-tls13-26) on its master git branch~~ TLS 1.3 RFC in its 3.6.5 release. One of our major challenges with the TLS 1.3 protocol was making its support transparent for existing applications; that is, without any code changes or the need for re-compilation. We believe we have sufficiently addressed that challenge, while at the same time making available several new features introduced by the protocol. This post goes through the new features of TLS 1.3 and discusses how one can take advantage of them with GnuTLS. We will also discuss potential pain points during the migration. More information will be provided in the 'upgrade section' of the manual after the release.

New features under TLS 1.3

Unlike its predecessors, which consisted of incremental updates, TLS 1.3 is a clean-slate re-write of the TLS 1.2 protocol. In brief, on the algorithmic level the changes are quite minimal, the most significant being the support for RSA-PSS signatures and the elimination of all ciphers which had known issues. On the protocol level, the changes span many aspects: mainly changes to reduce the message round-trips during handshake, several security-related fixes, and protocol simplifications. These changes include the complete elimination of the convoluted re-handshake process and its replacement by the simpler re-key and re-authentication processes. They also include a defense against passive eavesdroppers for the certificates sent during handshake, the introduction of a foundation for message length hiding, and the ability to attach OCSP staples to all certificates sent during handshake.

Testing TLS 1.3

In GnuTLS 3.6.5, TLS 1.3 is enabled by default.

Single roundtrip handshake

In contrast with TLS 1.2, whose handshake took two message round-trips, the TLS 1.3 handshake process typically takes a single round-trip. The idea behind that optimization is that the client sends its key shares for the key exchange speculatively, by guessing the Diffie-Hellman group(s) the server supports. If that speculation fails, the handshake falls back to a more verbose mode which includes two additional messages (Hello-Retry-Request and ClientHello) to negotiate the commonly supported groups.

In GnuTLS the client sends by default the key shares of the two distinct group types which have the highest priority; currently these are the elliptic curve groups SECP256R1 and X25519. Key shares from these two groups were selected because the groups are widely available and require little CPU time, while minimizing the risk of fall-back to the more verbose handshake. In order to modify that behavior, applications can re-order the preferred group list via priority strings, and/or provide a special flag to gnutls_init() which switches the behavior. The flag GNUTLS_KEY_SHARE_TOP instructs GnuTLS to send a single key share, and GNUTLS_KEY_SHARE_TOP3 to send the key shares of the top 3 distinct group types (e.g., SECP256R1, X25519 and FFDHE2048).
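The selection rule described above can be illustrated with a small standalone model. This is only a sketch and not GnuTLS's actual code: the type and function names are hypothetical, and the split into three group types (NIST-style curves, X25519-style curves, finite field groups) is an assumption based on the example in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of key exchange groups by type. */
typedef enum { TYPE_SECP, TYPE_X, TYPE_FFDHE } group_type;

typedef struct {
    const char *name;
    group_type type;
} group;

/* Walk a priority-ordered list and pick the first group of each distinct
 * type, up to max_shares entries. Returns the number of groups selected;
 * pointers to the selected entries are stored in out. */
static size_t pick_key_shares(const group *prio, size_t n,
                              size_t max_shares, const group **out)
{
    unsigned seen = 0; /* bitmask of group types already selected */
    size_t count = 0;

    assert(prio != NULL && out != NULL);
    for (size_t i = 0; i < n && count < max_shares; i++) {
        unsigned bit = 1u << prio[i].type;
        if (seen & bit)
            continue; /* a higher-priority group of this type was chosen */
        seen |= bit;
        out[count++] = &prio[i];
    }
    return count;
}
```

In this model, max_shares set to 1, 2 or 3 corresponds to GNUTLS_KEY_SHARE_TOP, the default behavior, and GNUTLS_KEY_SHARE_TOP3 respectively. With the priority order SECP256R1, SECP384R1, X25519, FFDHE2048, a limit of 2 selects SECP256R1 and X25519, skipping SECP384R1 because it has the same type as a higher-priority group.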

Note, however, that the round-trips achieved by TLS 1.3 do not translate directly to network round-trips when operating under TCP. That is because TCP introduces an additional round-trip for its connection establishment (the SYN / SYN-ACK exchange). To eliminate the additional round-trip, the same tricks apply as with TLS 1.2; one needs to rely on TCP Fast Open.
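The round-trip accounting of this section can be summed up in a few lines of arithmetic. The following is a minimal sketch using only the numbers given in the text; the names are hypothetical, and it assumes the Hello-Retry-Request fall-back costs exactly one extra round-trip and that TCP Fast Open removes the TCP round-trip entirely.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TLS12, TLS13 } tls_version;

/* Network round-trips spent before the handshake completes. */
static unsigned network_round_trips(tls_version v, bool hello_retry,
                                    bool tcp_fast_open)
{
    /* The Hello-Retry-Request fall-back only exists in TLS 1.3. */
    assert(!(v == TLS12 && hello_retry));

    unsigned rtt = (v == TLS12) ? 2 : 1; /* TLS handshake itself */
    if (hello_retry)
        rtt += 1; /* Hello-Retry-Request plus second ClientHello */
    if (!tcp_fast_open)
        rtt += 1; /* TCP connection establishment */
    return rtt;
}
```

Under these assumptions a plain TLS 1.3 connection over TCP costs two round-trips, a failed key share guess brings it back to three (the same as TLS 1.2), and only the combination with TCP Fast Open reaches a single round-trip.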

New ciphersuites and key exchange changes

TLS 1.3 replaces all the ciphersuites available under TLS 1.2 and earlier with a new set of five ciphersuites. The meaning of a ciphersuite also changes: instead of expressing the key exchange, authentication method and cipher/MAC, it now defines the cipher and MAC algorithms only. The reason for that change is that TLS 1.3 defines only two authentication methods, the certificate and the pre-shared key (PSK), which are negotiated separately using extensions. Similarly, the key exchange methods have expanded in scope since TLS 1.2. Instead of being part of the ciphersuite, as elliptic curve Diffie-Hellman (ECDHE) or finite field Diffie-Hellman (DHE), they are now tied to the actual group in use: SECP256R1 for the NIST P-256 elliptic curve group, X25519, FFDHE-2048 for the RFC7919 Diffie-Hellman group, and so on. The groups are negotiated via the supported groups TLS extension, and are set in GnuTLS via the priority strings.

The following new ciphersuites are introduced in TLS 1.3:

AES-256-GCM
CHACHA20-POLY1305
AES-128-GCM
AES-128-CCM
AES-128-CCM-8

They are all supported by GnuTLS, though AES-128-CCM-8, which has a truncated tag, is not enabled by default. Furthermore, the fact that TLS 1.3 limited all the available ciphers to this small set prompted us to clean up our default settings for TLS 1.2 and earlier. We saw no reason to provide different cipher choices depending on the negotiated protocol. As a result we kept that base of ciphers, disabling the Camellia and 3DES ciphersuites across all protocol versions.

Note that existing applications need no changes to take advantage of the new ciphers. The existing priority strings were sufficiently flexible to support them.

Post handshake authentication

TLS 1.2 provided a combined re-authentication and re-key mechanism under the re-handshake umbrella. In TLS 1.3 these two mechanisms are disassociated, and replaced by the re-key (to be discussed later) and the post-handshake authentication mechanism. The latter is a mechanism that can be triggered only by the server, and requests the client to present a certificate. In GnuTLS the implementation relies on a new non-fatal error code which must be handled by the client application. To avoid unintentional side-effects to existing software, this mechanism must be explicitly enabled by both the client and the server. That is done by specifying GNUTLS_POST_HANDSHAKE_AUTH as a gnutls_init() flag in both client and server. Once that is done, a server can request post-handshake authentication from the client by calling the new API, gnutls_reauth(), and the client should react to the GNUTLS_E_REAUTH_REQUEST error code by calling the same function.

Re-keying

Under TLS 1.3 applications can re-key with a very simple mechanism which involves the transmission of a single message. Furthermore, GnuTLS handles re-key transparently and every GnuTLS client and server will automatically re-key after 2^24 messages are exchanged, unless the GNUTLS_NO_AUTO_REKEY flag is specified in gnutls_init(), or the cipher's security properties require no re-keying, as with the CHACHA20-POLY1305 cipher.
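To get a feel for what the automatic re-key threshold means in terms of data volume, the sketch below computes an upper bound on the bytes sent under a single key. It assumes that each of the 2^24 messages is one TLS record and that a record carries at most 2^14 bytes of plaintext (the TLS record limit); the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REKEY_MESSAGES   (UINT64_C(1) << 24) /* threshold from the text */
#define MAX_RECORD_BYTES (UINT64_C(1) << 14) /* TLS plaintext record limit */

/* Bytes sent before an automatic re-key, when every record carries
 * record_bytes bytes of application data. */
static uint64_t bytes_before_rekey(uint64_t record_bytes)
{
    assert(record_bytes > 0 && record_bytes <= MAX_RECORD_BYTES);
    return REKEY_MESSAGES * record_bytes;
}
```

With full-size records this gives 2^38 bytes, that is 256 GiB, between automatic re-keys; applications sending small records will re-key after proportionally less data.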

To accommodate well-written applications which were using the re-handshake mechanism of TLS 1.2 to perform re-key, the gnutls_rehandshake() and gnutls_handshake() calls under TLS 1.3 perform a re-key on server and client side respectively.

Server applications which were relying on the re-handshake process to re-authenticate the client must be modified to take advantage of post-handshake authentication under TLS 1.3.

Message length hiding

In TLS 1.3 it is possible to hide the length of the transmitted data on the wire by introducing padding bytes. To take advantage of that, we enhanced the gnutls_record_send_range() and related APIs, and after realizing their complexity when performing certain length hiding tasks, we decided to introduce a new and simple API for adding padding data: gnutls_record_send2(). The new API allows sending an arbitrary amount of data, together with an arbitrary amount of padding limited only by the record constraints.
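One common way to use padding for length hiding is to round every message up to a multiple of a fixed bucket size, so that an observer only learns the bucket and not the exact length. The helper below is a minimal sketch of that calculation; its name is hypothetical, and the cap of 2^14 bytes for data plus padding is an assumption standing in for the "record constraints" mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PLAINTEXT 16384 /* assumed per-record limit, 2^14 bytes */

/* Padding bytes needed to round data_len up to the next multiple of
 * bucket, capped so that data plus padding still fit in one record. */
static size_t pad_for_bucket(size_t data_len, size_t bucket)
{
    assert(bucket > 0 && data_len <= MAX_PLAINTEXT);

    size_t rem = data_len % bucket;
    size_t pad = rem ? bucket - rem : 0;

    if (data_len + pad > MAX_PLAINTEXT)
        pad = MAX_PLAINTEXT - data_len;
    return pad;
}

/* The result would be passed as the padding argument when sending, e.g.
 *   gnutls_record_send2(session, data, data_len,
 *                       pad_for_bucket(data_len, 512), 0);
 */
```

For example, a 100-byte message with a 512-byte bucket needs 412 bytes of padding, while a message that is already a multiple of the bucket size needs none.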

Attaching multiple OCSP staples

Under TLS 1.3, servers and clients can attach OCSP responses for more than a single certificate in their certificate chain. That is, they can provide fresh responses for their certificate authority, or other intermediate authorities, making the task of certificate validation easier for the peer, without involving the OCSP server during the handshake.

The addition of that functionality resulted in enhancements to the ocsptool utility, which can now export and import OCSP responses in PEM format, something that allows them to be combined with the already PEM-encoded certificate chains. Applications can then use the APIs gnutls_certificate_set_ocsp_status_request_mem() or gnutls_certificate_set_ocsp_status_request_file2() to specify OCSP responses which correspond to their certificates.

RSA-PSS certificates and TLS 1.3 deployment

When we introduced support for RSA-PSS keys and certificates in GnuTLS 3.6.0, we saw them at the time as necessary for TLS 1.3 negotiation, and as an opportunity to increase the overall protocol security. However, that requirement never materialized through the TLS 1.3 drafts, possibly as the result of balancing the need for security against cross-protocol attacks with compatibility with existing keys and practices. Furthermore, RSA-PSS keys and certificates are quite new, and are not universally supported by CAs or by hardware security modules and smart cards.

As such, although RSA-PSS certificates are allowed by the protocol and supported by GnuTLS, the expected method to obtain an RSA-PSS signature in TLS 1.3 is by utilizing a "legacy" RSA PKCS#1 1.5 certificate associated with a multi-purpose RSA key. That is, a legacy RSA key is expected to be used for RSA-PSS signatures, PKCS#1 1.5 signatures and PKCS#1 1.5 decryption. That, to a cryptographer, voids any security proofs available for the RSA-PSS algorithm, since such proofs assume a key which is used for a single purpose.

That decision, although it may have failed to provide a bump in overall Internet security, had quite good usability reasons to back it up. We are far from having universal support for RSA-PSS certificates in existing software or hardware, and the ability to re-use the exact same setup and keys as a TLS 1.2 server was seen as of paramount importance in the TLS working group. The protocol designers recognized the security issues, however, and have documented best practices sufficient to defend against the worst offender in RSA key attacks. That is, against the RSA PKCS#1 (decryption) ciphersuites, which have been the root of every new clone of Bleichenbacher's attack. Their recommendation is to disable the static RSA ciphersuites on TLS 1.3 servers. That is not done (at least for now) by GnuTLS by default, though it can be achieved with a priority string which disables static RSA, e.g., "NORMAL:-RSA". The reason for not disabling these ciphersuites by default is to allow connections by legacy software and embedded software, which often rely on static RSA due to its simplicity.

RSA client certificates on smart cards and TLS 1.3 deployment

Under TLS 1.3 it is no longer possible to negotiate the older RSA signing standard, and thus existing smart cards containing RSA keys but not supporting the RSA-PSS mechanism cannot be used under TLS 1.3. There is no easy solution to that issue. Possible options could be (a) replacing all the smart cards with new ones which support RSA-PSS, or with cards using the ECDSA algorithm, or (b) disabling TLS 1.3 support on the client or server side. As of GnuTLS 3.6.8, GnuTLS servers or clients presented with such limited hardware disable support for TLS 1.3 transparently for applications instead of failing.

Key derivation

Although key derivation under TLS 1.3 uses different algorithms under the hood, the standard interfaces from RFC5705 continue to be functional for key derivation. As such, the interface gnutls_prf_rfc5705() must be used for key derivation by applications; older interfaces like gnutls_prf_raw() or gnutls_prf() return an error code when used under TLS 1.3.

Session Resumption

Session resumption in the previous versions of TLS consisted of exporting the serialized session data as negotiated by the TLS handshake, to use them in a future handshake. TLS 1.3 disassociates session resumption from session parameters: session resumption is an optional server feature, advertised by the server in the form of a resumption ticket which can be sent at any time after the handshake is complete, and can be used only once to connect to the server. Merging the previous resumption semantics with the new ones was quite a challenge, though it was possible. A difference with TLS 1.2 semantics is that a call to gnutls_session_get_data2() could wait to receive data from the network, for a maximum of 50 milliseconds.

What about applications not utilizing certificates?

This short section provides information for existing applications which rely on the SRP, Anonymous or PSK key exchanges which were available under TLS 1.2.

SRP / RSA-PSK key exchange

SRP and RSA-PSK key exchanges are not supported in TLS 1.3, so when these key exchanges are present in a priority string, TLS 1.3 is disabled.

Anonymous key exchange

There is no anonymous key exchange supported under TLS 1.3, so if an anonymous key exchange method is set in a priority string, and no certificate credentials are set in the client or server, TLS 1.3 will not be negotiated. That approach allows for TLS 1.3 to be negotiated when a server or client supports both anonymous and certificate key exchange, i.e., when supporting an opportunistic-encryption negotiation scenario.

PSK key exchange

The pre-shared key exchange is supported in TLS 1.3, thus any existing setup with pre-shared key is unaffected by the upgrade to TLS 1.3. Both the Diffie-Hellman and "pure" PSK are supported under the new protocol, however in the priority strings the 'ECDHE-PSK' and 'DHE-PSK' options are handled as synonyms and they indicate the intent to support an ephemeral key exchange with the pre-shared key; the parameters of the key exchange, e.g., elliptic curves vs finite field, are negotiated with the supported groups specified in the priority string.

A thing to note is that although certificates are protected against passive eavesdroppers in TLS 1.3, PSK usernames are still sent in the clear, as in TLS 1.2.

Authentication-only ciphersuites

Ciphersuites with the NULL cipher (i.e., authentication-only) are not supported in TLS 1.3, so when they are specified in a priority string, TLS 1.3 is disabled.

Deprecated features

During the GnuTLS 3.6.x releases there was a significant effort to eliminate obscure options and code which could prove a liability in the future. As part of that, support for compression, OpenPGP authentication and SHA2-224 signatures in TLS key exchange was removed, along with their corresponding code. Furthermore, the DSA signature algorithm and the Camellia and 3DES ciphers were disabled by default for TLS sessions, while SHA1 was marked as insecure for certificate signatures.

What's next?

Although a functional part of the TLS 1.3 protocol is implemented in GnuTLS, there are a few items which are still missing and are marked with this label. Furthermore, we would like to extend our existing test suite for TLS 1.3, which consists of unit and interoperability tests, with the tlsfuzzer TLS 1.3 test suite once that is made available. The advantage of the tlsfuzzer test suite is that it is designed with an attacker mindset, which helps uncover issues in corner and rare cases. The inclusion of this test suite will be a prerequisite for the 3.6.x branch being marked as the stable branch [update in 2019-5-25: this branch is already marked as stable and the tlsfuzzer test suite is incorporated].

Monday, August 21, 2017

An overview of GnuTLS 3.6.0

The new 3.6.0 GnuTLS release contains several new features, back-end changes and clean ups. This is a release which re-spins the so-called 'stable-next' branch, meaning that once considered stable enough, this branch will replace the current stable branch. The main target of this release was to have a library ready to incorporate new protocol additions such as TLS 1.3, which is currently in draft specification and is expected to be finalized in the coming months. That "preparation" ranges from introducing new functionality needed for the new protocol features, and improving the testing and fuzzing infrastructure of the library to reduce regressions and non-standards-compliant behavior, to the removal of features and components which are no longer relevant in today's Internet Public Key Infrastructure.

In short, this release introduces a new lock-free random generator and adds new TLS extensions shared by both TLS 1.2 and 1.3, such as Finite Field Diffie-Hellman negotiation, Ed25519 and RSA-PSS signatures. These additions modernize the current TLS 1.2 support and pave the way for TLS 1.3 support in the library. Furthermore, tlsfuzzer is introduced in our continuous integration test suite. Tlsfuzzer is a meticulous TLS test suite, which tests the behavior of the implementation in various corner (and non-corner) cases, and complements the internal GnuTLS test suite and its unit testing. This release also eliminates a lot of legacy code, in order to reduce complexity and improve the manageability of the library, preventing legacy code from being used as a potential attack vector.

Further changes to support TLS 1.3 will be included on this release branch.

The following paragraphs go through the most significant changes of the 3.6.0 release.

Testing and Fuzzing

Fuzzing infrastructure

Fuzzing, in the sense of trying arbitrary input to the library and testing its behavior under invalid and valid but rare inputs, is not something new. However, in GnuTLS previously, fuzzing of various components of the library was done in a non-systematic way, usually by 3rd parties who then reported any issues found. That, as you can imagine, is an unreliable process. Without common fuzzing infrastructure, there is no re-use of fuzzing code or infrastructure, forcing each and every person attempting to fuzz GnuTLS functionality to re-invent the wheel. Driven by the availability of Google's OSS-Fuzz project, and with the contributions of Alex Gaynor, Tim Ruehsen and yours truly, there is now a common fuzzing base testing several aspects of the library. That fuzzing test suite is run automatically under OSS-Fuzz, uncovering issues and filing bugs against the library (and that's the part where automation stops).

tlsfuzzer

tlsfuzzer is a TLS server implementation testing tool used at Red Hat for testing the quality of TLS implementations. It checks the behavior of an implementation in various corner (and non-corner) cases, providing a tool not only for testing the correctness of an implementation but also for ensuring that no behavioral regressions are introduced undetected. With this release GnuTLS incorporates this testing tool in its continuous integration infrastructure, ensuring behavioral stability and thorough testing of existing and new functionality. Since the library is now being modified for TLS 1.3, tlsfuzzer's use is invaluable as it allows detecting major or minor behavioral changes in the old protocol support early.

CII best practices

The Core Infrastructure Initiative (CII) of the Linux Foundation provides a best practices badge, as a way for Free software projects to demonstrate that they follow best software engineering practices. The practices include change control, quality, static analysis, security, even bug reporting. Although several of these practices were already in use in the project, the process of going through that manual inspection of processes uncovered several weaknesses and omissions which are now resolved. Taking the time to go through the manual inspection was the most difficult part, as there is always something more important to address; however, I believe that the time spent on it was worthwhile. My impression is that there has been quality work behind the formulation of these practices, and I'd recommend any free software project to seriously consider following them. You can view the result of GnuTLS' inspection here.

Random Number Generation

A new lock-free random generator

GnuTLS versions 3.3 and later rely on two random generators. The default is based on a combination of the Salsa20/12 stream cipher for nonces, and Yarrow/AES for everything else. The other generator is the AES-CTR-DRBG, which is an AES-based deterministic random bit generator and is used optionally when the library is compiled with FIPS140-2 support and the system is in FIPS140-2 mode. Both of these generators operate under a global lock, making them a performance bottleneck for multi-threaded applications. Having such a bottleneck in single-CPU systems or even 2-4 CPU systems may have been an acceptable cost in the past; today, however, as the number of CPUs in a single system increases past these numbers, such global locks severely harm performance. To address that, in GnuTLS 3.6.0 the random generator component was essentially re-written to remove the bottleneck, simplify entropy gathering, and fix other issues found over the years. The end result is a separate random generator per thread, and a single default generator based on the CHACHA stream cipher. The optional generator AES-CTR-DRBG remains the same. I'll not go further into the design of the new random generator, though you can find a detailed description of the changes in this post.

TLS Features

Finite Field Diffie-Hellman parameter negotiation (RFC7919)

If you have set up any TLS server, or have developed an application which uses TLS, you have most likely seen references to Diffie-Hellman parameter files and their generation. In the early versions of GnuTLS I recommended generating them daily or weekly, provided certtool options to generate them by following specific security levels, and so on. The truth is that there were no available best practices that protocol designers or implementers could refer to on how these parameters should be used correctly. As always happens in these cases, the burden is pushed to the application writers, and I guess the application writers push that further to the application users. Fortunately, with the publication of RFC7919, the DH parameter handling becomes the responsibility of the TLS protocol itself, and the parameters are now negotiated without any input from the application (except maybe the desired security parameter/level). GnuTLS 3.6.0 implements that feature, removing the need for server applications to specify Diffie-Hellman parameters, but in a backwards compatible way. Applications which already specify the DH parameters explicitly will still function, with their parameters overriding that negotiation.

In practice this feature introduces the notion of groups, that replace the previous notion of curves. Applications which were setting support for explicit curves via priority strings like "NORMAL:+CURVE-X25519", could now use "NORMAL:+GROUP-X25519" with identical functionality. The groups act as a superset of the curves, and contain the finite field groups for Diffie-Hellman, such as GROUP-FFDHE2048, GROUP-FFDHE3072, etc.

Digital signatures with Ed25519

Although curve X25519 was already supported for TLS ephemeral key exchange, there was no way to utilize certificates and private keys with the Ed25519 signature algorithm. This is now supported both for certificate signing and verification, as well as for TLS key exchange (following draft-ietf-tls-rfc4492bis-17). In contrast with RSA or even ECDSA, these keys offer impressive performance, and are remarkably small, even in their PKCS#8 container. That is easily demonstrated with the certtool output below. The key consists of three lines, two of which are the PEM boilerplate.

certtool --generate-privkey --key-type ed25519
...
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIKVpRaegFppd3pDQ3wpHd4+wBV3gSjhKadEy8S1J4gEd
-----END PRIVATE KEY-----
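
The size of that key can be checked from the length of the base64 line alone. The helper below is a minimal sketch (its name is hypothetical) which computes the decoded size of a padded base64 string without decoding it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes a padded base64 string decodes to. */
static size_t base64_decoded_len(const char *b64)
{
    size_t n = strlen(b64);

    assert(n % 4 == 0); /* padded base64 comes in groups of 4 chars */

    size_t len = n / 4 * 3;
    if (n >= 1 && b64[n - 1] == '=')
        len--;
    if (n >= 2 && b64[n - 2] == '=')
        len--;
    return len;
}
```

Applied to the 64-character line in the output above, this gives 48 bytes for the whole PKCS#8 structure: 32 bytes of Ed25519 private key plus 16 bytes of DER framing.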

Given the expected switch to post-quantum resistant algorithms in the not-so-far away future, that may be the last chance to utilize algorithms with such a small key size.

Digital signatures with RSA-PSS

That was the change that required by far the largest amount of code changes in GnuTLS 3.6.0; it required changes both in GnuTLS and nettle, so I'll dedicate a few more lines to it. The feature was contributed mainly by Daiki Ueno, and was further extended by me. If you are not aware of the spicy details of cryptographic protocols today, RSA signing is universally used with a construction called PKCS#1 v1.5, named after the document in which it was described. Even though no attacks are known for the PKCS#1 v1.5 signing algorithm, the very similar PKCS#1 RSA decryption construction was successfully attacked in 1998 by Bleichenbacher, generating doubt about its cryptographic properties.

In order to prevent similar issues in the future, another RSA signing construction was defined in a later revision (v2) of the PKCS#1 document, named RSASSA-PSS (referred to as RSA-PSS from now on). That method involves hash functions and a salt, in order to provide a primitive with a security proof. The proof guarantees that the longer the salt, the stronger the security properties, with stronger being, unfortunately, undefined in any tangible terms. The best current practice followed by the PKCS#1 2.2 document is to tie the salt size to the size of the hash function involved, and thus associate directly the key security parameter with the hash function used in RSA-PSS signatures.

As mentioned above, RSA-PSS introduces optional parameters to a key or certificate. The parameters help the operator of the key (e.g., the software) sign using the desired security level. These parameters include the following information.

   RSASSA-PSS-params ::= SEQUENCE {
       hashAlgorithm      [0] HashAlgorithm,
       maskGenAlgorithm   [1] MaskGenAlgorithm,
       saltLength         [2] INTEGER,
       trailerField       [3] TrailerField
   }

That is, a key is associated with two different hashes (hashAlgorithm and maskGenAlgorithm), a variable salt size, and a trailerField which is unused in Internet PKI. To simplify generation and usage of such keys, GnuTLS 3.6.0 generates keys by default with no parameters, that is, keys that can be used with any hash or salt size (except SHA1 which is intentionally not supported). That form of keys is most suitable for servers which typically sign using any algorithm supported by the connected client. For CA keys and keys which require a consistent security level to be used, these parameters can be set, though GnuTLS will require the hash algorithms in hashAlgorithm and maskGenAlgorithm to match. Keys with non-matching algorithms, e.g., a key using SHA256 for hashAlgorithm and SHA512 for maskGenAlgorithm, are rejected as invalid.
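The constraints described above can be summarized in a small standalone model. This is a sketch and not GnuTLS's code: the types and function names are hypothetical, the salt length is simply set to the hash output size as the best practice above suggests, and the acceptance check covers only the two rules stated in the text (no SHA1, matching hashes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { HASH_SHA1, HASH_SHA256, HASH_SHA384, HASH_SHA512 } hash_alg;

/* Output size of each hash in bytes. */
static size_t hash_size(hash_alg h)
{
    switch (h) {
    case HASH_SHA1:   return 20;
    case HASH_SHA256: return 32;
    case HASH_SHA384: return 48;
    case HASH_SHA512: return 64;
    }
    return 0; /* not reached for valid enum values */
}

/* Simplified view of RSASSA-PSS-params. */
typedef struct {
    hash_alg hash;     /* hashAlgorithm */
    hash_alg mgf_hash; /* hash used by maskGenAlgorithm */
    size_t salt_len;   /* saltLength */
} pss_params;

/* Derive all parameters from a single hash choice. */
static pss_params pss_params_for_hash(hash_alg h)
{
    pss_params p = { h, h, hash_size(h) };
    return p;
}

/* Apply the two rules from the text: no SHA1, and both hashes match. */
static bool pss_params_acceptable(const pss_params *p)
{
    assert(p != NULL);
    if (p->hash == HASH_SHA1)
        return false;
    return p->hash == p->mgf_hash;
}
```

In this model a key restricted to SHA384 gets a 48-byte salt and is accepted, while the SHA256/SHA512 combination mentioned above is rejected.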

To generate an RSA key for use only with RSA-PSS signatures, use the following command.

certtool --generate-privkey --key-type rsa-pss

To generate a key for RSA-PSS with a specific hash algorithm (the salt size will be obtained from it), use the following command:

certtool --generate-privkey --key-type rsa-pss --hash sha384

Note however, that very few applications accept keys intended only for RSA-PSS signatures. A more compatible approach is to generate an RSA key (which works for any purpose), and utilize that one to sign with the RSA-PSS signature algorithm. When that key is used in the context of TLS, it will be used for both RSA-PSS and plain PKCS#1 v1.5 signatures. As any cryptographer would tell you, that usage invalidates the RSA-PSS security proof, and underlines the need to utilize separate keys for the different algorithms.

As such, it is possible with GnuTLS to use separate keys for RSA PKCS#1 v1.5, and RSA-PSS, in order to reduce any risk due to the interaction between these two algorithms. When a GnuTLS server is provided with two keys, RSA and RSA-PSS, the latter will be used for RSA-PSS operations, and the former for the legacy PKCS#1 v1.5 operations.

Removed/Disabled functionality

3DES cipher is no longer enabled by default for TLS

Although the 3DES cipher is the mandatory option for TLS 1.0 and TLS 1.1, the cipher is unfortunately a relic of a different era. It is a 64-bit block cipher, which limits the amount of data it can safely operate on; it is based on a cipher with a 56-bit key size, and operates in encryption-decryption-encryption (EDE) mode to overcome that key size limitation. As a result, the cipher provides performance which is unacceptable today, and is only being used to interoperate with legacy hardware and software. This cipher will therefore no longer be enabled by default, but applications requiring it should provide the end-user the necessary knobs to enable it (e.g., a priority string which includes "+3DES-CBC").
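The data limit imposed by a 64-bit block can be estimated with the usual birthday bound: for an n-bit block cipher, collisions between ciphertext blocks become likely after roughly 2^(n/2) blocks. The sketch below computes that bound as a power of two in bytes; the function name is hypothetical and the estimate is a rule of thumb, not a statement about a specific attack.

```c
#include <assert.h>

/* log2 of the number of bytes after which block collisions become
 * likely, for a cipher with the given block size in bits. */
static unsigned birthday_bound_log2_bytes(unsigned block_bits)
{
    assert(block_bits == 64 || block_bits == 128);

    unsigned block_bytes_log2 = (block_bits == 64) ? 3 : 4;
    return block_bits / 2 + block_bytes_log2;
}
```

For 3DES this gives 2^35 bytes, that is 32 GiB under a single key, an amount easily reached by a long-lived connection; for a 128-bit block cipher such as AES the same estimate is 2^68 bytes.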

SHA1 is no longer acceptable for certificate signing

SHA1 used to be the de facto algorithm in X.509 certificates, and in other digital signature standards. Given the collision attacks on SHA1 and the fact that it has been phased out from the public web, GnuTLS will not accept SHA1 signatures on certificates as trusted by default. SHA1 will remain acceptable for other types of signatures as it is still widely used. Note, however, that the existing collision attacks do not translate directly to an attack on digital signatures with SHA1. The removal is a precaution and preparation for its complete phasing out. The reason is that even though direct attacks are not applicable to SHA1-based digital signatures, the experience with the attacks on MD5 in the previous decade shows that there can be clever ways to take advantage of collision attacks in order to forge certificates.

OpenPGP functionality was removed

When I originally started working on GnuTLS I was envisioning a future where OpenPGP certificates would be used instead of X.509. My driver was the perceived simplicity of the OpenPGP certificate format, and the fact that obtaining a certificate at the time required the intervention of a costly CA, in contrast with OpenPGP where one only had to generate a key and manage its web of trust. That never took off, either as a deployment or as an idea, and today none of the original driving factors are valid. OpenPGP certificates are not significantly simpler to parse than X.509, the web-of-trust proved to be a more difficult problem than Internet PKI, and the issue of costly CA verification is no longer relevant after letsencrypt.org.

IDNA2003 is no longer supported

The IETF switched to IDNA2008 for internationalized domain names a long time ago, and as such GnuTLS will no longer provide compatibility code for the older standard. Internationalized domain names may not be widely known in the English-speaking world; however, their use varies around the world. Hence, supporting them is necessary in order to properly handle PKIX (X.509) certificates, and their verification, with internationalized domain names. See my previous post for a more detailed description of IDNA today.

TLS compression functionality was removed

Compression prior to encryption was always considered a good thing, not only because it eliminates correlations in plaintext due to the language or file format in use, but also because it reduces the size of the transmitted data, and the latter is a significant performance benefit on bandwidth-restricted lines. Why did we remove it then? The reason is that after compression the ciphertext length, which in TLS 1.2 is sent in the clear, may reveal more information about the data, and that becomes a confidentiality issue when the data are partially under the control of the attacker. This property has been exploited in attacks like the fancy-named CRIME attack.

Given the above, the currently held belief in protocol design is to delegate compression to application protocols; e.g., TLS 1.3 will not include support for compression. For that reason we believe that there are more benefits in removing that feature completely, reducing the attack surface of the library, than in keeping it as a legacy feature.

Concluding remarks

I'd like to sincerely thank everyone who contributed to making the GnuTLS 3.6.0 release possible. The git shortlog follows; happy hacking!

Alex Gaynor (12):
      Migrated fuzzers from the oss-repo to here.
      Added a server fuzzer
      Move to the devel dir
      Describe the integration
      Added a parser for PKCS7 importing and printing
      Added a fuzzer for OpenPGP cert parsing
      Do not infinite loop if an EOF occurs while skipping a PGP packet
      Attempt to fix a leak in OpenPGP cert parsing.
      Corrected a leak in OpenPGP sub-packet parsing.
      Enforce the max packet length for OpenPGP subpackets as well
      Do not attempt to parse a 32-bit integer if a packet is not 4 bytes.
      Do not attempt to parse a 32-bit integer if a packet is not 4 bytes.

Alexander Kanavin (1):
      Do not add cli-args.h to cli-args.stamp Makefile target

Alon Bar-Lev (19):
      tests: suite: pkcs11: skip if no softhsm
      tests: cert-tests: pkcs12 drop builddir usage
      tests: skip tests that requires tools if tools are disabled
      gitignore: sort()
      gitignore: update [ci skip]
      tests: skip tests that requires tools if tools are disabled
      tests: suite: chain: support separate builddir
      tests: remove bash usage
      tests: skip tests that requires tools if tools are disabled
      configure: remove void statement
      valgrind: support separate builddir for suppressions.valgrind
      .gitlab-ci.yml: add Fedora/x86_64/no-tools
      build: doc: install images also into htmldir
      tests: scripts: suppress which errors
      tests: remove unused suppressions.valgrind
      tests: suppressions.valgrind: supress fillin_rpath
      tests: cert-tests: openpgp-certs: align test redirection
      build: tests: resolve as-needed issue with seccomp
      build: disable valgrind tests by default

Andreas Metzler (3):
      Use NORMAL priority for SSLv23_*_method.
      gnutls-cli: Use CRLF with --starttls-proto=smtp.
      Fix autoconf progress message concerning heartbeat [ci skip]

Daiki Ueno (3):
      build: import files from Nettle for RSA-PSS
      x509: implement RSA-PSS signature scheme
      nettle: ported fix for assertion failure in pss_verify_mgf1

Daniel Kahn Gillmor (1):
      clarify documentation and arguments for psktool

David Caldwell (2):
      Rename uint64 to gnutls_uint64 to avoid conflict with macOS
      gnutls_x509_trust_list_add_system_trust: Add macOS keychain support

Dmitry Eremin-Solenikov (13):
      configure.ac: remove autogen'erated files only if necessary
      Add special MD5+SHA1 digest to simplify TLS signature code
      Rewrite SSL/TLS signing code to use combined MD5+SHA1 digest
      Rewrite SSL/TLS signature verification to use combined MD5+SHA1 digest
      Use MAC_MD5_SHA1 instead of MAC_UNKNOWN to specify TLS 1.0 PRF
      Cache MAC algorithm used for PRF function
      Rework setting next cipher suite
      Rework setting next compression method
      Drop _gnutls_epoch_get_compression
      Don't let GnuTLS headers in NETTLE_CFLAGS override local headers
      Fix two memory leaks in debug output of gnutls tools
      gnutls-serv: allow user to specify multiple x509certile/x509keyfile
      Rework KX -> PK mappings

Karl Tarbe (2):
      certtool: allow multiple certificates in --p7-sign
      tests: add test for signing with certificate list

Marcin Cieślak (1):
       only if HAVE_ALLOCA_H

Martin Storsjo (2):
      Fix a typo in a variable name in an m4 script
      Avoid deprecation warnings when including gnutls/abstract.h

Matt Turner (1):
      tests: Copy template out of ${srcdir}

Nicolas Dufresne (1):
      rsa-psk: Use the correct username datum

Nikos Mavrogiannopoulos (1148):
      ...

Rical Jasan (1):
      tests: Improve port-checking infrastructure.

Robert Scheck (1):
      Add LMTP, POP3, NNTP, Sieve and PostgreSQL support to gnutls-cli

Tim Rühsen (11):
      Add support for libidn2 (IDNA 2008 + TR46)
      lib/system/fastopen: Add TCP Fast Open for OSX
      Fix memleak in gnutls_x509_crl_list_import()
      Fix memleaks in gnutls_x509_trust_list_add_crls()
      fuzzer: Initial check in for improved fuzzing
      fuzzer: Suppress unsigned integer overflow in rnd-fuzzer.c
      fuzzer: Suppress leak in libgmp <= 6.1.2
      fuzzer: Move regression corpora from tests/ to fuzz/
      fuzzer: Add 'make -C fuzz coverage' [ci skip]
      fuzzer: Fix include path in run-clang.sh [skip ci]
      fuzzer: Update base64 fuzzers + corpora

Monday, April 3, 2017

The mess with internationalized domain names

While internationalized domain names (DNS names) are not common in the English-speaking world, they exist and their use was standardized by the IETF's IDNA standards. I first found out about that possibility while reading the IETF's best practices for domain name verification. As English is not my mother tongue I was particularly interested in the topic, and wanted to make sure that GnuTLS would handle such domains correctly, both when storing and when verifying them. That proved not to be an easy task. The following text summarizes my brief understanding of the issues in the field (disclaimer: I am far from an expert in software internationalization topics).

How does IDNA work?

To make a long story short, the IDNA protocols are based on a simple principle. They translate domain names typed with Unicode characters (UTF-8 or otherwise) to a US-ASCII (English text) representation which becomes the actual domain name. For example the Greek name "ένα.gr" would translate to "xn--ixai9a.gr". On Linux systems one can find Simon Josefsson's idn and idn2 tools (more on that below), which can be used to translate from an internationalized string to IDNA format. For example:

    $ echo "ενα.gr"|idn
    xn--mxahy.gr
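
The same translation can be sketched programmatically. The following is a minimal Python example, assuming Python's built-in "idna" codec, which follows the older IDNA2003 rules (a distinction that matters for the discussion below); the helper names to_ascii and to_unicode are illustrative only.

```python
# Minimal sketch: convert between Unicode and ASCII-compatible domain names.
# Assumption: Python's built-in "idna" codec, which follows IDNA2003.


def to_ascii(domain: str) -> str:
    """Translate a Unicode domain name to its US-ASCII (xn--) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Translate an ASCII-compatible domain name back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


name = "ενα.gr"
encoded = to_ascii(name)
print(encoded)              # ASCII form; the first label starts with "xn--"
print(to_unicode(encoded))  # back to the original Unicode name
```

Names that are already plain US-ASCII pass through unchanged, so the same helper can be applied to every domain name an application handles.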

What are the issues with IDNA?

Although there are simple to use libraries (see Libidn) to access IDNA functionality, there is a catch. In 2010, the IETF updated the IDNA standards with a new set of standards called IDNA2008, which were "mostly compatible" with the original standard (called IDNA2003). Mostly compatible meant that the majority of strings mapped to the same US-ASCII equivalent, though some didn't; they mapped to a different string. That affected many languages, including the Greek language mappings, and the following table displays the IDNA2003 and IDNA2008 mappings of a few "problematic" Greek domain names.

non-English string	IDNA2003	IDNA2008
νίκος.gr	xn--kxawhku.gr	xn--kxawhkp.gr
νίκοσ.gr	xn--kxawhku.gr	xn--kxawhku.gr
NΊΚΟΣ.gr	xn--kxawhku.gr	(undefined)

In the above table, we can see the differences in mappings for three strings. All of the above strings can be considered to be equal in the Greek language, as the third is the capitalized version of the first, and the second is the "dumb" lower case equivalent of the last.

The problematic character is 'σ', which in Modern Greek is switched with 'ς' when it is present at the end of a word. As both characters are considered to be identical in the language, they are both capitalized to the same character 'Σ' (Sigma).

There are two changes in the IDNA2008 standard which affect the examples above. The first is the treatment of the 'ς' and 'σ' characters as different, causing the discrepancy between the mappings in the first and second examples. The second is that IDNA2008 is defined only for a specific set of characters, and there is no pre-processing phase, which causes the undefined state of the third string, which contains capital letters. These changes create a discrepancy between the expectations formed by observing the behavior of domains consisting of US-ASCII strings and the actual reality with internationalized scripts. Similar cases exist in other languages (e.g., with the treatment of the 'ß' character in German).
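
The sigma behavior can be reproduced with a short Python sketch. It assumes Python's built-in "idna" codec (which applies the IDNA2003 rules, including their pre-processing) and its "punycode" codec; the helper name idna2003 is illustrative. It does not implement IDNA2008 itself, it only shows that the two sigmas stay distinct when no folding is applied.

```python
# Sketch of the sigma problem. Assumption: Python's built-in "idna" codec
# (IDNA2003 rules) and "punycode" codec; the helper name is illustrative.

FINAL_SIGMA = "\u03c2"  # 'ς', written at the end of a word
SIGMA = "\u03c3"        # 'σ', written everywhere else
STEM = "νίκο"


def idna2003(domain: str) -> str:
    """Map a domain name to ASCII using the IDNA2003 rules."""
    return domain.encode("idna").decode("ascii")


proper = STEM + FINAL_SIGMA + ".gr"              # the correct spelling
dumb = STEM + SIGMA + ".gr"                      # the "dumb" lower-case spelling
capital = (STEM + FINAL_SIGMA).upper() + ".gr"   # the capitalized spelling

# Both sigmas capitalize to the same letter.
print(FINAL_SIGMA.upper() == SIGMA.upper())

# IDNA2003 pre-processing folds all three spellings into one ASCII name.
names_2003 = {idna2003(name) for name in (proper, dumb, capital)}
print(len(names_2003))

# Encoding the labels as they are keeps the two sigmas apart,
# which is the distinction IDNA2008 preserves.
raw_proper = (STEM + FINAL_SIGMA).encode("punycode")
raw_dumb = (STEM + SIGMA).encode("punycode")
print(raw_proper != raw_dumb)
```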

Even though some work-arounds of the protocol may seem obvious or intuitive to implement, such as lower-casing characters prior to converting to IDNA format, lower-casing doesn't make sense in all languages. This is the reason that the capitalized version (NΊΚΟΣ.gr) of the first string in the table is undefined in IDNA2008.

You can verify the mappings I presented above with the idn2 application, which is IDNA2008-compliant. For example:

    $ echo "NΊΚΟΣ.gr"|idn2
    idn2: lookup: string contains a disallowed character

Is there any solution?

To address these issues, a different standards body --the Unicode consortium-- stepped in with the Unicode Technical Standard #46 (UTS#46 or TR#46). That standard was published in 2016 to clarify a few aspects of IDNA2008 and to propose behavior compatible with IDNA2003.

UTS#46 proposes two modes of IDNA2008: the transitional, which results in problematic characters being mapped to their IDNA2003 equivalents, and the non-transitional, which is identical to the original IDNA2008 standard. In addition, it requires the internationalized input to be pre-processed with the CaseFold algorithm, which allows handling upper-case domain names such as "ΝΊΚΟΣ.gr" under IDNA2008.
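
To get a feel for what such a pre-processing step does to an upper-case name, here is a small Python sketch. It assumes that Python's str.casefold(), which applies Unicode case folding, is a reasonable stand-in for the mapping step; it is not a UTS#46 implementation.

```python
# Sketch of the pre-processing step. Assumption: str.casefold() stands in
# for the case folding that UTS#46 applies before the IDNA2008 conversion.

word = "νίκο\u03c2"              # νίκος, ending in the final sigma 'ς'
typed = word.upper() + ".gr"     # what a user might type in upper case
folded = typed.casefold()

print(typed)
print(folded)
# Folding cannot know that the last letter should be the final sigma,
# so the result ends in 'σ' and differs from the correct spelling.
print(folded == word + ".gr")
```

The folded name is a valid input for the conversion, but as the last comparison shows it is not the spelling a Greek speaker would have intended.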

Switching to IDNA2008

Unfortunately even with UTS#46, we are left with two IDNA2008 variants: the transitional, which is IDNA2003 compatible, and the non-transitional, which is IDNA2003 incompatible. Some NICs and registrars have already switched to IDNA2008 non-transitional, but not all software has followed up.

A problem is that UTS#46 does not define a period for the use of transitional encodings, something that makes their intended use questionable. Nevertheless, as the end goal is to switch to non-transitional IDNA2008, UTS#46 still makes that switch practical by clarifying several undefined parts of the original protocol (e.g., it adds a pre-processing phase). As a result, a few browsers (e.g., Firefox) have already switched to it. It is also possible for software based on libidn, which only supports IDNA2003, to switch. The libidn2 2.0.0 release includes libidn-compatible APIs, making it possible to switch to IDNA2008 (transitional or not).

Should I do the switch?

There are a few important aspects of IDNA2008 (non-transitional) domain names which have to be taken into account prior to switching. As we saw above, the semantics of entering a domain in upper case and expecting it to be translated to the proper web-site address don't hold for internationalized domain names. If one enters the domain "ΝΊΚΟΣ.gr", it would translate to the domain xn--kxawhku.gr (i.e., "νίκοσ.gr"), which is a misspelled version of the correct Greek "νίκος.gr".

Moreover, as little software has switched to IDNA2008 non-transitional processing of domain names, there is always the discrepancy between the IDNA2003 mapping and the IDNA2008 mapping. That is, a domain owner would have to be prepared to register both the IDNA2003 version of the name and the IDNA2008 version of it, to ensure all users are properly directed to the intended site. This is apparent in the following real domains.

http://fass.de
http://faß.de

If you are a German speaker you most likely consider them equivalent, as the 'ß' character is often expanded to 'ss'. That is how IDNA2003 treated that character; however, that's not how IDNA2008 treats it. If you use the Chrome browser, which at the moment uses IDNA2003 (or more precisely IDNA2008 transitional), both of these URIs will direct you to the same web site, fass.de. However, if you use Firefox, which uses IDNA2008, you will be directed to two different web sites: the first being fass.de and the second xn--fa-hia.de.

That discrepancy was treated as a security issue by the curl and wget projects and was assigned CVE-2016-8625. Both projects switched to non-transitional IDNA2008.
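
The two outcomes for the 'ß' example can be reproduced with a short Python sketch. It assumes Python's built-in "idna" codec for the IDNA2003 mapping, and it encodes the label as-is with the "punycode" codec to obtain the name that keeps 'ß' distinct; the latter skips the validity checks a real IDNA2008 implementation such as libidn2 performs. Both helper names are illustrative.

```python
# Sketch: one typed name, two different ASCII domain names.
# Assumption: the "idna" codec gives the IDNA2003 mapping, and encoding the
# label unchanged with "punycode" gives the name that keeps 'ß' as it is.


def idna2003(domain: str) -> str:
    """IDNA2003 mapping, which expands 'ß' to 'ss'."""
    return domain.encode("idna").decode("ascii")


def raw_a_label(label: str) -> str:
    """Punycode-encode one label without any pre-processing or checks."""
    return "xn--" + label.encode("punycode").decode("ascii")


print(idna2003("faß.de"))            # fass.de
print(raw_a_label("faß") + ".de")    # xn--fa-hia.de
```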

What about certificates, can they address the issue above?

Unfortunately the above situation cannot be fixed with X.509 certificates, and in fact such a situation undermines the trust in them. The operation of X.509 certificates for web site authentication is based on the uniqueness of domain names. In the English language we can be sure that a domain name, whether entered in upper or lower case, will be mapped to a unique web site. With internationalized names that's no longer the case.

What is unique in internationalized names is the final output domain, e.g., xn--kxawhku.gr, which for authentication purposes is meaningless as it is, so we have to rely on software to do the reverse mapping for us, in the right place. If the software we use applies different mapping rules than the rules applied by the registrar of the domain, users are left helpless, as in the case above.

What to do now?

Although at this point we know that IDNA2008 has quite some peculiarities which will be problematic in the future, we have no better option available. IDNA2003 cannot support new Unicode standards and is already obsolete, so biting the bullet and moving to non-transitional IDNA2008 seems like the right way to go. It is better to have a single, slightly problematic standard than to have two active standards for domain name mapping.

Tuesday, March 21, 2017

Improving by simplifying the GnuTLS PRNG

One of the most unwanted pieces of baggage for crypto implementations written prior to this decade is the (pseudo-)random generator, or simply PRNG. Speaking for GnuTLS, the random generator was written at a time when devices like /dev/urandom did not come by default on widely used operating systems, and even where they did, they were not universally available (e.g., the devices would not be present). The Entropy Gathering Daemon (EGD) was something that was actually used in practice, and it was common for software libraries like libgcrypt to include code to gather entropy on a system by running arbitrary command line tools.

That resulted in an internal random generator which had to rely on whatever was provided by the operating system and the administrator, and that in several cases was insufficient to seed a cryptographic PRNG. As such, an advanced PRNG was selected, based on Yarrow, which kept a global per-process state and aggressively gathered information, including high precision timestamps and process/thread statistics, to enhance a potentially untrusted pool formed from the system random generator or EGD. That also meant locks for multi-threaded processes to access the global state, and thus a performance bottleneck, since a call to the PRNG is required even for the simplest of crypto operations.

Today, however, things have changed in operating systems. While Linux used to be a pioneer with /dev/urandom, now all operating systems provide a reliable PRNG, even though there are still no standardized APIs.

Linux provides /dev/urandom, getrandom(), getentropy()
Windows provides CryptGenRandom()
*BSD provides /dev/urandom, getentropy()
MacOSX provides /dev/urandom, getentropy()
Solaris provides /dev/urandom, getentropy(), getrandom()

On the list above, I ignore the /dev/random interface which has concerning properties, such as indefinite response time (see my previous post for limitations on the Linux interfaces).

Some of the interfaces above are provided as system calls, some others as libc calls, and others as file system devices, but for the application writer that shouldn't make a significant difference. These devices or system calls provide access to a system PRNG, which in short does what GnuTLS was previously doing manually: mixing various inputs from the system, at a level and in a way that a userspace library like GnuTLS never could, as the kernel has direct access to the available hardware and interrupts.

Given the above, a question that I've been asking myself lately is whether there is any reason to continue shipping something as advanced as a Yarrow-based PRNG in GnuTLS. Why not switch to a simple PRNG, seeded only by the system device? That would not only provide simplicity in the implementation, but also reduce the performance and memory cost of complex constructions like Yarrow. In turn, switching to something simple with low memory requirements would allow having a separate PRNG per thread, further eliminating the bottleneck of a global per-process PRNG.
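
From the application writer's point of view, asking the system PRNG for a seed is a one-line operation. The following Python sketch illustrates that with os.urandom(), which wraps whichever of the platform interfaces listed above is available; the helper name system_random is illustrative, and this is not how GnuTLS (a C library) performs the call.

```python
import os


def system_random(nbytes: int) -> bytes:
    """Fetch bytes from the operating system's PRNG."""
    # os.urandom() hides the platform differences (system call, libc call
    # or device file) behind a single portable function.
    return os.urandom(nbytes)


seed = system_random(32)   # e.g. enough to key a stream cipher based PRNG
print(len(seed))
```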

The current PRNG

To provide some context on GnuTLS' PRNG, it is made available through the following function call:

int gnutls_rnd(gnutls_rnd_level_t level, void *data, size_t len);

That takes as input an indicative level, which can be NONCE for generating nonces, RANDOM for session keys, or KEY for long term keys. The function outputs random data in the provided buffer.

There was a (partial) attempt in GnuTLS 3.3.0 to improve performance by introducing a Salsa20-based PRNG for generating nonces, while keeping Yarrow for generating keys. Although this change provided the expected performance improvement for the generation of nonces, it still kept global state, and thus still imposed a bottleneck on multi-threaded processes. At the same time, it offered no improvement in memory consumption (in fact it increased it slightly, by a Salsa20 instance of around 64 bytes).

For the yet-unreleased 3.6.0, we took that enhancement several steps further, ensuring the elimination of the locking bottleneck for multi-threaded processes. That was the result of a relatively large patch set, improving the state of the internal PRNG and rewriting it to the following layout.

The new PRNG

The Yarrow and Salsa20 PRNGs were replaced by two independent PRNGs based on the CHACHA stream cipher. One PRNG is intended to be used for the NONCE level (which we'll refer to as the nonce PRNG) and the other for the KEY and RANDOM levels (the key PRNG). That reduces the memory requirements by eliminating the heavyweight Yarrow, and at the same time allows better use of the CPU caches, by employing a cipher that is potentially utilized by the TLS protocol, due to the CHACHA-POLY1305 ciphersuite.

To make the state lock-free, these two generators keep their state per thread by taking advantage of thread local data. That imposes a small memory penalty per thread --two instances of CHACHA occupy roughly 128 bytes-- but it eliminates the bottleneck of locks to access the random generator in a process.

Seeding the PRNG

The PRNGs used by GnuTLS are created and seeded on the first call to gnutls_rnd(). This behavior is a side-effect of a fix for getrandom() blocking in early boot in Linux, but it fits well with the new PRNG design. Only threads which utilize the PRNG calls will allocate memory for it, and carry out any seeding.
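
The combination of per-thread state and seeding on first use can be illustrated with a short Python sketch. It is only an analogy for the design described here: threading.local plays the role of thread local data, and a plain 32-byte seed from os.urandom() stands in for a seeded CHACHA instance. All names are hypothetical and none of this is GnuTLS code.

```python
import os
import threading

_state = threading.local()   # each thread sees its own attributes


def thread_generators() -> dict:
    """Create and seed the calling thread's generators on first use only."""
    if not hasattr(_state, "generators"):
        _state.generators = {
            "nonce": os.urandom(32),   # stand-in for the nonce PRNG state
            "key": os.urandom(32),     # stand-in for the key PRNG state
        }
    return _state.generators


def generators_of_new_thread() -> dict:
    """Run thread_generators() in another thread and return what it saw."""
    seen = {}
    worker = threading.Thread(target=lambda: seen.update(thread_generators()))
    worker.start()
    worker.join()
    return seen


mine = thread_generators()
other = generators_of_new_thread()
print(mine is thread_generators())    # the same thread reuses its state
print(mine["key"] != other["key"])    # another thread holds separate state
```

No lock is taken anywhere in this sketch: a thread only ever touches its own state, and a thread that never asks for random data never allocates or seeds anything.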

For threads that utilize the generator, the initial seeding involves calling the system PRNG, i.e., getrandom() in the Linux kernel, to initialize the CHACHA instances. The PRNG is later re-seeded; the time of the re-seed depends both on the time elapsed and on the number of bytes generated. At the moment of writing, the nonce PRNG will be re-seeded after 16MB of data are generated, or after 4 hours of operation, whichever comes first. The key PRNG will re-seed using the operating system's PRNG after 2MB of data are generated, or after 2 hours of operation.
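
The re-seed rule amounts to two thresholds per generator, checked on every request. The Python sketch below shows one way to express it; the class and attribute names are hypothetical, "MB" is assumed to mean 2**20 bytes, and a monotonic clock is used for the elapsed time, none of which is confirmed by the GnuTLS sources.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

MB = 1024 * 1024   # assumption: the limits are expressed in units of 2**20 bytes
HOUR = 3600


@dataclass
class ReseedPolicy:
    """Track the bytes generated and the time elapsed since the last seeding."""
    max_bytes: int
    max_seconds: int
    generated: int = 0
    seeded_at: float = field(default_factory=time.monotonic)

    def account(self, nbytes: int) -> None:
        """Record that nbytes of output were produced."""
        self.generated += nbytes

    def needs_reseed(self, now: Optional[float] = None) -> bool:
        """True once either the data limit or the time limit is reached."""
        now = time.monotonic() if now is None else now
        return (self.generated >= self.max_bytes
                or now - self.seeded_at >= self.max_seconds)

    def mark_reseeded(self) -> None:
        """Reset both counters after fetching a fresh seed from the system."""
        self.generated = 0
        self.seeded_at = time.monotonic()


nonce_policy = ReseedPolicy(max_bytes=16 * MB, max_seconds=4 * HOUR)
key_policy = ReseedPolicy(max_bytes=2 * MB, max_seconds=2 * HOUR)

key_policy.account(2 * MB)
nonce_policy.account(2 * MB)
print(key_policy.needs_reseed())     # the key generator hit its data limit
print(nonce_policy.needs_reseed())   # the nonce generator still has headroom
```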

As a side note, that re-seed based on time was initially a major concern of mine, as it was crucial for a call to the random generator to be efficient, without utilizing system calls, i.e., imposing a switch to kernel mode. However, in glibc, calls like time() and gettimeofday() are implemented with vDSO, something that transforms a system call like time() into a memory access, and hence they do not introduce any significant performance penalty.

The data limits imposed on PRNG outputs are not entirely arbitrary. They allow several thousands of TLS sessions prior to re-seeding, to avoid re-introducing a bottleneck on busy servers, this time in the form of system calls to the operating system's PRNG.

Defense against common PRNG attacks

There are multiple attacks against a PRNG, which typically require a powerful adversary with access to the process state (i.e., memory). There are also attacks in which the adversary controls part of the input/seed to the PRNG, but we axiomatically assume a trusted Operating System, trusted not only in the sense of not being backdoored, but also in the sense of doing its PRNG job well.

I'll not go through all the details of attacks (see here for a more detailed description), but the most prominent of these attacks, and applicable to our PRNG, are state-compromise attacks. That is, the attacker somehow obtains the state of the PRNG --think of a heartbleed-type of attack which results in the PRNG state being exposed--, and uses that exposed state to figure out past outputs and predict future ones.

Given the amount of damage a heartbleed-type of attack can do, protecting against PRNG state compromise attacks brings to mind this pertinent XKCD strip. Nevertheless, there is merit to protecting against these attacks, as it is no longer unimaginable to have scenarios where the memory of the PRNG is exposed.

Preventing backtracking

This attack assumes that the attacker obtained access to the PRNG state at a given time, and wants to recover a number of bytes generated in the past. In this construct, both the nonce and key PRNGs re-seed based on time and data, after which recovery is not possible. As such, an attacker is constrained to accessing data within the time or data window of the applicable generator.

Furthermore, generation of long-term keys (that is, the generator under the KEY level), ensures that such backtracking is impossible. That is, in addition to any re-seed previously described, the key generator will re-key itself with a fresh key generated from its own stream after each operation.
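
The re-keying idea can be shown with a toy generator. In the Python sketch below, SHA-256 in counter mode stands in for the CHACHA stream cipher, purely so that the example runs with the standard library; the class is hypothetical and is not a vetted construction. The point is only the last step of generate(): the key is replaced by fresh output of the generator's own stream, so the state left in memory afterwards no longer reproduces what was just handed out.

```python
import hashlib
import os


class ToyKeyGenerator:
    """Toy stream generator; SHA-256 in counter mode stands in for CHACHA."""

    def __init__(self, seed: bytes):
        self._key = seed
        self._counter = 0

    def _stream(self, nbytes: int) -> bytes:
        """Produce the next nbytes of the key stream."""
        out = b""
        while len(out) < nbytes:
            block_input = self._key + self._counter.to_bytes(8, "big")
            out += hashlib.sha256(block_input).digest()
            self._counter += 1
        return out[:nbytes]

    def generate(self, nbytes: int) -> bytes:
        """Return random bytes, then re-key from the generator's own stream."""
        data = self._stream(nbytes)
        self._key = self._stream(32)   # the old key is overwritten
        self._counter = 0
        return data


gen = ToyKeyGenerator(os.urandom(32))
key_before = gen._key
secret = gen.generate(32)

# An attacker who captures the state *after* the call only sees the new key,
# and running the generator from it does not reproduce the earlier output.
replay = ToyKeyGenerator(gen._key).generate(32)
print(gen._key != key_before)
print(secret != replay)
```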

Preventing permanent compromise

That is, in a way, the opposite of the previous attack. The attacker still obtains access to the PRNG state at a given time, and would like to recover all data generated in the future. In a design like this, we would like to limit the number of future bytes that can be recovered.

Again, the time and data windows of the PRNGs restrict the adversary's access within them. An attacker will have to obtain constant or periodic access to the PRNG state, to be able to efficiently attack the system.

Final remarks

The design of the new GnuTLS PRNG is quite similar to the arc4random implementation on the OpenBSD system. The latter, despite its name, is also based on the CHACHA cipher. A few details differ, however. The GnuTLS PRNG enforces a refresh of the PRNG based on elapsed time in addition to output data, re-keys only on requests for data at the KEY level, and strives for a low memory footprint, as it utilizes a separate generator per process thread.

Another thing to note is that the gnutls_rnd() call allows for an advisory level to be specified, which provides the internal implementation with quite some flexibility. That is, the given level, although advisory, allows for optimizations to be enabled for levels that are not intended for secrecy, such as applying different data and time limits to the nonce and key generators, and thus increasing performance when possible. The cost of such a compromise for performance is a larger window of exposure when the PRNG's state is compromised.

The generator described will be made available in the next major release of GnuTLS, although the details may change.

Sunday, November 13, 2016

Using the Nitrokey HSM with GnuTLS applications

The Nitrokey HSM is an open hardware security module, in the form of a smart card token, which is used to isolate a server's private key from the application. That is, if you have an HTTPS server, such a hardware security module will prevent an attacker who temporarily obtained privileged access on the server (e.g., via an exploit like heartbleed) from copying the server's private key and using it to impersonate the server. See my previous post for a more elaborate discussion on that defense mechanism.

The rest of this post will explain how you can initialize this token and utilize it from GnuTLS applications, and in the process explain more about smart card and HSM usage in applications. For the official (and more advanced) Nitrokey setup instructions and tips you can see this OpenSC page; another interesting guide is here.

HSMs and smart cards

Nitrokey HSM is something between a smart card and an HSM. However, there is no real distinction between smart cards and hardware security modules from a software perspective. Hardware-wise one expects better (in terms of cost to defeat) tamper-resistance on HSMs, and at the same time sufficient performance for server loads. An HSM is typically attached via a PCI slot or USB, while smart cards are mainly connected via USB or a card reader.

On the software side both smart cards and HSMs are accessed the same way, over the PKCS#11 API. That is an API which abstracts keys from operations, i.e., the API doesn't require direct access to the private key data to complete the operation. Most crypto libraries today support this API either directly, as GnuTLS and NSS do, or via an external module, as OpenSSL does (i.e., via engine_pkcs11).

Each HSM or smart card comes with a "driver", i.e., a PKCS#11 module, which one had to specify explicitly in legacy applications. On modern systems, which have p11-kit, the available drivers are registered with p11-kit and applications can obtain and utilize them at run-time (see below for more information). For Nitrokey the OpenSC driver is used, which is also the driver for almost every other smart card that is supported on Linux.

If you are familiar with old applications, you will have noticed that objects were referred to as "slot1_1", which meant the first object on the first slot of the driver, or "1:1", and by several other obscure methods depending on the application. The "slots" notion is internal to PKCS#11 and is inherently unstable (re-inserting a card may change the slot number assignment); thus these methods of referring to objects cannot easily accommodate multiple cards, refer to an object within a specific card if multiple are present, or utilize cards which are under different drivers. More recent applications support PKCS#11 URIs, a method to identify tokens and objects within a token which is unique system-wide; the URI looks like:

pkcs11:token=SmartCard-HSM;object=my-ecc-key;type=private

For GnuTLS applications, only PKCS#11 URIs can be used to refer to objects.
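
To make the structure of such a URI concrete, the following Python sketch splits one into its attributes. It is deliberately simplified: it only handles the ';'-separated attributes seen in this post and ignores the rest of the URI syntax, and the function name parse_pkcs11_uri is hypothetical rather than part of any library.

```python
from urllib.parse import unquote


def parse_pkcs11_uri(uri: str) -> dict:
    """Split a PKCS#11 URI into a dictionary of its attributes."""
    scheme, separator, rest = uri.partition(":")
    if scheme != "pkcs11" or not separator:
        raise ValueError("not a PKCS#11 URI: %r" % uri)
    attributes = {}
    for part in rest.split(";"):
        if not part:
            continue   # tolerate "pkcs11:" and stray semicolons
        name, _, value = part.partition("=")
        attributes[name] = unquote(value)   # undo %XX escapes
    return attributes


attrs = parse_pkcs11_uri("pkcs11:token=SmartCard-HSM;object=my-ecc-key;type=private")
print(attrs["token"])    # which token holds the object
print(attrs["object"])   # the object's label
print(attrs["type"])     # private key, public key or certificate
```

Nothing in the URI depends on a slot number, which is what makes it stable across re-insertions and across drivers.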

Driver setup and token discovery

On a typical Linux system which runs the pcscd server, and has opensc and p11-kit properly installed, the following command should list the Nitrokey token once inserted.

    $ p11tool --list-tokens

One of the entries printed should be something like the following.

Token 5:
    URL: pkcs11:model=PKCS%2315%20emulated;manufacturer=www.CardContact.de;serial=DENK0100424;token=SmartCard-HSM20%28UserPIN%29
    Type: Hardware token
    Manufacturer: www.CardContact.de
    Model: PKCS#15 emulated
    Serial: DENK0100424
    Module: /usr/lib64/pkcs11/pkcs11/opensc-pkcs11.so

The above information contains the identifying PKCS#11 URI of the token as well as information about the manufacturer and the driver library used. The PKCS#11 URI is a standardized unique identifier of tokens and objects stored within a token. If you do not see that information, verify that you have all of pcsc-lite, pcsc-lite-ccid, opensc, gnutls and p11-kit installed. If they are installed and the token is still not listed, you will need to register the opensc module manually to make it known to p11-kit (modern distributions take care of this step). This can be done with the following commands as administrator.

    # mkdir -p /etc/pkcs11/modules
    # echo "module: /usr/lib64/pkcs11/opensc-pkcs11.so" >/etc/pkcs11/modules/opensc.conf

It is implied that your system's libdir for PKCS#11 drivers should be used instead of the "/usr/lib64/pkcs11" path used above. Alternatively, one could pass the --provider parameter to the p11tool command to explicitly specify the driver, as in the following example. For the rest of this text we assume a properly configured p11-kit and omit the --provider parameter.

    $ p11tool --provider /usr/lib64/pkcs11/opensc-pkcs11.so --list-tokens

Token initialization

An HSM token needs to be initialized prior to usage, and be provided with two PINs. One PIN is for operations requiring administrative (security officer in PKCS#11 jargon) access, and the second (the user PIN) is for normal token usage. To initialize, use the following command with the PKCS#11 URL listed by the 'p11tool --list-tokens' command; in the following text we will use $URL to refer to that.

    $ p11tool --initialize "$URL"

Alternatively, when the driver supplied supports a single card, the URL can be specified as "pkcs11:" as shown below.

    $ p11tool --provider /usr/lib64/pkcs11/opensc-pkcs11.so --initialize "pkcs11:"

The initialization commands above will ask to set up the security officer's PIN, which for the Nitrokey HSM is by default "3537363231383830". During the initialization process, the user PIN will also be requested. The user PIN is the PIN which must be provided by applications and users in order to use the card. Note that the command above (prior to GnuTLS 3.5.6) will ask for the administrator's PIN twice, once for initialization and once for setting the user PIN.

Key and certificate generation

It is possible to either copy an existing key onto the card, or generate a key in it which cannot be extracted. To generate an elliptic curve (ECDSA) key use the following command.

    $ p11tool --label "my-key" --login --generate-ecc "pkcs11:token=SmartCard-HSM20%28UserPIN%29"

The above command will generate an ECDSA key which will be identified by the name set by the label. That key can then be fully identified by the PKCS#11 URL "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-key;type=private". If the command was successful, the command below will list two objects, the private key and the public key.

    $ p11tool --login --list-all "pkcs11:token=SmartCard-HSM20%28UserPIN%29"

Note that both objects share the same ID but have different type. As this key cannot be extracted from the token, we need to utilize the following commands to generate a Certificate Signing Request (CSR).

    $ certtool --generate-request --load-privkey "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-key;type=private" --outfile cert.csr

After providing the required information to certtool, it will generate a certificate request in the cert.csr file. Alternatively, to generate a self-signed certificate, one can replace the '--generate-request' parameter with '--generate-self-signed'.

The certificate signing request generated above allows one to get a real certificate to use for the key stored in the token. That can be generated either with letsencrypt or a local PKI. As the details vary, I'm skipping this step, and I'm assuming a certificate is generated somehow.

After the certificate is made available, one can write it in the token. That step is not strictly required, but in several scenarios it simplifies key/cert management by storing them in the same token. One can store the certificate using the following command.

    $ p11tool --login --write --load-certificate cert.pem --label my-cert --id "PUBKEY-ID" "pkcs11:token=SmartCard-HSM20%28UserPIN%29"

Note that specifying the PUBKEY-ID is not required, but it is generally recommended for certificate objects to match the ID of the public key object listed previously with the --list-all command. If the IDs do not match some (non-GnuTLS) applications may fail to utilize the key. The certificate stored in the token will have the PKCS#11 URL "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-cert;type=cert".

Testing the generated keys

Now that both the key and the certificate are present in the token, one can utilize their PKCS#11 URL in any GnuTLS application in place of filenames. That is, if the application is asking for a certificate file, enter "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-cert;type=cert", and for the private key "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-key;type=private".

The following example will run a test HTTPS server using the keys above.

    $ gnutls-serv --port 4443 --http --x509certfile "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-cert;type=cert" --x509keyfile "pkcs11:token=SmartCard-HSM20%28UserPIN%29;object=my-key;type=private;pin-value=1234"

That will set up a server which answers on port 4443 and utilizes the certificate and key on the token to perform TLS authentication. Note that the command above demonstrates the use of the "pin-value" URI element. That element specifies the object PIN on the command line, allowing for non-interactive token access.

Applicability and performance

While the performance of this HSM will most likely not allow you to utilize it in busy servers, it may be a sufficient solution for a private server, a VPN, a testing environment or a demo. On the client side, it can certainly provide a sufficient solution to protect the private keys assigned to clients. The advantage a smart card has over OTP is that it is simpler to provision remotely, with the certificate request method shown above. That can be automated, at least in theory, when an implementation of the SCEP protocol is available. In practice, SCEP is well established in the proprietary world, but it is hard to find free software applications taking advantage of it.

Converting your application to use PKCS#11

A typical application written to use GnuTLS as its TLS back-end library should be able to use smart cards and HSM tokens out of the box. The only requirement is for the application to use the high-level file loading functions, which can load either files or PKCS#11 URIs. The only new requirement is for the application to obtain the PIN required for accessing the token; that can be done interactively using the PIN callbacks, or via the PKCS#11 URI "pin-value" element. For source examples, I'll refer you to the GnuTLS documentation.

Some indicative applications which, as far as I'm aware, can use tokens via PKCS#11 URIs transparently, and can be used for testing, are mod_gnutls, lighttpd2, and openconnect.

Tuesday, October 25, 2016

A brief look at the Linux-kernel random generator interfaces

Most modern operating systems provide a cryptographic pseudo-random number generator (CPRNG), as part of their OS kernel, intended to be used by applications involving cryptographic operations. Linux is no exception in that, and in fact it was the first operating system that actually introduced a CPRNG into the kernel. However, there is much mystery around these interfaces. The manual page is quite unclear on its suggestions, while there is a web-site dedicated to debunking myths about these interfaces, which on a first read contradicts the manual page.

In this post, triggered by my recent attempt to understand the situation and update the Linux manual page, I'll give a brief overview of these interfaces. Note that this post will not get into the inner workings of a cryptographic pseudo-random generator (CPRNG); for that, consider reading this article. I will go through these interfaces, intentionally staying at a high level, without considering internal details, and discuss their usefulness for an application or library that requires access to such a CPRNG.

/dev/random: a file which, if read from, will output data from the kernel CPRNG. Reading from this file blocks once the kernel (using a somewhat arbitrary metric) believes that not enough random events have been accumulated since the last use (I know that this is not entirely accurate, but the description is sufficient for this post).
/dev/urandom: a file which, if read from, will provide data from the kernel CPRNG. Reading from /dev/urandom will never block.
getrandom(): A system call which provides random data from the kernel CPRNG. It will block only when the CPRNG is not yet initialized.

A software engineer who would like to seed a PRNG or generate random encryption keys, and who reads the manual page random(4) carefully, will most likely be tempted to use /dev/random, as it is described as "suitable for uses that need very high quality randomness such as ... key generation". In practice /dev/random cannot be relied on, because it requires large amounts of random events to be accumulated in order to provide a few bytes of random data to running processes. Using it for key generation (e.g., for ssh keys during first boot) is most likely going to turn the first boot process into a coin flip: heads and the system is up, tails and the system is left hanging, waiting for random events. This (old) issue with a mail service process hanging for more than 20 minutes prior to doing any action illustrates the impact of this device on real-world applications which need to generate fresh keys on startup.

On the other hand, the device /dev/urandom provides access to the same random generator, but will never block, nor does it place any requirement on the amount of new random events that must be gathered before it provides output. That is quite natural given that modern random generators, once initially seeded, can provide enormous amounts of output prior to being considered broken (in an information-theoretic sense). So should we use only /dev/urandom today?

There is a catch. Unfortunately /dev/urandom has a quite serious flaw. If used early in the boot process, when the random number generator of the kernel is not fully initialized, it will still output data. How random the output data is depends on the system, and on modern platforms, which provide specialized CPU instructions to obtain random data, that is less of an issue. However, the situation where ssh keys are generated prior to the kernel pool being initialized can be observed in virtual machines which have not been given access to the host's random generator.

Another, though not as significant, issue is the fact that both of these interfaces require a file descriptor to operate. That, at first glance, may not seem like a flaw. In that case consider the following scenarios:

The application calls chroot() prior to initializing the crypto library; the chroot environment doesn't contain any of /dev/*random.
To avoid the issue above, the crypto library opens /dev/urandom in a library constructor and stores the descriptor for later use. The application closes all open file descriptors on startup.

Both are real-world scenarios observed over the years of developing the GnuTLS library. The latter scenario is of particular concern since, if the application opens a few files, the crypto library may never realize that the /dev/urandom file descriptor has been closed and replaced by another file. That may result in reading from an arbitrary file to obtain randomness. Even though one can introduce checks to detect such a case, that is a particularly hard issue to spot, and requires inefficient and complex code to address.
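To give an idea of what such a check involves, here is a minimal sketch in C. It is an illustration under my own assumptions and not the code used in GnuTLS: the structure and function names are hypothetical, and the approach is simply to remember the device number and inode at open time and to compare them again before every read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical holder for the descriptor and the identity it had when opened. */
struct urandom_handle {
  int fd;
  dev_t rdev;  /* device number of the character device */
  ino_t ino;   /* inode of the device node */
};

/* Open /dev/urandom and record what the descriptor points to.
 * Returns 0 on success, -1 on failure. */
static int urandom_open(struct urandom_handle *h)
{
  struct stat st;

  assert(h != NULL);
  h->fd = open("/dev/urandom", O_RDONLY);
  if (h->fd < 0)
    return -1;
  if (fstat(h->fd, &st) < 0 || !S_ISCHR(st.st_mode)) {
    close(h->fd);
    h->fd = -1;
    return -1;
  }
  h->rdev = st.st_rdev;
  h->ino = st.st_ino;
  return 0;
}

/* Return 1 if the descriptor still refers to the device that was opened,
 * 0 if it was closed or now refers to something else. */
static int urandom_still_valid(const struct urandom_handle *h)
{
  struct stat st;

  assert(h != NULL);
  if (h->fd < 0 || fstat(h->fd, &st) < 0)
    return 0;
  return S_ISCHR(st.st_mode) && st.st_rdev == h->rdev && st.st_ino == h->ino;
}
```

The cost is an extra fstat() call before each read, plus the logic to reopen the device when the check fails, which is the kind of overhead and complexity mentioned above.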

That's where the system call getrandom() fits. Its operation is very similar to /dev/urandom; that is, it provides non-blocking access to the kernel CPRNG. In addition, it requires no file descriptor, and it will block until the kernel random generator has been initialized. Given that it addresses the issues of /dev/urandom identified above, it indeed seems like the interface that should be used by modern libraries and applications. In fact, if you use new versions of libgcrypt and GnuTLS today, they take advantage of this API (though that change wasn't exactly a walk in the park).

On the other hand, getrandom() is still a low-level interface, and may not be suitable for direct use by applications expecting a safe high-level interface. If one reads its manual page carefully, one will notice that the call may return less data than requested (if interrupted by a signal), and at the time of writing this system call is not even wrapped by glibc. That means it can be used only via the syscall() interface. An illustration of (safe) usage of this system call is given below.

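#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdint.h>
#include <errno.h>
/* no glibc wrapper is assumed, so the system call is invoked directly */
#define getrandom(dst,s,flags) syscall(SYS_getrandom, (void*)(dst), (size_t)(s), (unsigned int)(flags))

/* Fills buf with buflen random bytes; returns buflen on success, -1 on error. */
static int safe_getrandom(void *buf, size_t buflen, unsigned int flags)
{
  ssize_t left = buflen;
  ssize_t ret;
  uint8_t *p = buf;
  while (left > 0) {
   ret = getrandom(p, left, flags);
   if (ret == -1) {
    /* retry if interrupted by a signal, fail on any other error */
    if (errno != EINTR)
     return ret;
   }
   if (ret > 0) {
    /* fewer bytes than requested may be returned; ask for the rest */
    left -= ret;
    p += ret;
   }
  }
  return buflen;
}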

The previous example code assumes that the Linux kernel supports this system call. For portable code which may run on kernels without it, a fallback to /dev/urandom should also be included.
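A minimal sketch of such a fallback follows. The function names are hypothetical, and I am assuming that the only condition worth falling back on is ENOSYS (the kernel does not implement the call) or a missing SYS_getrandom definition at build time. Keep in mind that the fallback path inherits the /dev/urandom limitations discussed earlier: it needs a file descriptor and does not wait for the kernel generator to be initialized.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly buflen bytes from /dev/urandom; 0 on success, -1 on error. */
static int urandom_fill(void *buf, size_t buflen)
{
  uint8_t *p = buf;
  int fd = open("/dev/urandom", O_RDONLY);

  if (fd < 0)
    return -1;
  while (buflen > 0) {
    ssize_t ret = read(fd, p, buflen);
    if (ret < 0 && errno == EINTR)
      continue;              /* interrupted by a signal: retry */
    if (ret <= 0) {
      close(fd);
      return -1;
    }
    p += ret;
    buflen -= (size_t)ret;
  }
  close(fd);
  return 0;
}

/* Fill buf with random bytes, preferring getrandom() when the kernel has it.
 * Returns 0 on success, -1 on error. */
static int random_fill(void *buf, size_t buflen)
{
  assert(buf != NULL || buflen == 0);
#ifdef SYS_getrandom
  uint8_t *p = buf;
  size_t left = buflen;

  while (left > 0) {
    long ret = syscall(SYS_getrandom, p, left, 0U);
    if (ret < 0) {
      if (errno == EINTR)
        continue;            /* interrupted by a signal: retry */
      if (errno == ENOSYS)
        break;               /* system call not implemented: use the device */
      return -1;
    }
    p += ret;
    left -= (size_t)ret;
  }
  if (left == 0)
    return 0;
#endif
  return urandom_fill(buf, buflen);
}
```

A caller would simply use random_fill(key, sizeof(key)) and treat any non-zero return value as a fatal error rather than continuing with an unfilled buffer.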

From the above, it is apparent that using the interfaces provided by the Linux kernel to access the kernel CPRNG is not easy. The old (/dev/*random) interfaces are difficult to use correctly, and while the getrandom() call eliminates several of their issues, it is not straightforward to use, and is not available in Linux kernels prior to 3.17. Hence, if applications require access to a CPRNG, my recommendation would be to avoid using the kernel interfaces directly, and to use the APIs provided by their crypto library of choice. That way the complexity of system discovery and any other peculiarities of these interfaces will be hidden. Some hints and tips are shown in the Fedora defensive coding guide (which may be a bit out-of-date but is still a good source of information).

Thursday, June 2, 2016

Restricting the scope of CA certificates

The granting of an intermediate CA certificate to a surveillance firm generated quite some fuss. Setting theories aside, the main reason behind that outcry, is the fact that any intermediate CA certificate trusted by the browsers has unlimited powers to certify any web site on the Internet. Servers can protect themselves against an arbitrary CA generating a valid certificate for their web site, using certificate pinning, but there is very little end-users can do. In practice, end-users either trust the whole bundled CA list in their browser/system or not.

An option for end-users is to utilize trust on first use, but that is not a widespread practice, and little software besides SSH supports it. A way for me as a user to defend against a CA I believe to be rogue is to disable or remove that CA from my trusted bundle. But what if I trust that CA for a particular web site or domain, but not for the whole Internet?

In this post I'll try to provide more information on some lesser documented aspects of p11-kit, which provide additional control over the CA certificate bundle in a system. That is, I'll explain how we can do better than disabling CAs, and how we can restrict CAs to particular domains. The following instructions are limited to Fedora 22+, which has deployed a shared trust database for certificates based on p11-kit. This database is not only an archive of trusted certificates, but also provides the option to attach additional attributes to CA certificates in the form of PKIX extensions. These extensions are called stapled extensions in p11-kit jargon, and they override any extensions available in the trusted certificates. That makes it possible to enforce additional restrictions on the purpose and scope of a certificate.

I'll attempt to demonstrate this feature using an example. Let's consider the case where your employer's IT department provided you with a CA certificate to trust for communications within the company. Let's also assume that the company's internal domain is called "example.com". In that scenario, as a user I'd like to restrict the provided CA certificate to the example.com domain, to prevent anyone with access to the corporate private key from being able to hijack any connection outside the company scope. This is not only out of paranoia against a potential corporate big-brother, but also to keep a good security practice and avoid having master keys. A stolen corporate CA key which is trusted for everything under the sun provides a potential attacker not only with access to the company's internal communication, but also with access to the Internet communication of any corporate user.

How would we install such a certificate in a way that it is restricted only to example.com? Assuming that the CA certificate is provided in the example.com-root.pem file, the following command will add the company's certificate to the trusted list.

$ sudo trust anchor example.com-root.pem

That will create a file in /etc/pki/ca-trust/source containing the CA certificate (for more information on adding and removing CA certificates in Fedora see the update-ca-trust manpage).

If we edit this file we will see something like the following.

[p11-kit-object-v1]
trusted: true
x-distrusted: false
private: false
certificate-category: authority
-----BEGIN CERTIFICATE-----
MIIDsDCCAxmgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBnTELMAkGA1UEBhMCVVMx
...
-----END CERTIFICATE-----

This contains the certificate of the CA as well as various basic flags set on it.
How can we now attach a stapled extension to it?

We need to add another object in that database containing the extension. But let's see the process step by step. First we need to extract the certificate's public key because that's how p11-kit identifies existing objects. A command to achieve that is the following:

$ certtool --pubkey-info --infile example.com-root.pem --outfile example.com-pubkey.pem

The output file will contain a public key in PEM format (identifiable by the "-----BEGIN PUBLIC KEY-----" header). We now edit the p11-kit file in /etc/pki/ca-trust/source containing our certificate and append the following.

[p11-kit-object-v1]
class: x-certificate-extension
label: "Example.com CA restriction"
object-id: 2.5.29.30
value: "%30%1a%06%03%55%1d%1e%04%13%30%11%a0%0f%30%0d%82%0b%65%78%61%6d%70%6c%65%2e%63%6f%6d"
-----BEGIN PUBLIC KEY-----
...
-----END PUBLIC KEY-----

Where the public key part is copied from the example.com-pubkey.pem file.

This added object is a stapled extension containing a PKIX name constraints extension which allows this CA to be used only for certificates under the "example.com" domain. If you attempt to connect to a host outside that domain which presents a certificate issued by this CA, you will get the following error:

$ gnutls-cli www.no-example.com
...
Status: The certificate is NOT trusted. The certificate chain violates the signer's constraints.
*** PKI verification of server certificate failed...

Note that, although NSS and openssl applications check some extensions (such as key purpose) from this trust database, they do not consider the name constraints extension. This may change in the future, but currently only GnuTLS applications under Fedora will honor this extension. The reason it works under the Fedora distribution is that GnuTLS is compiled using the --with-default-trust-store-pkcs11="pkcs11:" configuration option, which makes it use the p11-kit trust DB directly.

A question at this point, after seeing the p11-kit object format, is how we can generate the "value" listed above containing the desired constraints. The value contains a DER encoded certificate extension, written with each byte percent-encoded, whose type is given by the "object-id" field. In this case the object-id field contains the object identifier for the NameConstraints extension (2.5.29.30).

Unfortunately there are no available tools to generate this value that I'm aware of, so I created a sample application which generates a valid name constraints value to be set above. The tool can be found at this github repository.

After you compile, run:

$ ./nconstraints mydomain.com myotherdomain.com
%30%30%06%03%55%1d%1e%04%29%30%27%a0%25%30%0e%82%0c%6d%79%64%6f%6d%61%69%6e%2e%63%6f%6d%30%13%82%11%6d%79%6f%74%68%65%72%64%6f%6d%61%69%6e%2e%63%6f%6d

and as you see, this command will provide the required string.
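For readers curious about what is inside that string, the following is a minimal sketch in C that reproduces the byte layout of the two values shown in this post. It is not the nconstraints tool, and the function name is hypothetical. I am assuming that only DNS names are placed in the permitted subtrees and that every DER length fits in a single byte (the whole extension is at most 127 bytes), so longer lists of domains are rejected rather than encoded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode a name constraints extension permitting the given DNS domains and
 * write it percent-encoded into out. Layout, as read from the values above:
 *   SEQUENCE { OID 2.5.29.30, OCTET STRING {
 *     SEQUENCE { [0] { SEQUENCE { [2] dNSName } ... } } } }
 * Returns 0 on success, -1 on invalid input or insufficient space. */
static int name_constraints_value(const char *const *domains, size_t n,
                                  char *out, size_t outlen)
{
  static const unsigned char oid[] = { 0x06, 0x03, 0x55, 0x1d, 0x1e };
  unsigned char der[2 + 127];
  size_t subtrees = 0, pos = 0, i;

  assert(domains != NULL && out != NULL);
  if (n == 0)
    return -1;
  for (i = 0; i < n; i++) {
    size_t len = strlen(domains[i]);
    if (len == 0 || len > 112)
      return -1;
    subtrees += 4 + len;          /* two 2-byte headers plus the name */
    if (subtrees > 116)           /* keep the outer length below 128 */
      return -1;
  }

  der[pos++] = 0x30;                               /* extension SEQUENCE */
  der[pos++] = (unsigned char)(subtrees + 11);
  memcpy(der + pos, oid, sizeof(oid));             /* OID 2.5.29.30 */
  pos += sizeof(oid);
  der[pos++] = 0x04;                               /* OCTET STRING */
  der[pos++] = (unsigned char)(subtrees + 4);
  der[pos++] = 0x30;                               /* NameConstraints */
  der[pos++] = (unsigned char)(subtrees + 2);
  der[pos++] = 0xa0;                               /* [0] permittedSubtrees */
  der[pos++] = (unsigned char)subtrees;
  for (i = 0; i < n; i++) {
    size_t len = strlen(domains[i]);
    der[pos++] = 0x30;                             /* GeneralSubtree */
    der[pos++] = (unsigned char)(len + 2);
    der[pos++] = 0x82;                             /* [2] dNSName */
    der[pos++] = (unsigned char)len;
    memcpy(der + pos, domains[i], len);
    pos += len;
  }

  if (outlen < pos * 3 + 1)
    return -1;
  for (i = 0; i < pos; i++)
    sprintf(out + 3 * i, "%%%02x", (unsigned int)der[i]);
  return 0;
}
```

Passing the single domain "example.com" gives the value used in the stapled extension earlier, and passing "mydomain.com" and "myotherdomain.com" gives the string printed by the command above.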

Happy hacking!