diff --git a/content/posts/2025-using-tls-ech-from-python.md b/content/posts/2025-using-tls-ech-from-python.md
new file mode 100644
index 0000000..e06aa48
--- /dev/null
+++ b/content/posts/2025-using-tls-ech-from-python.md
@@ -0,0 +1,179 @@
+---
+title: "Using TLS ECH from Python"
+date: 2025-01-10T13:00:00-00:00
+tags:
+  - DEfO
+  - ECH
+  - OpenSSL
+  - Python
+  - TLS
+params:
+  author: 'Iain Learmonth'
+---
+
+At first, the idea of encrypting more of the metadata found inside the initial packet (the "ClientHello") of a TLS
+connection may seem simple and obvious, but there are of course reasons that this wasn't done right from the start.
+In this post I will describe the flow of a connection using Encrypted Client Hello (ECH) to protect the metadata fields,
+and present a working code example using a fork of CPython built with the DEfO project's OpenSSL fork to connect to
+ECH-enabled HTTPS servers.
+
+To understand why this metadata is an issue, let's take a step back and look at how websites are hosted.
+Many websites are hosted on shared servers, which means that a single server machine is responsible for serving
+multiple websites, possibly hundreds or thousands.
+This is known as the shared hosting model.
+In this setup, when a user types in a URL or clicks on a link to visit a website and the browser connects to the server,
+the server needs to know which website the user is requesting.
+This is where the Server Name Indication (SNI) comes in: it's a field in the initial packet of a TLS connection that
+tells the server which website the user is trying to access.
+The server can then send the correct certificate so that the browser can authenticate the connection, and then send the
+requested website content.
+
+Because this field is sent unencrypted, anyone who can see the traffic between the user's browser and
+the server can read the SNI and learn which website the user is trying to visit.
+This can be a privacy concern, as it allows ISPs, network administrators, or other unwanted observers to build a profile
+of the user's browsing history.
+It's not just about the websites they visit, but also about the potential for censorship or targeted attacks.
+With the SNI being unencrypted, it's like sending a postcard with the address visible to anyone who handles it: it may
+not be the end of the world for most browsing activity, but it's certainly not private.
+Encrypted Client Hello aims to change this by encrypting the SNI and other metadata, making it much harder for third
+parties to intercept and exploit this information.
+
+So, why wasn't it easy to protect the SNI and other metadata from the start?
+The main challenge was that, in order to encrypt the SNI, the client (i.e., the user's browser) needs to know in advance
+the public key that the server wants the ClientHello to be encrypted with.
+However, the server's public key was tied to the specific website being requested, and there wasn't a straightforward
+way to discover a public key that could be used to talk to the server without revealing the SNI.
+This created a chicken-and-egg problem, where the client couldn't encrypt the SNI without knowing the server's public
+key, but it couldn't know the server's public key without sending the SNI in plaintext.
+
+ECH solves this problem by making use of a new type of DNS record, called an
+[HTTPS record](https://datatracker.ietf.org/doc/html/rfc9460).
+An HTTPS record is a special type of DNS record that contains the ECH public key of the server, along with other metadata,
+in a way that can be retrieved by the client without revealing the SNI (the website name is still leaked via the DNS
+request, but it is possible to protect your requests using DNS-over-TLS or DNS-over-HTTPS).
+The HTTPS record is typically retrieved by the client during the DNS lookup process, before the TLS connection is
+established.
+
+The HTTPS record contains an ECH configuration, which is used to encrypt the SNI and other metadata.
+This is generated by the server and is tied to the specific configuration of the server, rather than to a specific
+website.
+By using HTTPS records to retrieve the server's ECH public key, we are able to break the chicken-and-egg problem and
+provide a way to encrypt the SNI and other metadata.
+
+Before we can look up the HTTPS record, it's first necessary to work out where that record would live.
+These records have been designed to be quite flexible, so they can accommodate services running on non-default port numbers.
+If the default port number is in use then the HTTPS record will be on the same domain name as the website, but for
+non-default port numbers, there will be a prefix to the domain name:
+
+```python
+import urllib.parse
+from typing import Optional
+
+
+def svcbname(url: str) -> Optional[str]:
+    """Derive DNS name of SVCB/HTTPS record corresponding to target URL."""
+    parsed = urllib.parse.urlparse(url)
+    if parsed.scheme == "https":
+        if (parsed.port or 443) == 443:
+            return parsed.hostname
+        else:
+            return f"_{parsed.port}._https.{parsed.hostname}"
+    elif parsed.scheme == "http":
+        # http URLs use the same HTTPS record, and port 80 is treated like 443
+        if (parsed.port or 80) in (443, 80):
+            return parsed.hostname
+        else:
+            return f"_{parsed.port}._https.{parsed.hostname}"
+    else:
+        # For now, no other scheme is supported
+        return None
+```
+
+To keep it simple, the examples in this post will use plain DNS but the technique is equally applicable to DNS-over-TLS
+and DNS-over-HTTPS. Now that we have the domain name to query, we can fetch the ECH configuration from the DNS using
+the [dnspython](https://www.dnspython.org/) library:
+
+```python
+import logging
+import sys
+from typing import List
+
+import dns.resolver
+
+
+def get_ech_configs(domain: str) -> List[bytes]:
+    try:
+        answers = dns.resolver.resolve(domain, "HTTPS")
+    except dns.resolver.NoAnswer:
+        logging.warning(f"No HTTPS record found for {domain}")
+        return []
+    except Exception as e:
+        logging.critical(f"DNS query failed: {e}")
+        sys.exit(1)
+    configs: List[bytes] = []
+    for rdata in answers:
+        if hasattr(rdata, "params"):
+            params = rdata.params
+            # SvcParamKey 5 is "ech"; its value holds the ECH configuration
+            echconfig = params.get(5)
+            if echconfig:
+                configs.append(echconfig.ech)
+    if len(configs) == 0:
+        logging.warning(f"No echconfig found in HTTPS record for {domain}")
+    return configs
+```
+
+Once the ECH configurations are known, these can be used to establish the connection and fetch the website:
+
+```python
+import logging
+import socket
+import ssl
+import urllib.parse
+from typing import List
+
+import certifi
+
+
+def get_http(url: str, ech_configs: List[bytes]) -> bytes:
+    parsed = urllib.parse.urlparse(url)
+    # Fall back to the default HTTPS port and the root path when the URL omits them
+    hostname, port, path = parsed.hostname, parsed.port or 443, parsed.path or "/"
+    logging.debug(f"Performing GET request for https://{hostname}:{port}{path}")
+    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
+    context.load_verify_locations(certifi.where())
+    for config in ech_configs:
+        try:
+            # set_ech_config is only available in the ECH-enabled CPython fork
+            context.set_ech_config(config)
+        except ssl.SSLError as e:
+            logging.error(f"SSL error: {e}")
+    with socket.create_connection((hostname, port)) as sock:
+        with context.wrap_socket(sock, server_hostname=hostname, do_handshake_on_connect=False) as ssock:
+            try:
+                ssock.do_handshake()
+                logging.debug("Handshake completed with ECH status: %s", ssock.get_ech_status().name)
+                logging.debug("Inner SNI: %s, Outer SNI: %s", ssock.server_hostname, ssock.outer_server_hostname)
+                request = f'GET {path} HTTP/1.1\r\nHost: {hostname}\r\nConnection: close\r\n\r\n'
+                ssock.sendall(request.encode('utf-8'))
+                response = b''
+                while True:
+                    data = ssock.recv(4096)
+                    if not data:
+                        break
+                    response += data
+                return response
+            except ssl.SSLError as e:
+                logging.error(f"SSL error: {e}")
+                raise
+```
+
+The important step here is the new
+[`set_ech_config`](https://irl.github.io/cpython/library/ssl.html#ssl.SSLContext.set_ech_config) method on the
+`SSLContext` that allows you to add the ECH configuration containing the public key.
+If there are multiple records, the underlying OpenSSL will determine which of the keys to use.
+There are also a few new methods and attributes, such as `get_ech_status()` and `outer_server_hostname`, that allow you
+to get the status information relating to ECH from the `SSLSocket` after the completion of the handshake.
+
+In the simple case, that's all there is to it.
+If you were to watch the connection with Wireshark you would not be able to see the true SNI being sent to the server
+and would only see the decoy SNI present in the unencrypted "ClientHelloOuter".
+This decoy SNI is added to appease [middleboxes](https://en.wikipedia.org/wiki/Middlebox) that may block traffic,
+accidentally or deliberately, if that field is missing entirely.
+There are also further protections against such middleboxes from the application of GREASE:
+
+> If the client attempts to connect to a server and does not have an ECHConfig structure available for the server, it
+> SHOULD send a GREASE "encrypted_client_hello" extension in the first ClientHello [...]
+
+This means that if your client supports ECH but does not have the configuration available to use it, the client should
+still send an ECH extension filled with nonsense anyway.
+This helps to surface deployment issues early: errors become obvious to users immediately, rather than only appearing
+once servers have deployed ECH.
+
+Finally, if the server sees this GREASE ECH extension then it can infer that you support ECH but didn't
+have a configuration available.
+In its reply, it can send a "retry config" and then terminate the connection.
+You then have the configuration available to start the connection again with a real ECH extension this time, and can
+cache that for future requests too.
+
+For a full client example including the use of retry configs, you can see our
+[example Python client](https://github.com/defo-project/docker-defo-client/blob/main/pyclient.py) on GitHub.
+You'll need to use this with our [CPython fork](https://github.com/irl/cpython) and
+[OpenSSL fork](https://github.com/defo-project/openssl).
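
The post treats the ECH configuration returned by `get_ech_configs` as an opaque byte string, so it may help to see roughly what is inside it. The sketch below assumes the `ECHConfigList` wire layout from the ECH specification (version `0xfe0d`) and pulls out each entry's `public_name`, which is the name used as the decoy SNI in the "ClientHelloOuter". The names `ech_public_names` and `build_sample_config_list` are hypothetical helpers written for this illustration; they are not part of the CPython fork or dnspython, and the sample data is hand-built with a dummy all-zero key rather than taken from real DNS.

```python
import struct
from typing import List, Tuple

ECH_VERSION = 0xFE0D  # ECHConfig version assumed by this sketch


def ech_public_names(ech_config_list: bytes) -> List[Tuple[int, str]]:
    """Return (config_id, public_name) for each 0xfe0d entry in an ECHConfigList."""
    (total_len,) = struct.unpack_from("!H", ech_config_list, 0)
    if total_len != len(ech_config_list) - 2:
        raise ValueError("ECHConfigList length prefix does not match the data")
    results = []
    offset = 2
    while offset < len(ech_config_list):
        # Each ECHConfig starts with a 2-byte version and a 2-byte length
        version, length = struct.unpack_from("!HH", ech_config_list, offset)
        offset += 4
        contents = ech_config_list[offset:offset + length]
        offset += length
        if version != ECH_VERSION:
            continue  # skip entries in versions this sketch does not understand
        config_id = contents[0]
        pos = 3  # skip config_id (1 byte) and kem_id (2 bytes)
        (pk_len,) = struct.unpack_from("!H", contents, pos)
        pos += 2 + pk_len  # skip the HPKE public key
        (cs_len,) = struct.unpack_from("!H", contents, pos)
        pos += 2 + cs_len  # skip the cipher suite list
        pos += 1  # skip maximum_name_length
        name_len = contents[pos]
        pos += 1
        public_name = contents[pos:pos + name_len].decode("ascii")
        results.append((config_id, public_name))
    return results


def build_sample_config_list(public_name: str, config_id: int = 1) -> bytes:
    """Hand-build a one-entry ECHConfigList with a dummy key (illustration only)."""
    name = public_name.encode("ascii")
    contents = (
        struct.pack("!BH", config_id, 0x0020)  # config_id, kem_id
        + struct.pack("!H", 32) + bytes(32)  # 32-byte placeholder public key
        + struct.pack("!H", 4) + struct.pack("!HH", 1, 1)  # one (kdf_id, aead_id) pair
        + struct.pack("!B", 0)  # maximum_name_length
        + struct.pack("!B", len(name)) + name  # public_name
        + struct.pack("!H", 0)  # empty extensions list
    )
    config = struct.pack("!HH", ECH_VERSION, len(contents)) + contents
    return struct.pack("!H", len(config)) + config
```

For example, `ech_public_names(build_sample_config_list("cover.example"))` returns `[(1, 'cover.example')]`. The same function could be applied to each value returned by `get_ech_configs` to see which outer name a server advertises, assuming the record uses the same version.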