Developer’s Corner: QPACK Mnemonic Technique
To index or not to index?
– Prince Hamlet (paraphrased)
QPACK is the mechanism used to compress headers (or, to use the new nomenclature, “fields”) in HTTP/3. QPACK, while based on HPACK, was designed specifically for HTTP/3: it takes advantage of the properties of the transport protocol (QUIC).
We wrote our own library, ls-qpack, that implements QPACK. The library’s origins go back to 2017, as we participated in the development of the QPACK specification itself. That is when we developed the QPACK Mnemonic Technique.
The goal of QPACK is to reduce the space taken up by HTTP headers. Smaller size translates to higher throughput and better user experience. The effectiveness of data compression is expressed as the compression ratio, defined in Data Compression: The Complete Reference, 4th ed. by David Salomon as the size of the output divided by the size of the input.
Thus, for example, if the string “LiteSpeed” is compressed to “ls”, the compression ratio is 2/9, or about 0.22. The smaller the number, the better the compression performance.
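For instance, the ratio from the example above can be computed directly:

```python
def compression_ratio(output_size: int, input_size: int) -> float:
    """Compression ratio as defined by Salomon: output size / input size."""
    return output_size / input_size

# "LiteSpeed" (9 bytes) compressed to "ls" (2 bytes):
ratio = compression_ratio(len("ls"), len("LiteSpeed"))
print(round(ratio, 2))  # → 0.22
```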
QPACK compresses headers in two ways:
- Individual header names and values are compressed using Huffman coding. HTTP headers in normal web applications get compressed with a compression ratio of about 0.7.
- Headers are added to a dictionary. The dictionary — called the dynamic table — can then be used to represent whole headers, achieving a much higher compression ratio.
Most of the QPACK specification deals with maintaining the dynamic table, because the dynamic table offers the best opportunity to improve compression performance. Optimizing the use of the dynamic table is the topic of this article.
Note on Terminology
The HTTP Working Group has renamed headers to “fields” in its latest set of Internet Drafts. This is in keeping with the long-term goal to allow not just headers and trailers in HTTP, but also an arbitrary number of interleaved sets of name/value pairs which, of course, can no longer be called “headers.”
In this document, however, we will use the term “headers,” as it has been used for the last three decades or so to mean exactly what this article discusses and thus causes no confusion.
In the QPACK specification and in some other QPACK-related articles, the verb “to index” is used to mean two things:
- to refer to an entry in the dynamic table (from a header block) and
- (less commonly) to add an entry to the dynamic table.
In this article, the second sense is used.
The dynamic table used in QPACK has a limited size. Once the table is full, adding an item to it results in the removal (called eviction in QPACK) of one or more items. Additions and evictions of dynamic table entries happen in a first-in-first-out — FIFO — manner; given enough new entries, any header will eventually be evicted.
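The FIFO behavior can be sketched as follows. The capacity below is illustrative; the 32-byte per-entry overhead matches the QPACK accounting rule, under which an entry costs its name length plus its value length plus 32 bytes:

```python
from collections import deque

ENTRY_OVERHEAD = 32  # QPACK charges name + value + 32 bytes per entry

class DynamicTable:
    """FIFO table: adding entries to a full table evicts the oldest ones."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()  # oldest entry on the left
        self.size = 0

    @staticmethod
    def entry_size(name: str, value: str) -> int:
        return len(name) + len(value) + ENTRY_OVERHEAD

    def insert(self, name: str, value: str) -> None:
        needed = self.entry_size(name, value)
        # Evict oldest entries (FIFO) until the new entry fits.
        while self.size + needed > self.capacity and self.entries:
            old_name, old_value = self.entries.popleft()
            self.size -= self.entry_size(old_name, old_value)
        if needed <= self.capacity:
            self.entries.append((name, value))
            self.size += needed

table = DynamicTable(capacity=128)
table.insert("server", "LiteSpeed")      # 6 + 9 + 32 = 47 bytes
table.insert("x-long-header", "a" * 60)  # 105 bytes: evicts "server" first
```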
The optimization question, then, is
What should be inserted into the dynamic table?
Inserting a header (also called indexing it) that is not likely to be seen again wastes space in the dynamic table. For instance, a header whose value changes with every message:

date: Sun, 11 Apr 2021 17:32:48 GMT

Example of header that should not go into the dynamic table
How do we know whether a particular header will occur again? Certainly, there are some headers that are present in each HTTP message — request or response — and they always have the same value.
user-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0
Example of header that should definitely be indexed
Some implementations use the simple “index everything” approach. Upon examination, it often proves to perform better than one would expect (that is, better than terrible). This is due to the repetitive nature of the data (and the fact that an individual set of headers is likely to be much smaller than the dynamic table).
Most QPACK and HPACK implementations, however, employ one or more of the following heuristics:
- Index well-known headers, such as Server and User-Agent.
- Do not index other well-known headers, such as Set-Cookie.
- Do not index headers whose values are small.
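These heuristics can be sketched as a simple predicate. The header lists and the length cutoff below are illustrative, not what any particular implementation uses:

```python
# Illustrative heuristic lists -- real encoders tune these differently.
ALWAYS_INDEX = {"server", "user-agent", "accept-encoding"}
NEVER_INDEX = {"set-cookie", "date", "authorization"}
MIN_VALUE_LEN = 4  # hypothetical cutoff for "small" values

def heuristic_should_index(name: str, value: str) -> bool:
    """Decide whether to add a header to the dynamic table."""
    if name in NEVER_INDEX:
        return False
    if name in ALWAYS_INDEX:
        return True
    # Fallback: indexing tiny values saves little, so skip them.
    return len(value) >= MIN_VALUE_LEN
```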
Can we do better?
Without any a priori information, it is impossible to know whether a particular header will repeat. Because an HTTP session usually consists of more than one request, the encoder can retain some information from previous requests in the session, allowing it to make better-than-random guesses.
The main idea behind the Mnemonic Technique is simple: a header that repeats once is likely to repeat again. Otherwise, we have no idea whether a header is likely to repeat, and we guess that it won’t.
The encoder keeps a history of headers (that is, it remembers them, hence the Mnemonic Technique) to which it refers:
Insert this header into the dynamic table only if it has been seen recently.
QPACK Mnemonic Technique Implementation
The QPACK encoder updates the history each time it encodes a header. The history is a list of recently seen headers. Similar to the dynamic table, it is a FIFO structure: older entries are pushed out as new entries are added.
To save memory, instead of keeping copies of the name and value strings, their hashes are kept instead. The hashes are 32-bit values and so collisions — two headers whose hashes are the same — should be quite rare. When a collision does occur, it only results in a misprediction and a worsened compression performance.
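A minimal sketch of such a history, using CRC-32 as a stand-in 32-bit hash (ls-qpack’s actual hash function and history sizing differ):

```python
from collections import deque
import zlib

class HeaderHistory:
    """FIFO list of 32-bit hashes of recently seen headers."""

    def __init__(self, maxlen: int):
        self.hashes = deque(maxlen=maxlen)  # oldest hashes fall off automatically

    @staticmethod
    def hash32(name: str, value: str) -> int:
        # CRC-32 stands in for the real hash; a collision only costs
        # a misprediction and slightly worse compression, never correctness.
        return zlib.crc32(f"{name}:{value}".encode())

    def seen_recently(self, name: str, value: str) -> bool:
        return self.hash32(name, value) in self.hashes

    def record(self, name: str, value: str) -> None:
        self.hashes.append(self.hash32(name, value))

history = HeaderHistory(maxlen=4)

def should_index(name: str, value: str) -> bool:
    """Mnemonic rule: index only headers that have been seen recently."""
    repeat = history.seen_recently(name, value)
    history.record(name, value)
    return repeat

should_index("user-agent", "curl/7.76")  # first sighting: False
should_index("user-agent", "curl/7.76")  # repeat: True
```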
The history size matters as well:
- If it is too small, we may not index a header that actually repeats, because by the time it repeats, we’ve lost that history.
- If it is too large, we may index something that repeats with a period larger than the size of the dynamic table. In other words, the next time we see the header, its entry in the dynamic table has already been evicted.
To keep the history size within a reasonable range, the encoder tracks the average number of headers per header set and the average number of entries in the dynamic table, and sizes the history accordingly.
Encoding a header may require emitting two literals when the encoder cannot risk a blocked stream. A literal is a non-indexed string, plain or Huffman-encoded. One literal is emitted as the insertion into the dynamic table on the encoder stream and the other as a literal representation in the header block. In other words, processing a header may result in inflation, not deflation. This is perfectly fine — but not when it causes the cumulative compression ratio to climb too high.
This problem is most likely to occur at the beginning of the connection, when the dynamic table is just beginning to be populated. For example, consider two header sets. In the first header set, we see header A. It is emitted as a literal in the header block and hash(A) is added to the history. In the second header set, we see the same name/value pair A again. Because the history contains hash(A), header A is now added to the dynamic table and written to the encoder stream. If the stream cannot be risked, header A must be emitted as a literal again in the header block. That makes 3 literals emitted for 2 headers. Huffman compression gives a compression ratio of about 0.7, and 3 × 0.7 / 2 = 1.05 > 1.0 — inflation!
To cope with that, the encoder keeps track of the cumulative compression ratio and checks it before emitting two literals. If the operation would go over the threshold (the encoder actually uses a value slightly smaller than 1.0), the header is not indexed and only the header block literal is emitted.
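That bookkeeping can be sketched as follows; the 0.95 threshold is a placeholder for the slightly-below-1.0 value the encoder actually uses:

```python
class RatioGuard:
    """Tracks cumulative output/input sizes and gates double-literal emits."""

    THRESHOLD = 0.95  # hypothetical; ls-qpack uses a value slightly below 1.0

    def __init__(self):
        self.bytes_in = 0
        self.bytes_out = 0

    def record(self, in_size: int, out_size: int) -> None:
        self.bytes_in += in_size
        self.bytes_out += out_size

    def may_index(self, header_size: int, literal_size: int) -> bool:
        # Indexing here emits the literal twice: once on the encoder
        # stream and once in the header block.
        projected_out = self.bytes_out + 2 * literal_size
        projected_in = self.bytes_in + header_size
        return projected_out / projected_in <= self.THRESHOLD

guard = RatioGuard()
guard.record(in_size=1000, out_size=500)          # session compresses well so far
guard.may_index(header_size=30, literal_size=21)  # 542/1030 ≈ 0.53 → True
```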
The Mnemonic Technique performs well. While it is difficult to compare different QPACK implementations head to head, there is a proxy in the form of the QIF corpus. The QIF format was used to aid development and interoperability of different QPACK stacks. The corpus consists of recordings of three web sessions. These recordings (QIF files) can be processed by command-line tools to run an offline QPACK encoder, producing binary output which can be read by command-line QPACK decoders.
In addition to the QIF files themselves, the GitHub repository contains binary encodings produced by several different QPACK implementations. To approximate the compression ratio, we will divide the size of the binary compressed output by the size of the source QIF file, which is just text.
The repository contains output for many different scenarios. We will consider two:
- Standard table size (4KB, although some browsers use 16KB or 64KB), risking blocked headers is allowed.
- Standard table size, no risking blocked headers.
These scenarios approximate the real world. Browsers have begun to advertise larger tables, 16KB and 64KB, but when the encodings were generated, 4KB was the largest table size used. The ability to risk header blocking is a tradeoff: Risking a blocked header improves compression; a header blocked due to packet loss incurs a significant penalty in the form of at least one round-trip to resend the data.
We will examine how well the different encoders compress fb-resp.qif, which is a session of responses from a Facebook server. It is the largest of the three QIF files in the corpus.
1. Risking blocked headers
2. No blocked header risk
The ls-qpack encoder performs well in both scenarios. Any initial performance penalty incurred by not using the dynamic table immediately is well-amortized over the run of the session.
When risking blocked headers, other QPACK encoders are close to ls-qpack’s compression performance. When you take away that crutch, their performance lags by a significant amount.
Note that you can configure the ls-qpack encoder never to risk streams: the decoder does not mandate risking streams, it merely allows it. In that case, at a small cost in compression performance, the operator of the encoder (in the case of LiteSpeed, our server) can eliminate the blocked-header risk — and improve user experience — unilaterally.
Another important property of ls-qpack is that the use of the Mnemonic Technique allows it to be completely application-agnostic. It will work well whether the HTTP/3 headers are those of a web session or some other, perhaps completely non-web-related, protocol that uses HTTP/3 as its substrate.
The ls-qpack QPACK library delivers excellent compression performance thanks to its innovative encoder design. This design is based on a simple insight: a header that repeats once is likely to repeat again. The Mnemonic Technique is the best available mechanism for improving compression performance in QPACK. It cannot be beat, and I expect all competing stacks to implement a version of it at some point. (You are welcome!)