Does AES-256-GCM impose any restricted byte sequences on its ciphertext?

Question

Specifically, I am asking whether the ciphertext can include a byte sequence such as 170303 (hex), which is how one possible TLS record header begins.

Normally, the application that parses the TCP byte stream delimits each TLS record by parsing the header and extracting the length in octets from the 4th and 5th bytes of the header. I assume it then skips ahead and tries to read the next record at the offset calculated from the previously parsed header.
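The framing described above can be sketched in Python. This is a minimal illustration, assuming the RFC 8446 record layout (1-byte content type, 2-byte legacy version, 2-byte big-endian length). The function and variable names are hypothetical, and a real TLS stack also has to handle partial TCP reads and maximum record sizes. The body is sliced out purely by length and never scanned, so a 17 03 03 inside it is never mistaken for a header.

```python
import struct

HEADER = struct.Struct("!BHH")  # content type (1 byte), legacy version (2), length (2), big-endian

def make_record(body: bytes, content_type: int = 0x17, version: int = 0x0303) -> bytes:
    """Hypothetical helper: prepend a 5-byte TLS-style record header to a body."""
    return HEADER.pack(content_type, version, len(body)) + body

def split_tls_records(stream: bytes):
    """Split a byte stream into (type, version, body) tuples using only the headers.
    Returns the parsed records plus any trailing bytes that do not form a full record."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(stream):
        content_type, version, length = HEADER.unpack_from(stream, offset)
        body_start = offset + HEADER.size
        body_end = body_start + length
        if body_end > len(stream):
            break  # incomplete record: a real parser would wait for more TCP data
        records.append((content_type, version, stream[body_start:body_end]))
        offset = body_end  # jump straight to the next header; the body is never inspected
    return records, stream[offset:]

# The first body deliberately contains 17 03 03 00 10, which looks like a header.
body1 = bytes.fromhex("aa1703030010bb")
stream = make_record(body1) + make_record(b"hi")
records, rest = split_tls_records(stream)
print(len(records), [r[2].hex() for r in records])
```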

My question is: did the implementation of AES-256-GCM for TLS 1.3 impose any restrictions on the cipher output? The RFC makes no mention of it. Can there be a TLS record that starts with a header 170303xxxx but also has 170303 as part of its ciphertext?

score 16 · Accepted Answer · answered Jan 05 '24 at 03:09

Except for some special (and very rare) 'format-preserving' modes, all modern encryption algorithms, including AES-GCM, can handle any byte sequence as plaintext and produce any byte sequence as ciphertext. (In fact, the algorithms can mostly handle any bit sequence, but implementations on byte-oriented computers mostly handle only bytes, especially since they are usually written in C, and C support for sub-byte data is implementation-dependent. AES-GCM is something of an exception: it consists of AES-CTR plus GMAC, both of which can handle any bits, but NIST nevertheless specifies GCM to handle only 8-bit bytes. This is possibly because NIST has had issues in the past with crypto implementations that were supposedly tested for conformance yet still failed in edge cases, and practically no one nowadays needs or even wants non-byte crypto.)
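To make the CTR part of this concrete, here is a toy Python sketch. Python's standard library has no AES, so it assumes a hypothetical stand-in keystream built from SHA-256 in counter mode; it is not AES-GCM, omits the GMAC tag entirely, and is not meant for real use. Because CTR-style encryption computes ciphertext = plaintext XOR keystream, any desired ciphertext, including one that begins with 17 03 03, is produced by the plaintext (target XOR keystream).

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Hypothetical stand-in for the AES-CTR keystream: SHA-256(key || nonce || counter).
    This is NOT AES; it only mimics the 'keystream XOR data' structure of CTR mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(out[:length])

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (the same operation in CTR mode) by XORing with the keystream."""
    ks = toy_keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key = b"k" * 32
nonce = b"n" * 12
target = bytes.fromhex("170303") + b"any other bytes"

# Choose the plaintext so that encrypting it lands exactly on the target ciphertext.
plaintext = ctr_xor(key, nonce, target)
ciphertext = ctr_xor(key, nonce, plaintext)
print(ciphertext[:3].hex())
```

The same reasoning applies to real AES-CTR inside GCM: for a given key and nonce the keystream is fixed, so every ciphertext byte string of a given length is reachable from exactly one plaintext.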

When data constraints apply -- such as sending ciphertext by non-MIME email, or storing it in certain databases that don't support binary (aka 'blob') data -- it is common to encode the ciphertext in a form that satisfies those constraints, such as hex, base64, URL-safe base64, base32, base58, base95, etc. However, SSL/TLS (all versions) does not require this; at record level it allows any byte sequence in the body and, yes, each record is therefore delimited ONLY by the length specified in the record header and never by its content. (Some records, like those in the handshake subprotocol, do have constraints on their contents, but if encrypted these apply to the plaintext before encryption or after decryption, not to the ciphertext.)
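For the constrained channels mentioned above (not for TLS, which carries raw bytes), a text encoding could look like this minimal Python sketch using the standard `base64` module and `bytes.hex()`. The sample ciphertext bytes are made up for illustration; the point is only that the encoded forms use a restricted printable alphabet and decode back to the exact original bytes.

```python
import base64

# Made-up "ciphertext" containing 17 03 03, a zero byte, 0xff and a newline (0x0a).
ciphertext = bytes.fromhex("17030300ff0a")

as_hex = ciphertext.hex()                                          # 2 chars per byte
as_b64 = base64.b64encode(ciphertext).decode("ascii")             # ~4 chars per 3 bytes
as_b64url = base64.urlsafe_b64encode(ciphertext).decode("ascii")  # '-' and '_' instead of '+' and '/'

print(as_hex, as_b64, as_b64url)

# Decoding recovers the binary ciphertext exactly, so the encoding is lossless.
assert bytes.fromhex(as_hex) == ciphertext
assert base64.b64decode(as_b64) == ciphertext
assert base64.urlsafe_b64decode(as_b64url) == ciphertext
```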

score 10 · Answer 2 · answered Jan 05 '24 at 11:02

Any cipher that can encrypt arbitrary binary data must be able to produce all byte sequences in its output, or else its output must be larger than its input for at least some inputs. AES-GCM accepts arbitrary binary input, and its ciphertext is the same size as its plaintext (the authentication tag is appended separately and does not change this).

If there were any restrictions, the output would have less entropy per byte than fully random input. That is, there would be fewer than 2^n possible output values, where n is the output length in bits, which would mean it couldn't uniquely encode each of the 2^n possible inputs of that length.

(This principle is true in general, not just for ciphers, and is why you can't losslessly compress arbitrary random data. It is often called the pigeonhole principle: https://en.wikipedia.org/wiki/Pigeonhole_principle)
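The counting argument can be checked by brute force at a tiny scale. This Python sketch assumes a hypothetical 2-byte "cipher" that simply XORs with a fixed key, standing in for any length-preserving, invertible encryption. It enumerates all 2^16 inputs, confirms that every 2-byte output occurs (including 17 03), and then shows that forbidding even a single byte value would leave too few outputs to go around.

```python
from itertools import product

N_BYTES = 2  # small enough to enumerate every possible input

def toy_cipher(block: bytes, key: bytes) -> bytes:
    """Hypothetical length-preserving toy cipher: XOR with a fixed key (like a CTR keystream)."""
    return bytes(b ^ k for b, k in zip(block, key))

key = bytes([0x5A, 0xC3])
all_inputs = [bytes(p) for p in product(range(256), repeat=N_BYTES)]
outputs = {toy_cipher(p, key) for p in all_inputs}

# Distinct inputs map to distinct outputs, so all 256**2 output values appear.
print(len(all_inputs), len(outputs))

# If one byte value (say 0x17) were never allowed in the output, only 255**2
# outputs would remain: fewer than the 256**2 inputs that each need their own.
allowed_outputs = 255 ** N_BYTES
print(allowed_outputs < len(all_inputs))
```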

