Keccak-f state machine
The Keccak-F state machine is one of the secondary zkProver state machines. It computes message string hashes and validates the accuracy of those computations upon the request of Main SM.
Although the architecture of the original Keccak hash function is simple, the Keccak-F SM is not just a state machine or a simple automated version of the cryptographic hash function.
First, a gates state machine is used to implement the Keccak-F SM. A binary circuit that operates on a bit-by-bit basis.
Second, with regard to the Main SM, the Padding-KK SM, the Padding-KK-Bit SM, and the Bits2Field SM together provide a framework within which the Keccak-F SM is realized.
Thirdly, the Keccak-F circuit is constructed in such a way that executing it once is effectively equivalent to operating forty-four (\(44\)) hashing circuits simultaneously, particularly in the first version of the zkEVM public testnet. For further information on how this parallelism technique is applied, see the Bits2Field SM.
The Keccak-F circuit is briefly described in this article, along with a thorough explanation of the widely used Keccak-256 hash function and its specific parameters as they apply to the Polygon zkEVM implementation.
Keccak-F circuit¶
The Keccak-F circuit has two types of gates, types \(\mathtt{0}\) and \(\mathtt{1}\), corresponding to the two binary operations it performs, the \(\mathtt{XOR}\) and \(\mathtt{ANDP}\).
The Keccak-F executor builds the constant polynomials, \(\mathtt{ConnA}\), \(\mathtt{ConnB}\) and \(\mathtt{ConnC}\), and these are to be tested if they match their corresponding polynomials, \(\mathtt{kA}\), \(\mathtt{kB}\) and \(\mathtt{kC}\).
Also, for any \(\texttt{op} \in \{ \mathtt{XOR}, \mathtt{ANDP} \}\) we have,
The \(44\) bits are loaded into the state machine as \(11\)-bit chunks.
In keccakf.pil
, each committed polynomial \(\texttt{a[4]}\) is expressed in terms of 4 chunks, where each is \(11\) bits long. The corresponding a \(44\)-bit array can be expressed as,
where \(\texttt{a}[i]\) for each \(i \in \{ 0, 1, 2, 3 \}\).
The verification involves a copy constraint
and a Plookup as,
for each \(i \in \{ 0, 1, 2, 3 \}\).
This covers the Keccak-F circuit in a nutshell together with its PIL code. See the codes of sm_keccakf.js and keccakf.pil on GitHub.
Keccak-256 hash function¶
There are seven Keccak-F permutation functions, each indicated by \(\texttt{Keccak}\)-\(f[b]\), where \(b = 5\times 5\times 2^l\) is the size of the internal state of the hash function, for \(0 \leq l \leq w\).
The zkProver’s Keccak state machine is a verifiable automisation of a Keccak-F permutation function, which amounts to an irreversible scrambling of bits of a string \(\mathbf{s} \in \mathbb{Z}_2^b\), where \(b = 5\times 5\times 2^6 = 1600\).
The EVM utilises the Keccak-256 hash function, which is a sponge construction with capacity \(c = 512\) bits, and denoted by Keccak\([512]\). That is, the Keccak-256 notation puts emphasis on the \(256\)-bit security level, while the Keccak\([512]\) notation seeks to depict the actual capacity of \(512\) bits.
Bitrate and capacity¶
Although the internal state is \(\mathtt{1600}\) bits, Keccak-F intakes a fixed number of bits as input, called the \(\texttt{bitrate}\) (or simply, \(\texttt{rate}\)) and it is denoted by \(\texttt{r}\). In our specific case, the bitrate \(\texttt{r} = 1088\), whilst the capacity, \(\texttt{c} = 512\).
The size of a single output is \(\texttt{r} = 1088\) bits. However, users can choose their required length by truncating the output, which is of length \(\texttt{d}\), where \(\texttt{d} = 1088*k\), for some positive integer \(k\).
The Keccak-F permutation used in Keccak \([c]\) is Keccak-\(p[1600, 24]\) (see NIST SHA-3 Standard).
Thus, given an input bit string \(\mathtt{M}\) and a output length \(\mathtt{d}\), Keccak \([c](M, d)\) outputs a \(d\)-bit string following the previous sponge construction description.
Keccak-F padding rule¶
The Keccak-F permutation operates on a state of width \(b = 1600\) bits (or \(200\) bytes) and a bitrate
But not every input string comes with this tailored bit-length.
Therefore, every input string is split into \(\mathtt{1088}\)-bit chunks, where padding is applied to the tail-end chunk with \(\mathtt{1088}\) bits or lesser.
The last ingredient we need to define in order to completely specify the hash function is the padding rule.
In Keccak \([c]\), the padding \(\texttt{pad10*1}\) is used. If we define \(\mathtt{j = (-m-2) \mod{r}}\), where \(\mathtt{m}\) is the length of the input in bits, then the padding we have to append to the original input message is,
It should be noted that our construction does not follow the FIPS-202 based standard (a.k.a SHA-3). According to the NIST specification, the SHA3 padding has been changed to
The difference is the additional \(\mathtt{01}\) bits being appended to the original message, which were not present in the original Keccak specification.
Keccak-F’s internal state¶
The \(\mathtt{1088}\)-bit (post-padding) chunks are provided sequentially, and one chunk at a time, into the Keccak-F permutation function, to be \(\texttt{XOR}\)-ed with a given initialization vector \(\texttt{IV}\) or intermediate states. The capacity bits are typically initialised to zero bits and are not affected by any external bits.
However, instead of a plain bit-string of length \(\mathtt{1600}\) bits, a state \(\mathbf{s}\) in the Keccak SM is best visualised in 3D form as follows;
- Each bit is imagined as a cube,
- The entire \(\mathtt{1600}\)-bit state is thought of as a 3D array of cubes (bits): \(\text{A}[5][5][64]\). That is, sixty-four (\(5 \text{-bit}\times 5\text{-bit}\))-blocks of bits (cubes).
See the below figure for a \(\mathtt{1600}\)-bit state array, displaying;
-
The 64-bit lane \(\{[3,2,z]\}\) shown in orange, consisting of \(64\) bits, \(\mathtt{Bit}[3,2,0]\) to \(\mathtt{Bit}[3,2,63]\), and
-
The 5-bit column \(\{[2,y,63]\}\) shown in green, consisting of \(5\) bits, \(\mathtt{Bit}[2,0,63]\) to \(\mathtt{Bit}[2,4,63]\).
A bit in the state \(\mathbf{s}\) can be denoted by \(\texttt{Bit}[x][y][z]\) as an element of the 3D-array state, but as \(\texttt{Bit}[x,y,z]\) to indicate its location in position \((x,y,z)\) with respect to the Cartesian coordinate system.
The mapping between the bits of the state \(\mathbf{s}\), when written as a linear array of \(\mathtt{1600}\) bits, and the bits when \(\mathbf{s}\) is expressed as a 3D array, is given by,
So then, the bit \(\mathtt{a} = \mathtt{Bit}[2,3,3]\) at coordinate \((2,3,3)\) is in fact the one-thousand-and-ninety-second bit of \(\mathbf{s}\), because \(64(5\cdot 3 + 2) + 3 = 64(17) + 3 = 1091\). This bit, \(\mathtt{a} = \mathtt{Bit}[2,3,3]\), is represented in the above figure by the blue cube. See more examples of this correspondence in the table provided below.
Consider computing the \(\texttt{XOR}\) of all the 5 bits in the column \(\{[2,y,63]\}\). If its bits are as given below,
then the \(\texttt{XOR}\) of the column bits, denoted by \(C[2,63]\), is \(C[2,63] = 1 \oplus 1 \oplus 0 \oplus 0 \oplus 1 = 1\).
In our notation, a column is identified by the fixed \(x\)- and \(z\)- values. Hence the \(\texttt{XOR}\) of all the 5 bits in a column is denoted by \(\mathtt{C[x,z]}\). That is,
Keccak-F rounds¶
The Keccak-F state machine runs 24 rounds, each of which is a composition of five step mappings; \(\mathtt{\theta}\), \(\mathtt{\rho}\), \(\mathtt{\pi}\), \(\mathtt{\chi}\), and \(\mathtt{\iota}\), denoted by
where \(\texttt{ir}\) is the round index, and in our case, \(0 \leq \texttt{ir} \leq 23\).
These step mappings are individually described in the subsections below. The following table illustrates how the output state of each step mapping is relayed to the next step mapping as its input state.
Therefore, given a state value, the next value of the bit \(\text{A}[x,y,z]\) is denoted by \(\text{A}'[x,y,z]\). The code shown on the right side of the table is keccakf.cpp.
Mapping Theta¶
The first step mapping, \(\mathtt{\theta}\), referred to as \(\texttt{KeccakTheta()}\) in keccak_f.cpp, can be described in three sub-steps;
- Compute the \(\mathtt{XOR}\) of the bits in the \([(x-1)\text{ mod }5, z]\)-column,
and the \(\mathtt{XOR}\) of the bits in the \([(x+1)\text{ mod }5, (z-1)\text{ mod }64]\)-column,
- Calculate the \(\mathtt{XOR}\) of the two column \(\mathtt{XOR}\)s;
- Compute the \(\mathtt{XOR}\) of \(\mathtt{D[x,z]}\) and the bit \(\mathtt{A[x,y,z]}\) of the current state value;
See the below figure, for an illustration of the \(\theta\) step mapping applied on one bit. The diagram is taken from Keccak Reference 3.0 guide.
The code of the \(\theta\) step mapping is found here: keccak_theta.cpp
Mapping Rho¶
The step mapping \(\rho\) does not change the value of the input bit, but simply moves it to another position along the \(z\)-axis. Since all operations along the \(z\)-axis are worked out modulo 64, the mapping \(\rho\) is therefore cyclic, and amounts to rotating each of the 64 bits in the same lane along the \(z\)-axis. It does this in three sub-steps;
- Set \(\mathtt{(x,y) = (0,1)}\),
- For \(t\) ranging from \(0\) to \(23\), set \(\mathtt{{A}'[x,y,z] = {A}\big[x,y,\big(z-(t+1)(t+2)/2 \big)\text{mod }64\big]}\),
- Set \(\mathtt{(x,y) = (y,(2x + 3y)\text{mod }5)}\).
Basically, \(\rho\) modifies the \(z\) coordinate of each bit, \(\mathtt{Bit}[x,y,z]\), by subtracting a specific offset constant \(\mathbf{K}\) modulo 64, where \(\mathbf{K} = (t+1)(t+2)/2\). See the below table for these offset constants used for rotation.
The 24 constants in the above table are first permuted and then set as fixed offsets corresponding to each bit of the 3D state array.
Note that all bits \(\mathtt{Bit[x,y,z]}\) such that \(\mathtt{x = 0}\) and \(\mathtt{y = 0}\) correspond to a zero offset constant. Consequently, the origin \(\mathtt{Bit}[0,0,0]\) and all the other 63 bits along the \(\{\mathtt{Bit[0,0,z]}\}\) lane remain unmoved by \(\rho\).
Although the effect of \(\rho\) is a rotation of bits along the \(z\)-axis, it actually operates on each 25-bit \((x,y)\)-slice of the state 3D-array. Hence the 25 offset constants (and not 64), including the zero offset of the origin lane \(\{\mathtt{Bit[0,0,z]}\}\).
Example¶
Here’s an example of how \(\rho\) maps two different lanes.
-
For all \(z\), where \(0\leq z\leq63\), the bits \(\{\mathtt{Bit}[2,4,z]\}\) are always off-set by \(61\) and mapped as follows,
\[ \rho\big(\mathtt{Bit}[2,4,z]\big) \mapsto \mathtt{Bit}[4,1,(z-61)\text{mod }64] \] -
Similarly, the bits \(\{\mathtt{Bit}[4,1,z]\}\) are always off-set by \(20\) and mapped as follows, $$ \rho\big(\mathtt{Bit}[4,1,z]\big) \mapsto \mathtt{Bit}[1,1,(z-20)\text{mod }64] $$
The code of the \(\rho\) step mapping is found here: keccak_rho.cpp
Mapping Pi¶
The step mapping \(\pi\) is literally a shuffle of the 25 bits in each \((x,y)\)-slice. For each \(z\), where \(0\leq z\leq63\), it is defined as the mapping,
Note that \(\pi\) fixes each of the bits \(\{\mathtt{Bit}[0,0,z]\}\), including the bit at the origin, \(\mathtt{Bit}[0,0,0]\). Also, for all bits in the 3D-array, \(\pi\) does not change the \(z\)-component. That is, all bits remain in their original \((x,y)\)-slice. To this extend, it suffices to describe \(\pi\) in terms of the images of the \((x,y)\) pairs alone, as in the table below.
Below diagram, showing the \((x,y)\)-slices, depicts the shuffling of the bits in accordance with the above table. This figure is in fact the mapping displayed in Figure 2.3 of the Keccak Reference 3.0 [Page 20; 2011].
The code of the \(\pi\) step mapping is found here: keccak_pi.cpp
Mapping Chi¶
The \(\chi\) step mapping is the non-linear layer of Keccak-F, and it can be thought of as a parallel application of \(320 = 5*64\) S-boxes operating on \(5\)-bit rows.
How \(\chi\) operates on rows¶
For a fixed \(y = b\) and \(z = c\), the \(\chi\) step mapping takes as its input the \(5\)-bit row,
It then computes a non-linear combination of each bit \(\mathtt{Bit}[x,b,c]\) in \(\mathbf{Row}[b,c]\), with the next two consecutive bits \(\mathtt{Bit}[(x+1)\text{mod }5,b,c]\) and \(\mathtt{Bit}[(x+2)\text{mod }5,b,c]\) also in \(\mathbf{Row}[b,c]\). In order to change the bit, \(\mathtt{Bit}[x,b,c]\),
-
\(\chi\) takes \(\mathtt{Bit}[(x+1)\text{mod }5,b,c]\) as input to a \(\texttt{NOT}\)-gate.
-
Then takes the output, \(\texttt{NOT}\big(\mathtt{Bit}[(x+1)\text{mod }5,b,c]\big)\), together with \(\mathtt{Bit}[(x+2)\text{mod }5,b,c]\), as inputs to an \(\texttt{AND}\)-gate.
-
Finally, it XOR-es the output of the \(\texttt{AND}\)-gate with the bit being changed, \(\mathtt{Bit}[x,b,c]\).
That is, for each 5-bit row, \(\mathbf{Row}[b,c]\), \(\chi\) can be summarized as the mapping of each bit, \(\mathtt{Bit}[x,b,c]\), as;
where \(x \in \{3,4,0,1,2\}\).
Overall, the \(\chi\) step mapping can be depicted as a “circuit” of gates as in the below diagram, taken from FIPS PUB 202, August 2015 (Figure 6, Page 15).
The code of the \(\chi\) step mapping is found here: keccak_chi.cpp
Mapping Iota¶
The \(\large{\iota}\) step mapping adds 64-bit round-constants, \(RC[ir]\), so as to disrupt any symmetry in the computation of the round outputs. Note that \(ir\) is the round index, \(0\leq ir \leq 23\). Hence, 24 distinct round-constants are required, and are denoted in array notation as,
The round-constants are XOR-ed only to a single lane of the state, in particular, the origin lane (i.e., the 64 bits \(\{\mathtt{Bit}[0,0,z]\}\) where \(0\leq z \leq 63\)).
Here’s how \(\large{\iota}\) operates on the origin lane, \(\{\mathtt{Bit}[0,0,z]\ |\ 0\leq z \leq 63\}\)¶
Firstly, the derivation of the round-constants \(RC[ir]\). As shown in the algorithm of \(\large{\iota}\), the round-constant \(RC\) is initialized to \(0^{w}\) at the beginning of each round of \(\large{\iota}\). After which, 7 specific bit-places \(\{2^j – 1\ |\ 0 \leq j \leq 6 \}\) of the round-constant are set to the bits \(\{rc[j + 7 ir]\ |\ 0 \leq j \leq 6 \}\). Meanwhile, the rest of the bits remain as zeros.
The derivation of the 7 bits, \(\{rc[j + 7 ir]\}\), is explained below.
Secondly, each round of \(\large{\iota}\) simply XOR-es the corresponding 64-bit round-constant to the 64-bit lane, \(\{\mathtt{Bit}[0,0,z]\ |\ 0\leq z \leq 63 \}\). Since only 7 bits of the round-constant are set, each round of \(\large{\iota}\) therefore alters at most 7 bits of origin lane.
In our implementation \(l = 6\), and thus \(w = 2^6 = 64\).
The code of the \(\large{\iota}\) step mapping is found here: keccak_iota.cpp
Generation of 7 bits for round-constants¶
In order to ensure that these round-constants differ from round-to-round, a linear feedback shift register (LFSR) of maximal length is used to generate them. The algorithm for generating these round-constants is given below.
The LFSR can run for 255 clocks before resetting to \(0\), and has 8 registers; \(R[0],R[1],R[2],R[3],R[4],R[5],R[6],R[7]\).
At \(t = 0\), it is initialised to \(\mathtt{10000000 = 0x80}\). A state transition (or a shift) consists of;
-
A push of a bit \(0\) into the first register \(R[0]\) while shifting the bit in register \(R[i]\) to \(R[(i+1)\text{ mod }8]\) for each \(i\) , where \(0 \leq i \leq 7\).
-
XOR of the bit from register \(R[8]\) with register \(R[0]\).
-
XOR of the bit from register \(R[8]\) with register \(R[4]\).
-
XOR of the bit from register \(R[8]\) with register \(R[5]\).
-
XOR of the bit from register \(R[8]\) with register \(R[6]\).
-
Output the bit in the first register \(R[0]\) as \(rc[j+7ir]\).
After every 7 consecutive state transitions, the LFSR produces enough bits to form the corresponding round-constant \(RC[ir]\). The 7 bits map to their bit places in \(RC[ir]\) according to the above shown table.
Example¶
Since the LFSR is initialised to \(\mathtt{10000000 = 0x80}\) at \(t = 0\), its state at the end of 7 state transitions is \(|0|0|0|0|0|0|0|1|\). So \(rc[7] = 0\). After the 8-th state transition the state of the LFSR is \(|1|0|0|0|1|1|1|0|\), and thus \(rc[8] = 1\). The next table displays generation of the 7 bits needed to construct the second round-constant, \(RC[1]\). That is, the LFSR’s state transitions for \(t = 7\) to \(t = 13\).
It can be observed, on the right-hand side of the above table, that the \(\mathtt{rc[j+7]}\) and \(\mathtt{RC[1][2^j - 1]}\) are the same. That is,
This results in the round-constant,
All 24 round constants \(RC[i]\), where each is \(64\) bits long, are given in their hexadecimal format below.
The C++ code for the LFSR can be found in the zkEVM Prover repository here: keccak_rc.cpp
All these step mappings are consolidated in the keccakf.cpp code, which calls them as functions.