SMT concepts
This document covers more concepts needed in understanding our specific design of the zkProver storage using binary SMTs. These concepts also help in elucidating how keys influence the shape of binary SMTs.
First is the level of a leaf. The level of a leaf,
Leaf levels¶
Consider the below figure for an SMT storing seven key-value pairs, built by following the principles explained in the foregoing subsection;
where the keys are,
The leaf levels are as follows;
As illustrated, keys basically determine the shape of the SMT. They dictate where respective leaves must be placed when building the SMT.
The main determining factor of the SMT shape is in fact the common key-bits among the keys. For instance, the reason why the leaves
This explains why different leaves in SMTs can have different levels.
The height of a Merkle Tree refers to the largest number of edges traversed when navigating from the root to any leaf. Since all leaves are of the same level in Merkle Trees, the concept of a height coincide with that of the level of a leaf for Merkle Trees.
But this is not the case for SMTs. Since leaf levels differ from one leaf to another in SMTs, the height of an SMT is not the same as the leaf level.
Rather, the height of an SMT is defined as the largest leaf level among the various leaf levels of leaves on the SMT. For instance, the height of the SMT depicted in the figure above, is
Now, since all keys have the same fixed key-length, they not only influence SMT leaf levels and shapes, but also restrict SMT heights to the fixed key-length. The maximum height of an SMT is the maximum key-length imposed on all keys.
Remaining keys¶
In a general Sparse Merkle Tree (SMT), values are stored at their respective leaf-nodes.
But a leaf node
Consider again the SMT of the 7 key-value pairs depicted in the figure above. The remaining keys of each of the 7 leaves in the SMT are as follows;
Fake-leaf attack¶
Note that the above simplified design of binary SMTs, based on key-value pairs, presents some problems. The characteristic of binary SMTs having leaves at different levels can be problematic to verifiers, especially when carrying out a simple Merkle proof.
Scenario: Fake SMT leaf¶
What if the verifier is presented with a fake leaf?
Consider the below figure, showing a binary SMT with a branch
That is, suppose the verifier is provided with the following information;
- The key-value
, where and . - The root
, the number of levels to root, and the siblings and .
That is, the Attacker claims that some
Verifier is unaware that
So then, the verifier being unaware that
- He uses the key
to navigate the tree until locating the supposed leaf . - He computes
and sets it as . - Then takes the sibling
and calculates
. - And then, uses
to compute the root, .
The question is: “Does the fake leaf
Since the actual branch
Therefore, the fake leaf
Solution to the Fake-leaf attack¶
In order to circumvent the Fake-Leaf Attack we modify how the binary SMTs are built.
Here’s the trick: When building binary SMTs, differentiate between how leaves are hashed and how branches are hashed.
That is, use two different hash functions; one hash function to hash leaves, denote it by
How does this prevent the Fake-leaf attack?¶
Reconsider now, the Scenario A, given above. Recall that the Attacker provides the following;
- The key-value
, where and . - The root $ \mathbf{{root}_{ab..f}}$ , the number of levels to root, and the siblings
and .
The verifier suspecting no foul, uses
He subsequently starts the Merkle proof by hashing the value
- He then sets
. - And further computes
. - Again, calculates the root,
.
But the actual branch
The parent branch
Non-binding key-value pairs¶
Whenever the verifier needs to check inclusion of the given key-value pair
Both computations involve climbing the tree from the located leaf
- Checking correctness of the key
.
That is, verifier takes the Remaining Key,
Suppose the number of levels to root is 3, and the least-significant bits used for navigation are
In order to check key-correctness, verifier the remaining key
-
Concatenates
and gets , -
Concatenates
then gets , -
Concatenates
and gets .
He then sets
- The Merkle proof: That is, checking whether the value stored at the located leaf
was indeed included in computing the root, .
This computation was illustrated several times in the above discussions. Note that the key-correctness and the Merkle proof are simultaneously carried out.
Example: Indistinguishable leaves¶
Suppose a binary SMT contains a key-value pair
Note that, when building binary SMTs, it is permissible to have another key-value pair
An Attacker can pick the key-value pair
Consider the below figure and suppose the Attacker provides the following data;
- The key-value
, where and . - The root,
, the number of levels to root = 3, and the siblings , and .
The verifier uses the least-significant key bits;
In order to ensure that
-
Concatenates
, and gets , -
Concatenates
to get , -
Concatenates
, yielding .
He sets
As the verifier climbs the tree to test key-correctness, he concurrently checks if the value
-
He computes the hash of
and sets it as, . -
Then he uses
to compute, . -
He also uses
to compute, . -
He further calculates,
.
Next, the verifier checks if
Since
, , , .
The verifier therefore concludes that the key-value pair
Why is this attack successful?¶
Note that equality of values,
The downfall of our binary SMTs design, thus far, is that it does not give the verifier any equation that relates or ties the keys to their associated values.
In other words, the attack succeeds simply because the key-value pairs (as ‘committed’ values) are not binding.
Solution to the non-binding key-value problem¶
The solution to this problem is straightforward, and it is to build the binary SMTs in such a way that the key-value pairs are binding. This means, create a relationship between the keys and their associated values, so that the verifier can simply check if this relationship holds true.
In order to ensure that checking such a relationship blends with the usual proof machinery, one has two options. The naïve solution, which involves the keys, is one option.
The Naïve solution¶
The naïve solution is to simply include keys in the argument of the hash function, when forming leaves.
That is, when building a binary SMT, one includes a key-value pair
Does this change remedy the non-binding problem?
Suppose
Since the hash functions used are collision-resistant, it follows that
Consequently, although the key-value pairs
where;
The only chance for the Merkle proof to pass is if the key-value pairs
The inclusion of keys, in the argument of the hash functions, therefore ensures that leaves
A better solution¶
The other solution, which is much more apt than the Naïve option, utilises the remaining keys when forming leaves.
Since levels to root is related to the Remaining Key (
That is, for a key-value pair
With this strategy, the verifier needs the remaining key
- Firstly, picking the correct hash function
for leaves, - Secondly, concatenating the value
stored at the leaf and the remaining key , instead of the whole key , - Thirdly, hashing the concatenation
.
This approach not only ensures that key-value pairs in our SMTs are now binding, but also implicitly ‘encodes’ the levels to root in the leaf.
The strategy of using the
Hiding values¶
It is often necessary to ensure that a commitment scheme has the hiding property. And this can be achieved by storing hashes of values, as opposed to plainly storing the values.
A leaf therefore is henceforth constructed in two steps;
-
Firstly, for a key-value pair
, compute the hash the value , $$ \text{Hashed Value} = \text{HV}\mathbf{{x}} = \mathbf{H}(V_\mathbf{{x}}) $$ -
Secondly, form the leaf containing
, as follows, $$ \mathbf{L_{x}} = \mathbf{H_{leaf}}( \text{RK}\mathbf{x} | \text{HV}\mathbf{{x}}) $$
Since it is infeasible to compute the preimage of the hash functions,
The prover therefore achieves zero-knowledge by providing the pair,
The verifier, on the other hand, has to adjust the Merkle proof by starting with;
- Firstly, picking the correct hash function
for leaf nodes, - Secondly, concatenating the hashed-value
and the remaining key , - Thirdly, hashing the concatenation in order to form the leaf,
.
Example: Zero-Knowledge Merkle Proof¶
The following example illustrates a Merkle proof when the above strategy is applied.
Consider an SMT where the keys are 8-bit long, and the prover commits to the key-value
Since the levels to root is 3, the prover provides; the least-significant key-bits,
The verifier first uses the least-significant bits of the key
-
He computes,
-
Then, he uses the sibling
to compute, . -
Next, he computes,
. -
Now, verifier uses
to compute the supposed root, . -
Checks if
equals .
The verifier accepts that the key-value pair