ethereum.ethash
Ethash is a proof-of-work algorithm designed to be ASIC-resistant through
memory hardness.

To achieve memory hardness, computing Ethash requires access to subsets of a
large structure. The particular subsets chosen are based on the nonce and block
header, while the set itself is changed every epoch.
At a high level, the Ethash algorithm is as follows:

1. Create a seed value, generated with `generate_seed` and based on the
   preceding block numbers.
2. From the seed, compute a pseudorandom cache with `generate_cache`.
3. From the cache, generate a dataset with `generate_dataset`. The dataset
   grows over time based on `DATASET_EPOCH_GROWTH_SIZE`.
4. Miners hash slices of the dataset together, which is where the memory
   hardness is introduced. Verification of the proof-of-work only requires
   the cache, since the cache can recompute the much smaller subset of the
   full dataset that is actually accessed (a sketch of this pipeline follows
   the list).
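As a non-normative sketch of that pipeline, the snippet below strings together
the functions defined in the rest of this module. The import path for `Uint`
is an assumption and may differ between versions of the specification.

```python
from ethereum_types.numeric import Uint  # assumed import path

block_number = Uint(0)
seed = generate_seed(block_number)            # step 1: per-epoch seed
cache = generate_cache(block_number)          # step 2: ~16 MiB pseudorandom cache
item = generate_dataset_item(cache, Uint(0))  # step 3: one dataset slice, on demand
```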
EPOCH_SIZE
Number of blocks before a dataset needs to be regenerated (known as an
"epoch"). See `epoch`.

```python
EPOCH_SIZE = Uint(30000)
```
INITIAL_CACHE_SIZE
Size of the cache (in bytes) during the first epoch. Each subsequent epoch's
cache grows by roughly `CACHE_EPOCH_GROWTH_SIZE` bytes. See `cache_size`.

```python
INITIAL_CACHE_SIZE = Uint(2**24)
```
CACHE_EPOCH_GROWTH_SIZE
After the first epoch, the cache size grows by roughly this amount. See
`cache_size`.

```python
CACHE_EPOCH_GROWTH_SIZE = Uint(2**17)
```
INITIAL_DATASET_SIZE
Size of the dataset (in bytes) during the first epoch. Each subsequent epoch's
dataset grows by roughly `DATASET_EPOCH_GROWTH_SIZE` bytes. See
`dataset_size`.

```python
INITIAL_DATASET_SIZE = Uint(2**30)
```
DATASET_EPOCH_GROWTH_SIZE
After the first epoch, the dataset size grows by roughly this amount. See
`dataset_size`.

```python
DATASET_EPOCH_GROWTH_SIZE = Uint(2**23)
```
HASH_BYTES
Length of a hash, in bytes.
```python
HASH_BYTES = Uint(64)
```
MIX_BYTES
Width of mix, in bytes. See `generate_dataset_item`.

```python
MIX_BYTES = Uint(128)
```
CACHE_ROUNDS
Number of times to repeat the `keccak512` step while generating the hash. See
`generate_cache`.

```python
CACHE_ROUNDS = 3
```
DATASET_PARENTS
Number of parents of each dataset element. See `generate_dataset_item`.

```python
DATASET_PARENTS = Uint(256)
```
HASHIMOTO_ACCESSES
Number of accesses in the `hashimoto` loop.

```python
HASHIMOTO_ACCESSES = 64
```
epoch
Obtain the epoch number to which the block identified by `block_number`
belongs. The first epoch is numbered zero.

An Ethash epoch is a fixed number of blocks (`EPOCH_SIZE`) long, during
which the dataset remains constant. At the end of each epoch, the dataset
is generated anew. See `generate_dataset`.

```python
def epoch(block_number: Uint) -> Uint:
    """
    Obtain the epoch number to which the block identified by `block_number`
    belongs. The first epoch is numbered zero.

    An Ethash epoch is a fixed number of blocks ([`EPOCH_SIZE`]) long, during
    which the dataset remains constant. At the end of each epoch, the dataset
    is generated anew. See [`generate_dataset`].

    [`EPOCH_SIZE`]: ref:ethereum.ethash.EPOCH_SIZE
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    """
    return block_number // EPOCH_SIZE
```
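For example, with `EPOCH_SIZE` set to 30000, blocks 0 through 29999 belong to
epoch zero and block 30000 starts epoch one:

```python
assert epoch(Uint(29999)) == Uint(0)
assert epoch(Uint(30000)) == Uint(1)
```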
cache_size
Obtain the cache size (in bytes) of the epoch to which `block_number`
belongs.

See `INITIAL_CACHE_SIZE` and `CACHE_EPOCH_GROWTH_SIZE` for the initial
size and linear growth rate, respectively. The cache is generated in
`generate_cache`.

The actual cache size is smaller than simply multiplying
`CACHE_EPOCH_GROWTH_SIZE` by the epoch number to minimize the risk of
unintended cyclic behavior. It is defined as the highest prime number below
what linear growth would calculate.

```python
def cache_size(block_number: Uint) -> Uint:
    """
    Obtain the cache size (in bytes) of the epoch to which `block_number`
    belongs.

    See [`INITIAL_CACHE_SIZE`] and [`CACHE_EPOCH_GROWTH_SIZE`] for the initial
    size and linear growth rate, respectively. The cache is generated in
    [`generate_cache`].

    The actual cache size is smaller than simply multiplying
    `CACHE_EPOCH_GROWTH_SIZE` by the epoch number to minimize the risk of
    unintended cyclic behavior. It is defined as the highest prime number below
    what linear growth would calculate.

    [`INITIAL_CACHE_SIZE`]: ref:ethereum.ethash.INITIAL_CACHE_SIZE
    [`CACHE_EPOCH_GROWTH_SIZE`]: ref:ethereum.ethash.CACHE_EPOCH_GROWTH_SIZE
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    size = INITIAL_CACHE_SIZE + (CACHE_EPOCH_GROWTH_SIZE * epoch(block_number))
    size -= HASH_BYTES
    while not is_prime(size // HASH_BYTES):
        size -= Uint(2) * HASH_BYTES

    return size
```
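At epoch zero, for example, linear growth gives `2**24` bytes, or 262,144
hash-sized items; stepping down to the largest prime item count, 262,139,
yields the familiar epoch-zero cache size. An illustrative check, assuming the
constants above:

```python
assert cache_size(Uint(0)) == Uint(262139 * 64)  # 16776896 bytes; 262139 is prime
```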
dataset_size
Obtain the dataset size (in bytes) of the epoch to which `block_number`
belongs.

See `INITIAL_DATASET_SIZE` and `DATASET_EPOCH_GROWTH_SIZE` for the
initial size and linear growth rate, respectively. The complete dataset is
generated in `generate_dataset`, while the slices used in verification
are generated in `generate_dataset_item`.

The actual dataset size is smaller than simply multiplying
`DATASET_EPOCH_GROWTH_SIZE` by the epoch number to minimize the risk of
unintended cyclic behavior. It is defined as the highest prime number below
what linear growth would calculate.

```python
def dataset_size(block_number: Uint) -> Uint:
    """
    Obtain the dataset size (in bytes) of the epoch to which `block_number`
    belongs.

    See [`INITIAL_DATASET_SIZE`] and [`DATASET_EPOCH_GROWTH_SIZE`][ds] for the
    initial size and linear growth rate, respectively. The complete dataset is
    generated in [`generate_dataset`], while the slices used in verification
    are generated in [`generate_dataset_item`].

    The actual dataset size is smaller than simply multiplying
    `DATASET_EPOCH_GROWTH_SIZE` by the epoch number to minimize the risk of
    unintended cyclic behavior. It is defined as the highest prime number below
    what linear growth would calculate.

    [`INITIAL_DATASET_SIZE`]: ref:ethereum.ethash.INITIAL_DATASET_SIZE
    [ds]: ref:ethereum.ethash.DATASET_EPOCH_GROWTH_SIZE
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    """
    size = INITIAL_DATASET_SIZE + (
        DATASET_EPOCH_GROWTH_SIZE * epoch(block_number)
    )
    size -= MIX_BYTES
    while not is_prime(size // MIX_BYTES):
        size -= Uint(2) * MIX_BYTES

    return size
```
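The same construction at epoch zero starts from `2**30` bytes (8,388,608
mix-sized rows) and steps down to the prime 8,388,593, giving a dataset of
roughly one gibibyte. Again an illustrative check:

```python
assert dataset_size(Uint(0)) == Uint(8388593 * 128)  # 1073739904 bytes, ~1 GiB
```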
generate_seed
Obtain the cache generation seed for the block identified by
`block_number`. See `generate_cache`.

```python
def generate_seed(block_number: Uint) -> Hash32:
    """
    Obtain the cache generation seed for the block identified by
    `block_number`. See [`generate_cache`].

    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    epoch_number = epoch(block_number)

    seed = b"\x00" * 32
    while epoch_number != 0:
        seed = keccak256(seed)
        epoch_number -= Uint(1)

    return Hash32(seed)
```
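The seed is thus a `keccak256` chain whose length equals the epoch number:
epoch zero uses thirty-two zero bytes, epoch one hashes that value once, and
so on:

```python
assert generate_seed(Uint(0)) == Hash32(b"\x00" * 32)
assert generate_seed(Uint(30000)) == Hash32(keccak256(b"\x00" * 32))
```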
generate_cache
Generate the cache for the block identified by `block_number`. See
`generate_dataset` for how the cache is used.

The cache is generated in two steps: filling the array with a chain of
`keccak512` hashes, then running `CACHE_ROUNDS` rounds of Sergio Demian
Lerner's RandMemoHash on those bytes.

```python
def generate_cache(block_number: Uint) -> Tuple[Tuple[U32, ...], ...]:
    """
    Generate the cache for the block identified by `block_number`. See
    [`generate_dataset`] for how the cache is used.

    The cache is generated in two steps: filling the array with a chain of
    [`keccak512`] hashes, then running `CACHE_ROUNDS` rounds of Sergio Demian
    Lerner's [RandMemoHash] on those bytes.

    [`keccak512`]: ref:ethereum.crypto.hash.keccak512
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [RandMemoHash]: http://www.hashcash.org/papers/memohash.pdf
    """
    seed = generate_seed(block_number)
    cache_size_bytes = cache_size(block_number)

    cache_size_words = cache_size_bytes // HASH_BYTES
    cache = [keccak512(seed)]

    for index in range(1, cache_size_words):
        cache_item = keccak512(cache[index - 1])
        cache.append(cache_item)

    for _ in range(CACHE_ROUNDS):
        for index in range(cache_size_words):
            # Converting `cache_size_words` to int as `-1 + Uint(5)` is an
            # error.
            first_cache_item = cache[
                (index - 1 + int(cache_size_words)) % int(cache_size_words)
            ]
            second_cache_item = cache[
                U32.from_le_bytes(cache[index][0:4]) % U32(cache_size_words)
            ]
            result = bytes(
                [a ^ b for a, b in zip(first_cache_item, second_cache_item)]
            )
            cache[index] = keccak512(result)

    return tuple(
        le_bytes_to_uint32_sequence(cache_item) for cache_item in cache
    )
```
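The result is one sixteen-word tuple of `U32` values per 64-byte hash. A
structural check (slow in pure Python, since it builds the full ~16 MiB
epoch-zero cache):

```python
cache = generate_cache(Uint(0))
assert len(cache) == 262139                    # cache_size(Uint(0)) // HASH_BYTES
assert all(len(item) == 16 for item in cache)  # 64 bytes -> 16 little-endian U32s
```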
fnv
A non-associative substitute for XOR, inspired by the FNV hash by Fowler,
Noll, and Vo. See `fnv_hash`, `generate_dataset_item`, and `hashimoto`.

Note that here we multiply the prime with the full 32-bit input, in contrast
with the FNV-1 spec, which multiplies the prime with one byte (octet) in turn.

```python
def fnv(a: Union[Uint, U32], b: Union[Uint, U32]) -> U32:
    """
    A non-associative substitute for XOR, inspired by the [FNV] hash by
    Fowler, Noll, and Vo. See [`fnv_hash`], [`generate_dataset_item`], and
    [`hashimoto`].

    Note that here we multiply the prime with the full 32-bit input, in
    contrast with the [FNV-1] spec which multiplies the prime with one byte
    (octet) in turn.

    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    [`fnv_hash`]: ref:ethereum.ethash.fnv_hash
    [FNV]: https://w.wiki/XKZ
    [FNV-1]: http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1
    """
    # This is a faster way of doing `number % (2 ** 32)`.
    result = ((Uint(a) * Uint(0x01000193)) ^ Uint(b)) & Uint(U32.MAX_VALUE)
    return U32(result)
```
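In other words, `fnv` multiplies `a` by the 32-bit FNV prime `0x01000193`,
XORs in `b`, and truncates to 32 bits:

```python
assert fnv(U32(1), U32(2)) == U32(0x01000191)  # (0x01000193 * 1) ^ 2
assert fnv(U32(0), U32(7)) == U32(7)           # a == 0 leaves only b
```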
fnv_hash
Combines `data` into `mix_integers` using `fnv`. See `hashimoto` and
`generate_dataset_item`.

```python
def fnv_hash(
    mix_integers: Tuple[U32, ...], data: Tuple[U32, ...]
) -> Tuple[U32, ...]:
    """
    Combines `data` into `mix_integers` using [`fnv`]. See [`hashimoto`] and
    [`generate_dataset_item`].

    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    [`fnv`]: ref:ethereum.ethash.fnv
    """
    return tuple(
        fnv(mix_integers[i], data[i]) for i in range(len(mix_integers))
    )
```
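`fnv_hash` is simply the element-wise application of `fnv` to two equal-length
word tuples:

```python
assert fnv_hash((U32(1), U32(0)), (U32(2), U32(7))) == (U32(0x01000191), U32(7))
```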
generate_dataset_item
Generate a particular dataset item 0-indexed by `index` by hashing
pseudorandomly-selected entries from `cache` together. See `fnv` and
`fnv_hash` for the digest function, `generate_cache` for generating `cache`,
and `generate_dataset` for the full dataset generation algorithm.

```python
def generate_dataset_item(
    cache: Tuple[Tuple[U32, ...], ...], index: Uint
) -> Hash64:
    """
    Generate a particular dataset item 0-indexed by `index` by hashing
    pseudorandomly-selected entries from `cache` together. See [`fnv`] and
    [`fnv_hash`] for the digest function, [`generate_cache`] for generating
    `cache`, and [`generate_dataset`] for the full dataset generation
    algorithm.

    [`fnv`]: ref:ethereum.ethash.fnv
    [`fnv_hash`]: ref:ethereum.ethash.fnv_hash
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    mix = keccak512(
        (
            le_uint32_sequence_to_uint(cache[index % ulen(cache)]) ^ index
        ).to_le_bytes64()
    )

    mix_integers = le_bytes_to_uint32_sequence(mix)

    for j in (Uint(k) for k in range(DATASET_PARENTS)):
        mix_word: U32 = mix_integers[j % Uint(16)]
        cache_index = fnv(index ^ j, mix_word) % U32(len(cache))
        parent = cache[cache_index]
        mix_integers = fnv_hash(mix_integers, parent)

    mix = Hash64(le_uint32_sequence_to_bytes(mix_integers))

    return keccak512(mix)
```
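Each item folds in `DATASET_PARENTS` (256) pseudorandomly chosen cache
entries, which is why a light client can recompute any single 64-byte item on
demand from the cache alone (reusing the `cache` from the `generate_cache`
example above):

```python
item = generate_dataset_item(cache, Uint(0))
assert len(item) == 64  # every dataset item is a keccak512 hash
```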
generate_dataset
Generate the full dataset for the block identified by `block_number`.

This function is present only for demonstration purposes. It is not used
while validating blocks.

```python
def generate_dataset(block_number: Uint) -> Tuple[Hash64, ...]:
    """
    Generate the full dataset for the block identified by `block_number`.

    This function is present only for demonstration purposes. It is not used
    while validating blocks.
    """
    dataset_size_bytes: Uint = dataset_size(block_number)
    cache: Tuple[Tuple[U32, ...], ...] = generate_cache(block_number)

    # TODO: Parallelize this later on if it adds value
    return tuple(
        generate_dataset_item(cache, Uint(index))
        for index in range(dataset_size_bytes // HASH_BYTES)
    )
```
hashimoto
Obtain the mix digest and the final value for a header, by aggregating data
from the full dataset.

Parameters

- `header_hash` is a valid RLP hash of a block header.
- `nonce` is the propagated nonce for the given block.
- `dataset_size` is the size of the dataset. See `dataset_size`.
- `fetch_dataset_item` is a function that retrieves a specific dataset item
  based on its index.

Returns

- The mix digest generated from the header hash and propagated nonce.
- The final result, which will be checked for leading zeros (in byte
  representation) in correspondence with the block difficulty.

```python
def hashimoto(
    header_hash: Hash32,
    nonce: Bytes8,
    dataset_size: Uint,
    fetch_dataset_item: Callable[[Uint], Tuple[U32, ...]],
) -> Tuple[bytes, Hash32]:
    """
    Obtain the mix digest and the final value for a header, by aggregating
    data from the full dataset.

    #### Parameters

    - `header_hash` is a valid [RLP hash] of a block header.
    - `nonce` is the propagated nonce for the given block.
    - `dataset_size` is the size of the dataset. See [`dataset_size`].
    - `fetch_dataset_item` is a function that retrieves a specific dataset
      item based on its index.

    #### Returns

    - The mix digest generated from the header hash and propagated nonce.
    - The final result, which will be checked for leading zeros (in byte
      representation) in correspondence with the block difficulty.

    [RLP hash]: ref:ethereum.rlp.rlp_hash
    [`dataset_size`]: ref:ethereum.ethash.dataset_size
    """
    nonce_le = bytes(reversed(nonce))
    seed_hash = keccak512(header_hash + nonce_le)
    seed_head = U32.from_le_bytes(seed_hash[:4])

    rows = dataset_size // Uint(128)
    mix = le_bytes_to_uint32_sequence(seed_hash) * (MIX_BYTES // HASH_BYTES)

    for i in range(HASHIMOTO_ACCESSES):
        new_data: Tuple[U32, ...] = ()
        parent = fnv(U32(i) ^ seed_head, mix[i % len(mix)]) % U32(rows)
        for j in range(MIX_BYTES // HASH_BYTES):
            # Typecasting `parent` from U32 to Uint as 2*parent + j may
            # overflow U32.
            new_data += fetch_dataset_item(Uint(2) * Uint(parent) + Uint(j))

        mix = fnv_hash(mix, new_data)

    compressed_mix = []
    for i in range(0, len(mix), 4):
        compressed_mix.append(
            fnv(fnv(fnv(mix[i], mix[i + 1]), mix[i + 2]), mix[i + 3])
        )

    mix_digest = le_uint32_sequence_to_bytes(compressed_mix)
    result = keccak256(seed_hash + mix_digest)

    return mix_digest, result
```
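When a miner already holds the full dataset in memory, `fetch_dataset_item`
can simply index into it. A hypothetical sketch (`full_dataset` and the header
fields below are placeholders, and generating the ~1 GiB dataset in pure
Python is impractically slow):

```python
# Hypothetical setup; real values come from the block being sealed.
block_number = Uint(0)
full_dataset = generate_dataset(block_number)  # ~1 GiB; demonstration only
header_hash = Hash32(b"\x00" * 32)
nonce = Bytes8(b"\x00" * 8)

def fetch_from_full_dataset(index: Uint) -> Tuple[U32, ...]:
    # Each stored item is a 64-byte hash; decode it into U32 words.
    return le_bytes_to_uint32_sequence(full_dataset[index])

mix_digest, result = hashimoto(
    header_hash, nonce, dataset_size(block_number), fetch_from_full_dataset
)
```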
hashimoto_light
Run the `hashimoto` algorithm by generating dataset items using the cache
instead of loading the full dataset into main memory.

Parameters

- `header_hash` is a valid RLP hash of a block header.
- `nonce` is the propagated nonce for the given block.
- `cache` is the cache generated by `generate_cache`.
- `dataset_size` is the size of the dataset. See `dataset_size`.

Returns

- The mix digest generated from the header hash and propagated nonce.
- The final result, which will be checked for leading zeros (in byte
  representation) in correspondence with the block difficulty.

```python
def hashimoto_light(
    header_hash: Hash32,
    nonce: Bytes8,
    cache: Tuple[Tuple[U32, ...], ...],
    dataset_size: Uint,
) -> Tuple[bytes, Hash32]:
    """
    Run the [`hashimoto`] algorithm by generating dataset items using the
    cache instead of loading the full dataset into main memory.

    #### Parameters

    - `header_hash` is a valid [RLP hash] of a block header.
    - `nonce` is the propagated nonce for the given block.
    - `cache` is the cache generated by [`generate_cache`].
    - `dataset_size` is the size of the dataset. See [`dataset_size`].

    #### Returns

    - The mix digest generated from the header hash and propagated nonce.
    - The final result, which will be checked for leading zeros (in byte
      representation) in correspondence with the block difficulty.

    [RLP hash]: ref:ethereum.rlp.rlp_hash
    [`dataset_size`]: ref:ethereum.ethash.dataset_size
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    """

    def fetch_dataset_item(index: Uint) -> Tuple[U32, ...]:
        item: Hash64 = generate_dataset_item(cache, index)
        return le_bytes_to_uint32_sequence(item)

    return hashimoto(header_hash, nonce, dataset_size, fetch_dataset_item)
```
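Putting the pieces together, a light client could check a block's seal roughly
as follows. This is a non-normative sketch: `seal_is_valid` is a hypothetical
helper, and the exact difficulty comparison and the source of
`expected_mix_digest` come from the consensus rules outside this module.

```python
def seal_is_valid(
    block_number: Uint,
    difficulty: Uint,
    header_hash: Hash32,
    nonce: Bytes8,
    expected_mix_digest: bytes,
) -> bool:
    # Hypothetical helper, not part of the specification. Recompute the
    # mix digest and result from the cache alone.
    cache = generate_cache(block_number)
    mix_digest, result = hashimoto_light(
        header_hash, nonce, cache, dataset_size(block_number)
    )
    # The recomputed mix digest must match the header's, and the result,
    # read as a big-endian integer, must not exceed 2**256 // difficulty.
    return (
        mix_digest == expected_mix_digest
        and Uint.from_be_bytes(result) <= Uint(2**256) // difficulty
    )
```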