ethereum.ethash

Ethash is a proof-of-work algorithm designed to be ASIC resistant through memory hardness.

To achieve memory hardness, computing Ethash requires access to subsets of a large structure. The particular subsets chosen are based on the nonce and block header, while the set itself is changed every epoch.

At a high level, the Ethash algorithm is as follows; a sketch mapping each step onto this module's functions appears after the list:

  1. Create a seed value, generated with generate_seed and determined by the epoch to which the block belongs.

  2. From the seed, compute a pseudorandom cache with generate_cache.

  3. From the cache, generate a dataset with generate_dataset. The dataset grows over time based on DATASET_EPOCH_GROWTH_SIZE.

  4. Miners hash slices of the dataset together, which is where the memory hardness is introduced. Verification of the proof-of-work only requires the cache to be able to recompute a much smaller subset of the full dataset.
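
The promised sketch, illustrative only; it assumes block_number, header_hash, and nonce are in scope, and none of it is normative:

seed = generate_seed(block_number)        # step 1
cache = generate_cache(block_number)      # step 2 (recomputes the seed itself)
dataset = generate_dataset(block_number)  # step 3 (miners only; ~1 GiB at epoch 0)
mix_digest, result = hashimoto_light(     # step 4 (verification needs only the cache)
    header_hash, nonce, cache, dataset_size(block_number)
)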

EPOCH_SIZE

Number of blocks before a dataset needs to be regenerated (known as an "epoch"). See epoch.

EPOCH_SIZE = Uint(30000)

INITIAL_CACHE_SIZE

Size of the cache (in bytes) during the first epoch. Each subsequent epoch's cache grows by roughly CACHE_EPOCH_GROWTH_SIZE bytes. See cache_size.

INITIAL_CACHE_SIZE = Uint(2**24)

CACHE_EPOCH_GROWTH_SIZE

After the first epoch, the cache size grows by roughly this amount. See cache_size.

CACHE_EPOCH_GROWTH_SIZE = Uint(2**17)

INITIAL_DATASET_SIZE

Size of the dataset (in bytes) during the first epoch. Each subsequent epoch's dataset grows by roughly DATASET_EPOCH_GROWTH_SIZE bytes. See dataset_size.

INITIAL_DATASET_SIZE = Uint(2**30)

DATASET_EPOCH_GROWTH_SIZE

After the first epoch, the dataset size grows by roughly this amount. See dataset_size.

DATASET_EPOCH_GROWTH_SIZE = Uint(2**23)

HASH_BYTES

Length of a hash, in bytes.

HASH_BYTES = Uint(64)

MIX_BYTES

Width of mix, in bytes. See generate_dataset_item.

MIX_BYTES = Uint(128)

CACHE_ROUNDS

Number of times to repeat the keccak512 step while generating the cache. See generate_cache.

CACHE_ROUNDS = 3

DATASET_PARENTS

Number of parents of each dataset element. See generate_dataset_item.

DATASET_PARENTS = Uint(256)

HASHIMOTO_ACCESSES

Number of accesses in the hashimoto loop.

HASHIMOTO_ACCESSES = 64

epoch

Obtain the epoch number to which the block identified by block_number belongs. The first epoch is numbered zero.

An Ethash epoch is a fixed number of blocks (EPOCH_SIZE) long, during which the dataset remains constant. At the end of each epoch, the dataset is generated anew. See generate_dataset.

def epoch(block_number: Uint) -> Uint:
    """
    Obtain the epoch number to which the block identified by `block_number`
    belongs. The first epoch is numbered zero.

    An Ethash epoch is a fixed number of blocks ([`EPOCH_SIZE`]) long, during
    which the dataset remains constant. At the end of each epoch, the dataset
    is generated anew. See [`generate_dataset`].

    [`EPOCH_SIZE`]: ref:ethereum.ethash.EPOCH_SIZE
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    """
    return block_number // EPOCH_SIZE
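
As a quick sanity check, with plain integers standing in for Uint:

assert 0 // 30000 == 0        # genesis block is in epoch 0
assert 29999 // 30000 == 0    # last block of epoch 0
assert 30000 // 30000 == 1    # first block of epoch 1
assert 59999 // 30000 == 1    # last block of epoch 1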

cache_size

Obtain the cache size (in bytes) of the epoch to which block_number belongs.

See INITIAL_CACHE_SIZE and CACHE_EPOCH_GROWTH_SIZE for the initial size and linear growth rate, respectively. The cache is generated in generate_cache.

The actual cache size is smaller than simply multiplying CACHE_EPOCH_GROWTH_SIZE by the epoch number to minimize the risk of unintended cyclic behavior: the cache is sized to hold a prime number of HASH_BYTES-sized lines, the highest such prime below what linear growth would calculate.

def cache_size(block_number: Uint) -> Uint:
    """
    Obtain the cache size (in bytes) of the epoch to which `block_number`
    belongs.

    See [`INITIAL_CACHE_SIZE`] and [`CACHE_EPOCH_GROWTH_SIZE`] for the initial
    size and linear growth rate, respectively. The cache is generated in
    [`generate_cache`].

    The actual cache size is smaller than simply multiplying
    `CACHE_EPOCH_GROWTH_SIZE` by the epoch number to minimize the risk of
    unintended cyclic behavior. It is defined as the highest prime number below
    what linear growth would calculate.

    [`INITIAL_CACHE_SIZE`]: ref:ethereum.ethash.INITIAL_CACHE_SIZE
    [`CACHE_EPOCH_GROWTH_SIZE`]: ref:ethereum.ethash.CACHE_EPOCH_GROWTH_SIZE
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    size = INITIAL_CACHE_SIZE + (CACHE_EPOCH_GROWTH_SIZE * epoch(block_number))
    size -= HASH_BYTES
    while not is_prime(size // HASH_BYTES):
        size -= Uint(2) * HASH_BYTES

    return size
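
For example, the search lands on 16776896 bytes at epoch zero. A plain-integer rendering, with a naive trial-division test standing in for the shared is_prime helper:

def is_prime_naive(n: int) -> bool:
    # Trial division; for illustration only.
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

size = 2**24                   # INITIAL_CACHE_SIZE; the growth term is zero
size -= 64                     # HASH_BYTES
while not is_prime_naive(size // 64):
    size -= 2 * 64

assert size == 16776896        # 262139 lines of 64 bytes; 262139 is prime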

dataset_size

Obtain the dataset size (in bytes) of the epoch to which block_number belongs.

See INITIAL_DATASET_SIZE and DATASET_EPOCH_GROWTH_SIZE for the initial size and linear growth rate, respectively. The complete dataset is generated in generate_dataset, while the slices used in verification are generated in generate_dataset_item.

The actual dataset size is smaller than simply multiplying DATASET_EPOCH_GROWTH_SIZE by the epoch number to minimize the risk of unintended cyclic behavior: the dataset is sized to hold a prime number of MIX_BYTES-sized rows, the highest such prime below what linear growth would calculate.

def dataset_size(block_number: Uint) -> Uint:
    """
    Obtain the dataset size (in bytes) of the epoch to which `block_number`
    belongs.

    See [`INITIAL_DATASET_SIZE`] and [`DATASET_EPOCH_GROWTH_SIZE`][ds] for the
    initial size and linear growth rate, respectively. The complete dataset is
    generated in [`generate_dataset`], while the slices used in verification
    are generated in [`generate_dataset_item`].

    The actual dataset size is smaller than simply multiplying
    `DATASET_EPOCH_GROWTH_SIZE` by the epoch number to minimize the risk of
    unintended cyclic behavior. It is defined as the highest prime number below
    what linear growth would calculate.

    [`INITIAL_DATASET_SIZE`]: ref:ethereum.ethash.INITIAL_DATASET_SIZE
    [ds]: ref:ethereum.ethash.DATASET_EPOCH_GROWTH_SIZE
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    """
    size = INITIAL_DATASET_SIZE + (
        DATASET_EPOCH_GROWTH_SIZE * epoch(block_number)
    )
    size -= MIX_BYTES
    while not is_prime(size // MIX_BYTES):
        size -= Uint(2) * MIX_BYTES

    return size
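
The same search over MIX_BYTES-sized rows gives epoch zero a dataset of 1073739904 bytes: 8388593 rows of 128 bytes, with 8388593 prime. A plain-integer sketch:

def is_prime_naive(n: int) -> bool:
    # Trial division; for illustration only.
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

size = 2**30 - 128             # INITIAL_DATASET_SIZE minus MIX_BYTES
while not is_prime_naive(size // 128):
    size -= 2 * 128

assert size == 1073739904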

generate_seed

Obtain the cache generation seed for the block identified by block_number. See generate_cache.

def generate_seed(block_number: Uint) -> Hash32:
    """
    Obtain the cache generation seed for the block identified by
    `block_number`. See [`generate_cache`].

    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    epoch_number = epoch(block_number)

    seed = b"\x00" * 32
    while epoch_number != 0:
        seed = keccak256(seed)
        epoch_number -= Uint(1)

    return Hash32(seed)
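
All blocks of an epoch share one seed: 32 zero bytes at epoch zero, hashed once more for each elapsed epoch. A sketch, assuming keccak256 (from ethereum.crypto.hash) and Uint are importable alongside this module:

from ethereum.crypto.hash import keccak256

epoch_zero_seed = b"\x00" * 32
epoch_one_seed = keccak256(epoch_zero_seed)

# Blocks 30000 and 59999 both sit in epoch 1, so their seeds agree.
assert generate_seed(Uint(30000)) == epoch_one_seed
assert generate_seed(Uint(30000)) == generate_seed(Uint(59999))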

generate_cache

Generate the cache for the block identified by block_number. See generate_dataset for how the cache is used.

The cache is generated in two steps: filling the array with a chain of keccak512 hashes, then running CACHE_ROUNDS rounds of Sergio Demian Lerner's RandMemoHash on those bytes.

def generate_cache(block_number: Uint) -> Tuple[Tuple[U32, ...], ...]:
    """
    Generate the cache for the block identified by `block_number`. See
    [`generate_dataset`] for how the cache is used.

    The cache is generated in two steps: filling the array with a chain of
    [`keccak512`] hashes, then running [`CACHE_ROUNDS`] rounds of Sergio
    Demian Lerner's [RandMemoHash] on those bytes.

    [`keccak512`]: ref:ethereum.crypto.hash.keccak512
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [`CACHE_ROUNDS`]: ref:ethereum.ethash.CACHE_ROUNDS
    [RandMemoHash]: http://www.hashcash.org/papers/memohash.pdf
    """
    seed = generate_seed(block_number)
    cache_size_bytes = cache_size(block_number)

    cache_size_words = cache_size_bytes // HASH_BYTES
    cache = [keccak512(seed)]

    for index in range(1, cache_size_words):
        cache_item = keccak512(cache[index - 1])
        cache.append(cache_item)

    for _ in range(CACHE_ROUNDS):
        for index in range(cache_size_words):
            # Converting `cache_size_words` to int as `-1 + Uint(5)` is an
            # error.
            first_cache_item = cache[
                (index - 1 + int(cache_size_words)) % int(cache_size_words)
            ]
            second_cache_item = cache[
                U32.from_le_bytes(cache[index][0:4]) % U32(cache_size_words)
            ]
            result = bytes(
                [a ^ b for a, b in zip(first_cache_item, second_cache_item)]
            )
            cache[index] = keccak512(result)

    return tuple(
        le_bytes_to_uint32_sequence(cache_item) for cache_item in cache
    )
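
The result holds one entry per 64-byte cache line, each exposed as sixteen little-endian U32 words. A sketch of the shape invariants (generating even the epoch-zero cache is slow in pure Python):

cache = generate_cache(Uint(0))

assert len(cache) == 262139                    # cache_size(0) // HASH_BYTES
assert all(len(item) == 16 for item in cache)  # 64 bytes as 4-byte words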

fnv

A non-associative substitute for XOR, inspired by the FNV hash by Fowler, Noll, and Vo. See fnv_hash, generate_dataset_item, and hashimoto.

Note that here we multiply the prime with the full 32-bit input, in contrast with the FNV-1 spec which multiplies the prime with one byte (octet) in turn.

def fnv(a: Union[Uint, U32], b: Union[Uint, U32]) -> U32:
    """
    A non-associative substitute for XOR, inspired by the [FNV] hash by Fowler,
    Noll, and Vo. See [`fnv_hash`], [`generate_dataset_item`], and
    [`hashimoto`].

    Note that here we multiply the prime with the full 32-bit input, in
    contrast with the [FNV-1] spec which multiplies the prime with one byte
    (octet) in turn.

    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    [`fnv_hash`]: ref:ethereum.ethash.fnv_hash
    [FNV]: https://w.wiki/XKZ
    [FNV-1]: http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1
    """
    # This is a faster way of doing `number % (2 ** 32)`.
    result = ((Uint(a) * Uint(0x01000193)) ^ Uint(b)) & Uint(U32.MAX_VALUE)
    return U32(result)
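
A plain-integer restatement makes the arithmetic easy to check by hand; 0x01000193 is the 32-bit FNV prime:

FNV_PRIME = 0x01000193

def fnv_plain(a: int, b: int) -> int:
    return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF

assert fnv_plain(0, 0) == 0
assert fnv_plain(1, 0) == FNV_PRIME
# Non-associative: regrouping the same operands changes the result.
assert fnv_plain(fnv_plain(1, 2), 3) != fnv_plain(1, fnv_plain(2, 3))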

fnv_hash

Combines data into mix_integers using fnv. See hashimoto and generate_dataset_item.

def fnv_hash(
    mix_integers: Tuple[U32, ...], data: Tuple[U32, ...]
) -> Tuple[U32, ...]:
    """
    Combines `data` into `mix_integers` using [`fnv`]. See [`hashimoto`] and
    [`generate_dataset_item`].

    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    [`fnv`]: ref:ethereum.ethash.fnv
    """
    return tuple(
        fnv(mix_integers[i], data[i]) for i in range(len(mix_integers))
    )

generate_dataset_item

Generate a particular dataset item 0-indexed by index by hashing pseudorandomly-selected entries from cache together. See fnv and fnv_hash for the digest function, generate_cache for generating cache, and generate_dataset for the full dataset generation algorithm.

def generate_dataset_item(
    cache: Tuple[Tuple[U32, ...], ...], index: Uint
) -> Hash64:
    """
    Generate a particular dataset item 0-indexed by `index` by hashing
    pseudorandomly-selected entries from `cache` together. See [`fnv`] and
    [`fnv_hash`] for the digest function, [`generate_cache`] for generating
    `cache`, and [`generate_dataset`] for the full dataset generation
    algorithm.

    [`fnv`]: ref:ethereum.ethash.fnv
    [`fnv_hash`]: ref:ethereum.ethash.fnv_hash
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    mix = keccak512(
        (
            le_uint32_sequence_to_uint(cache[index % ulen(cache)]) ^ index
        ).to_le_bytes64()
    )

    mix_integers = le_bytes_to_uint32_sequence(mix)

    for j in (Uint(k) for k in range(DATASET_PARENTS)):
        mix_word: U32 = mix_integers[j % Uint(16)]
        cache_index = fnv(index ^ j, mix_word) % U32(len(cache))
        parent = cache[cache_index]
        mix_integers = fnv_hash(mix_integers, parent)

    mix = Hash64(le_uint32_sequence_to_bytes(mix_integers))

    return keccak512(mix)
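
Each call reads DATASET_PARENTS (256) pseudorandomly-chosen cache entries and returns one 64-byte item, which is what lets hashimoto_light recompute only the items it needs. A sketch, assuming cache came from generate_cache:

item = generate_dataset_item(cache, Uint(0))
assert len(item) == 64  # a Hash64
# Any index below dataset_size(block_number) // HASH_BYTES is valid.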

generate_dataset

Generate the full dataset for the block identified by block_number.

This function is present only for demonstration purposes. It is not used while validating blocks.

def generate_dataset(block_number: Uint) -> Tuple[Hash64, ...]:
    """
    Generate the full dataset for the block identified by `block_number`.

    This function is present only for demonstration purposes. It is not used
    while validating blocks.
    """
    dataset_size_bytes: Uint = dataset_size(block_number)
    cache: Tuple[Tuple[U32, ...], ...] = generate_cache(block_number)

    # TODO: Parallelize this later on if it adds value
    return tuple(
        generate_dataset_item(cache, Uint(index))
        for index in range(dataset_size_bytes // HASH_BYTES)
    )
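
Because items are independent once the cache exists, the TODO above could plausibly be addressed with a process pool. A hypothetical, non-normative sketch (the full dataset is on the order of a gibibyte, so this remains expensive):

from functools import partial
from multiprocessing import Pool

def generate_dataset_parallel(block_number: Uint) -> Tuple[Hash64, ...]:
    # Hypothetical parallel variant; not part of the specification.
    cache = generate_cache(block_number)
    count = int(dataset_size(block_number) // HASH_BYTES)
    with Pool() as pool:
        items = pool.map(
            partial(generate_dataset_item, cache),
            (Uint(i) for i in range(count)),
        )
    return tuple(items)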

hashimoto

Obtain the mix digest and the final value for a header, by aggregating data from the full dataset.

Parameters

  • header_hash is a valid RLP hash of a block header.

  • nonce is the propagated nonce for the given block.

  • dataset_size is the size of the dataset. See dataset_size.

  • fetch_dataset_item is a function that retrieves a specific dataset item based on its index.

Returns

  • The mix digest generated from the header hash and propagated nonce.

  • The final result obtained which will be checked for leading zeros (in byte representation) in correspondence with the block difficulty.

def hashimoto(
    header_hash: Hash32,
    nonce: Bytes8,
    dataset_size: Uint,
    fetch_dataset_item: Callable[[Uint], Tuple[U32, ...]],
) -> Tuple[bytes, Hash32]:
    """
    Obtain the mix digest and the final value for a header, by aggregating
    data from the full dataset.

    #### Parameters

    - `header_hash` is a valid [RLP hash] of a block header.
    - `nonce` is the propagated nonce for the given block.
    - `dataset_size` is the size of the dataset. See [`dataset_size`].
    - `fetch_dataset_item` is a function that retrieves a specific dataset item
      based on its index.

    #### Returns

    - The mix digest generated from the header hash and propagated nonce.
    - The final result obtained which will be checked for leading zeros (in
      byte representation) in correspondence with the block difficulty.

    [RLP hash]: ref:ethereum.rlp.rlp_hash
    [`dataset_size`]: ref:ethereum.ethash.dataset_size
    """
    nonce_le = bytes(reversed(nonce))
    seed_hash = keccak512(header_hash + nonce_le)
    seed_head = U32.from_le_bytes(seed_hash[:4])

    rows = dataset_size // Uint(128)
    mix = le_bytes_to_uint32_sequence(seed_hash) * (MIX_BYTES // HASH_BYTES)

    for i in range(HASHIMOTO_ACCESSES):
        new_data: Tuple[U32, ...] = ()
        parent = fnv(U32(i) ^ seed_head, mix[i % len(mix)]) % U32(rows)
        for j in range(MIX_BYTES // HASH_BYTES):
            # Typecasting `parent` from U32 to Uint as 2*parent + j may
            # overflow U32.
            new_data += fetch_dataset_item(Uint(2) * Uint(parent) + Uint(j))

        mix = fnv_hash(mix, new_data)

    compressed_mix = []
    for i in range(0, len(mix), 4):
        compressed_mix.append(
            fnv(fnv(fnv(mix[i], mix[i + 1]), mix[i + 2]), mix[i + 3])
        )

    mix_digest = le_uint32_sequence_to_bytes(compressed_mix)
    result = keccak256(seed_hash + mix_digest)

    return mix_digest, result
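
The loop fixes the memory traffic per attempted nonce, which is where the memory hardness bites. Plain arithmetic from the constants above:

accesses = 64                     # HASHIMOTO_ACCESSES
items_per_access = 128 // 64      # MIX_BYTES // HASH_BYTES == 2
bytes_read = accesses * items_per_access * 64  # HASH_BYTES per item

assert bytes_read == 8192         # 8 KiB of pseudorandom reads per nonce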

hashimoto_light

Run the hashimoto algorithm by generating dataset items using the cache instead of loading the full dataset into main memory.

Parameters

  • header_hash is a valid RLP hash of a block header.

  • nonce is the propagated nonce for the given block.

  • cache is the cache generated by generate_cache.

  • dataset_size is the size of the dataset. See dataset_size.

Returns

  • The mix digest generated from the header hash and propagated nonce.

  • The final result obtained which will be checked for leading zeros (in byte representation) in correspondence with the block difficulty.

def hashimoto_light(
    header_hash: Hash32,
    nonce: Bytes8,
    cache: Tuple[Tuple[U32, ...], ...],
    dataset_size: Uint,
) -> Tuple[bytes, Hash32]:
    """
    Run the [`hashimoto`] algorithm by generating dataset items using the
    cache instead of loading the full dataset into main memory.

    #### Parameters

    - `header_hash` is a valid [RLP hash] of a block header.
    - `nonce` is the propagated nonce for the given block.
    - `cache` is the cache generated by [`generate_cache`].
    - `dataset_size` is the size of the dataset. See [`dataset_size`].

    #### Returns

    - The mix digest generated from the header hash and propagated nonce.
    - The final result obtained which will be checked for leading zeros (in
      byte representation) in correspondence with the block difficulty.

    [RLP hash]: ref:ethereum.rlp.rlp_hash
    [`dataset_size`]: ref:ethereum.ethash.dataset_size
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    """

    def fetch_dataset_item(index: Uint) -> Tuple[U32, ...]:
        item: Hash64 = generate_dataset_item(cache, index)
        return le_bytes_to_uint32_sequence(item)

    return hashimoto(header_hash, nonce, dataset_size, fetch_dataset_item)
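
Putting it together, a sketch of how a verifier might call this function. The header fields are assumed, and the difficulty comparison (written with plain integers for clarity) restates the standard Ethash acceptance rule; the authoritative check lives in the block-validation code, not in this module:

cache = generate_cache(header.number)
mix_digest, result = hashimoto_light(
    header_hash, header.nonce, cache, dataset_size(header.number)
)

assert mix_digest == header.mix_digest
assert int.from_bytes(result, "big") <= 2**256 // int(header.difficulty)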