ethereum.ethash

Ethash is a proof-of-work algorithm designed to be ASIC-resistant through memory hardness.

To achieve memory hardness, computing Ethash requires access to subsets of a large structure. The particular subsets chosen are based on the nonce and block header, while the set itself is changed every epoch.

At a high level, the Ethash algorithm is as follows:

  1. Create a seed value with generate_seed, derived from the block number (specifically, from the epoch to which the block belongs).

  2. From the seed, compute a pseudorandom cache with generate_cache.

  3. From the cache, generate a dataset with generate_dataset. The dataset grows over time based on DATASET_EPOCH_GROWTH_SIZE.

  4. Miners hash slices of the dataset together, which is where the memory hardness comes in. Verifying the proof-of-work requires only the cache, which is enough to recompute the much smaller subset of the dataset that a particular proof touches (see the sketch after this list).
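
These steps map directly onto the functions documented below. A minimal sketch of the verifier's path, assuming this module is importable as ethereum.ethash and that block_number, header_hash, and nonce come from the block being verified:

# A sketch, not part of this module: light verification end to end.
# The import path assumes this spec's package layout; `block_number`,
# `header_hash`, and `nonce` are hypothetical inputs.
from ethereum.ethash import dataset_size, generate_cache, hashimoto_light

cache = generate_cache(block_number)    # steps 1 and 2 (seed, then cache)
size = dataset_size(block_number)       # step 3 fixes only the size
mix_digest, result = hashimoto_light(   # step 4, verifier side
    header_hash, nonce, cache, size
)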

EPOCH_SIZE

Number of blocks before a dataset needs to be regenerated (known as an "epoch"). See epoch.

EPOCH_SIZE = 30000

INITIAL_CACHE_SIZE

Size of the cache (in bytes) during the first epoch. The cache grows by roughly CACHE_EPOCH_GROWTH_SIZE bytes with each subsequent epoch. See cache_size.

INITIAL_CACHE_SIZE = 2**24

CACHE_EPOCH_GROWTH_SIZE

After the first epoch, the cache size grows by roughly this amount. See cache_size.

CACHE_EPOCH_GROWTH_SIZE = 2**17

INITIAL_DATASET_SIZE

Size of the dataset (in bytes) during the first epoch. The dataset grows by roughly DATASET_EPOCH_GROWTH_SIZE bytes with each subsequent epoch. See dataset_size.

INITIAL_DATASET_SIZE = 2**30

DATASET_EPOCH_GROWTH_SIZE

After the first epoch, the dataset size grows by roughly this amount. See dataset_size.

DATASET_EPOCH_GROWTH_SIZE = 2**23

HASH_BYTES

Length of a hash, in bytes.

HASH_BYTES = 64

MIX_BYTES

Width of mix, in bytes. See generate_dataset_item.

MIX_BYTES = 128

CACHE_ROUNDS

Number of times to repeat the keccak512 step while generating the cache. See generate_cache.

CACHE_ROUNDS = 3

DATASET_PARENTS

Number of parents of each dataset element. See generate_dataset_item.

DATASET_PARENTS = 256

HASHIMOTO_ACCESSES

Number of accesses in the hashimoto loop.

HASHIMOTO_ACCESSES = 64

epoch

Obtain the epoch number to which the block identified by block_number belongs. The first epoch is numbered zero.

An Ethash epoch is a fixed number of blocks (EPOCH_SIZE) long, during which the dataset remains constant. At the end of each epoch, the dataset is generated anew. See generate_dataset.

def epoch(block_number: Uint) -> Uint:
    """
    Obtain the epoch number to which the block identified by `block_number`
    belongs. The first epoch is numbered zero.

    An Ethash epoch is a fixed number of blocks ([`EPOCH_SIZE`]) long, during
    which the dataset remains constant. At the end of each epoch, the dataset
    is generated anew. See [`generate_dataset`].

    [`EPOCH_SIZE`]: ref:ethereum.ethash.EPOCH_SIZE
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    """
    return block_number // EPOCH_SIZE
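
For example, with EPOCH_SIZE = 30000, blocks 0 through 29999 form epoch zero and block 30000 opens epoch one. A worked check with plain integers:

assert 0 // 30000 == 0        # genesis block is in epoch 0
assert 29999 // 30000 == 0    # last block of epoch 0
assert 30000 // 30000 == 1    # first block of epoch 1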

cache_size

Obtain the cache size (in bytes) of the epoch to which block_number belongs.

See INITIAL_CACHE_SIZE and CACHE_EPOCH_GROWTH_SIZE for the initial size and linear growth rate, respectively. The cache is generated in generate_cache.

The actual cache size is smaller than simply multiplying CACHE_EPOCH_GROWTH_SIZE by the epoch number to minimize the risk of unintended cyclic behavior. It is defined as the highest prime number below what linear growth would calculate.

def cache_size(block_number: Uint) -> Uint:
    """
    Obtain the cache size (in bytes) of the epoch to which `block_number`
    belongs.

    See [`INITIAL_CACHE_SIZE`] and [`CACHE_EPOCH_GROWTH_SIZE`] for the initial
    size and linear growth rate, respectively. The cache is generated in
    [`generate_cache`].

    The actual cache size is smaller than simply multiplying
    `CACHE_EPOCH_GROWTH_SIZE` by the epoch number to minimize the risk of
    unintended cyclic behavior. It is defined as the highest prime number below
    what linear growth would calculate.

    [`INITIAL_CACHE_SIZE`]: ref:ethereum.ethash.INITIAL_CACHE_SIZE
    [`CACHE_EPOCH_GROWTH_SIZE`]: ref:ethereum.ethash.CACHE_EPOCH_GROWTH_SIZE
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    size = INITIAL_CACHE_SIZE + (CACHE_EPOCH_GROWTH_SIZE * epoch(block_number))
    size -= HASH_BYTES
    while not is_prime(size // HASH_BYTES):
        size -= 2 * HASH_BYTES

    return size
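
For instance, the epoch-0 cache size can be computed by hand. The sketch below uses plain integers and a naive trial-division primality test standing in for this module's is_prime:

def is_prime_naive(n: int) -> bool:
    # Naive stand-in for `is_prime`; fine at these magnitudes.
    return n >= 2 and all(n % d != 0 for d in range(2, int(n**0.5) + 1))

size = 2**24 - 64                  # INITIAL_CACHE_SIZE - HASH_BYTES
while not is_prime_naive(size // 64):
    size -= 2 * 64                 # step down two hash widths at a time
assert size == 16776896            # the epoch-0 cache size, in bytes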

dataset_size

Obtain the dataset size (in bytes) of the epoch to which block_number belongs.

See INITIAL_DATASET_SIZE and DATASET_EPOCH_GROWTH_SIZE for the initial size and linear growth rate, respectively. The complete dataset is generated in generate_dataset, while the slices used in verification are generated in generate_dataset_item.

The actual dataset size is smaller than simply multiplying DATASET_EPOCH_GROWTH_SIZE by the epoch number to minimize the risk of unintended cyclic behavior. It is defined as the highest prime number below what linear growth would calculate.

def dataset_size(block_number: Uint) -> Uint:
    """
    Obtain the dataset size (in bytes) of the epoch to which `block_number`
    belongs.

    See [`INITIAL_DATASET_SIZE`] and [`DATASET_EPOCH_GROWTH_SIZE`][ds] for the
    initial size and linear growth rate, respectively. The complete dataset is
    generated in [`generate_dataset`], while the slices used in verification
    are generated in [`generate_dataset_item`].

    The actual dataset size is smaller than simply multiplying
    `DATASET_EPOCH_GROWTH_SIZE` by the epoch number to minimize the risk of
    unintended cyclic behavior. It is defined as the highest prime number below
    what linear growth would calculate.

    [`INITIAL_DATASET_SIZE`]: ref:ethereum.ethash.INITIAL_DATASET_SIZE
    [ds]: ref:ethereum.ethash.DATASET_EPOCH_GROWTH_SIZE
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    """
    size = INITIAL_DATASET_SIZE + (
        DATASET_EPOCH_GROWTH_SIZE * epoch(block_number)
    )
    size -= MIX_BYTES
    while not is_prime(size // MIX_BYTES):
        size -= 2 * MIX_BYTES

    return size
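
The same walk for the epoch-0 dataset, reusing is_prime_naive from the cache_size sketch above:

size = 2**30 - 128                 # INITIAL_DATASET_SIZE - MIX_BYTES
while not is_prime_naive(size // 128):
    size -= 2 * 128                # step down two mix widths at a time
assert size == 1073739904          # the epoch-0 dataset size, in bytes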

generate_seed

Obtain the cache generation seed for the block identified by block_number. See generate_cache.

def generate_seed(block_number: Uint) -> Hash32:
    """
    Obtain the cache generation seed for the block identified by
    `block_number`. See [`generate_cache`].

    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    epoch_number = epoch(block_number)

    seed = b"\x00" * 32
    while epoch_number != 0:
        seed = keccak256(seed)
        epoch_number -= 1

    return Hash32(seed)
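
For example, the epoch-0 seed is 32 zero bytes and the epoch-1 seed is a single keccak256 of it. The sketch below borrows pycryptodome's keccak as a stand-in for this module's keccak256:

from Crypto.Hash import keccak  # pycryptodome; an assumption of this sketch

def keccak256_sketch(data: bytes) -> bytes:
    return keccak.new(data=data, digest_bits=256).digest()

seed = b"\x00" * 32                       # epoch 0 seed
seed = keccak256_sketch(seed)             # epoch 1 seed
assert seed.hex().startswith("290decd9")  # well-known hash of 32 zero bytes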

generate_cache

Generate the cache for the block identified by block_number. See generate_dataset for how the cache is used.

The cache is generated in two steps: filling the array with a chain of keccak512 hashes, then running CACHE_ROUNDS (three) rounds of Sergio Demian Lerner's RandMemoHash on those bytes.

def generate_cache(block_number: Uint) -> Tuple[Tuple[U32, ...], ...]:
    """
    Generate the cache for the block identified by `block_number`. See
    [`generate_dataset`] for how the cache is used.

    The cache is generated in two steps: filling the array with a chain of
    [`keccak512`] hashes, then running [`CACHE_ROUNDS`] rounds of Sergio
    Demian Lerner's [RandMemoHash] on those bytes.

    [`keccak512`]: ref:ethereum.crypto.hash.keccak512
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [`CACHE_ROUNDS`]: ref:ethereum.ethash.CACHE_ROUNDS
    [RandMemoHash]: http://www.hashcash.org/papers/memohash.pdf
    """
    seed = generate_seed(block_number)
    cache_size_bytes = cache_size(block_number)

    cache_size_words = cache_size_bytes // HASH_BYTES
    cache = [keccak512(seed)]

    for index in range(1, cache_size_words):
        cache_item = keccak512(cache[index - 1])
        cache.append(cache_item)

    for _ in range(CACHE_ROUNDS):
        for index in range(cache_size_words):
            # Converting `cache_size_words` to int as `-1 + Uint(5)` is an
            # error.
            first_cache_item = cache[
                (index - 1 + int(cache_size_words)) % cache_size_words
            ]
            second_cache_item = cache[
                U32.from_le_bytes(cache[index][0:4]) % cache_size_words
            ]
            result = bytes(
                [a ^ b for a, b in zip(first_cache_item, second_cache_item)]
            )
            cache[index] = keccak512(result)

    return tuple(
        le_bytes_to_uint32_sequence(cache_item) for cache_item in cache
    )
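
The access pattern of a single RandMemoHash round can be seen in miniature below, with SHA-256 standing in for keccak512 (an assumption of this sketch; the indexing, not the hash function, is the point):

import hashlib

toy_cache = [hashlib.sha256(bytes([i])).digest() for i in range(4)]
n = len(toy_cache)
for index in range(n):
    # Each item is rehashed from its left neighbour XORed with a
    # pseudorandomly selected item, mirroring the loop above.
    first = toy_cache[(index - 1 + n) % n]
    second = toy_cache[int.from_bytes(toy_cache[index][:4], "little") % n]
    toy_cache[index] = hashlib.sha256(
        bytes(a ^ b for a, b in zip(first, second))
    ).digest()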

fnv

A non-associative substitute for XOR, inspired by the FNV hash by Fowler, Noll, and Vo. See fnv_hash, generate_dataset_item, and hashimoto.

Note that here we multiply the prime by the full 32-bit input, in contrast with the FNV-1 spec, which multiplies the prime by one byte (octet) at a time.

def fnv(a: Union[Uint, U32], b: Union[Uint, U32]) -> U32:
    """
    A non-associative substitute for XOR, inspired by the [FNV] hash by Fowler,
    Noll, and Vo. See [`fnv_hash`], [`generate_dataset_item`], and
    [`hashimoto`].

    Note that here we multiply the prime by the full 32-bit input, in
    contrast with the [FNV-1] spec, which multiplies the prime by one byte
    (octet) at a time.

    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    [`fnv_hash`]: ref:ethereum.ethash.fnv_hash
    [FNV]: https://w.wiki/XKZ
    [FNV-1]: http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1
    """
    # Masking with `U32.MAX_VALUE` is a faster way of doing `number % (2 ** 32)`.
    result = ((Uint(a) * 0x01000193) ^ Uint(b)) & U32.MAX_VALUE
    return U32(result)
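
A quick plain-integer check of the definition, where 0x01000193 is the 32-bit FNV prime:

FNV_PRIME = 0x01000193

def fnv_plain(a: int, b: int) -> int:
    # Multiply first, XOR second, then truncate to 32 bits.
    return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF

assert fnv_plain(1, 2) == 0x01000191           # (1 * prime) ^ 2
assert fnv_plain(0, 0xDEADBEEF) == 0xDEADBEEF  # a == 0 passes b through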

fnv_hash

Combines data into mix_integers using fnv. See hashimoto and generate_dataset_item.

def fnv_hash(
    mix_integers: Tuple[U32, ...], data: Tuple[U32, ...]
) -> Tuple[U32, ...]:
    """
    Combines `data` into `mix_integers` using [`fnv`]. See [`hashimoto`] and
    [`generate_dataset_item`].

    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    [`generate_dataset_item`]: ref:ethereum.ethash.generate_dataset_item
    [`fnv`]: ref:ethereum.ethash.fnv
    """
    return tuple(
        fnv(mix_integers[i], data[i]) for i in range(len(mix_integers))
    )
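
Element-wise, with fnv_plain from the sketch above:

mix = (1, 2, 3, 4)
data = (5, 6, 7, 8)
combined = tuple(fnv_plain(m, d) for m, d in zip(mix, data))
assert combined[0] == fnv_plain(1, 5) and len(combined) == len(mix)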

generate_dataset_item

Generate a particular dataset item 0-indexed by index by hashing pseudorandomly-selected entries from cache together. See fnv and fnv_hash for the digest function, generate_cache for generating cache, and generate_dataset for the full dataset generation algorithm.

def generate_dataset_item(
    cache: Tuple[Tuple[U32, ...], ...], index: Uint
) -> Hash64:
    """
    Generate a particular dataset item 0-indexed by `index` by hashing
    pseudorandomly-selected entries from `cache` together. See [`fnv`] and
    [`fnv_hash`] for the digest function, [`generate_cache`] for generating
    `cache`, and [`generate_dataset`] for the full dataset generation
    algorithm.

    [`fnv`]: ref:ethereum.ethash.fnv
    [`fnv_hash`]: ref:ethereum.ethash.fnv_hash
    [`generate_dataset`]: ref:ethereum.ethash.generate_dataset
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    """
    mix = keccak512(
        (
            le_uint32_sequence_to_uint(cache[index % len(cache)]) ^ index
        ).to_le_bytes(number_bytes=HASH_BYTES)
    )

    mix_integers = le_bytes_to_uint32_sequence(mix)

    for j in range(DATASET_PARENTS):
        mix_word: U32 = mix_integers[j % 16]
        cache_index = fnv(index ^ j, mix_word) % len(cache)
        parent = cache[cache_index]
        mix_integers = fnv_hash(mix_integers, parent)

    mix = Hash64(le_uint32_sequence_to_bytes(mix_integers))

    return keccak512(mix)
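
The j % 16 indexing above follows from the widths involved; in plain integers:

HASH_BYTES, WORD_BYTES, DATASET_PARENTS = 64, 4, 256
assert HASH_BYTES // WORD_BYTES == 16  # a keccak512 mix holds 16 U32 words
assert DATASET_PARENTS % 16 == 0       # 256 parents sweep the mix evenly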

generate_dataset

Generate the full dataset for the block identified by block_number.

This function is present only for demonstration purposes. It is not used while validating blocks.

def generate_dataset(block_number: Uint) -> Tuple[Hash64, ...]:
    """
    Generate the full dataset for the block identified by `block_number`.

    This function is present only for demonstration purposes. It is not used
    while validating blocks.
    """
    dataset_size_bytes: Uint = dataset_size(block_number)
    cache: Tuple[Tuple[U32, ...], ...] = generate_cache(block_number)

    # TODO: Parallelize this later on if it adds value
    return tuple(
        generate_dataset_item(cache, Uint(index))
        for index in range(dataset_size_bytes // HASH_BYTES)
    )
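
A quick scale check shows why: the epoch-0 dataset alone holds roughly seventeen million 64-byte items, about one gibibyte in total:

assert 1073739904 // 64 == 16777186  # epoch-0 item count (~1 GiB)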

hashimoto

Obtain the mix digest and the final value for a header, by aggregating data from the full dataset.

Parameters

  • header_hash is a valid RLP hash of a block header.

  • nonce is the propagated nonce for the given block.

  • dataset_size is the size of the dataset. See dataset_size.

  • fetch_dataset_item is a function that retrieves a specific dataset item based on its index.

Returns

  • The mix digest generated from the header hash and propagated nonce.

  • The final result, which will be checked for leading zeros (in byte representation) in accordance with the block difficulty.

def hashimoto(
    header_hash: Hash32,
    nonce: Bytes8,
    dataset_size: Uint,
    fetch_dataset_item: Callable[[Uint], Tuple[U32, ...]],
) -> Tuple[bytes, Hash32]:
    """
    Obtain the mix digest and the final value for a header, by aggregating
    data from the full dataset.

    #### Parameters

    - `header_hash` is a valid [RLP hash] of a block header.
    - `nonce` is the propagated nonce for the given block.
    - `dataset_size` is the size of the dataset. See [`dataset_size`].
    - `fetch_dataset_item` is a function that retrieves a specific dataset item
      based on its index.

    #### Returns

    - The mix digest generated from the header hash and propagated nonce.
    - The final result, which will be checked for leading zeros (in byte
      representation) in accordance with the block difficulty.

    [RLP hash]: ref:ethereum.rlp.rlp_hash
    [`dataset_size`]: ref:ethereum.ethash.dataset_size
    """
    nonce_le = bytes(reversed(nonce))
    seed_hash = keccak512(header_hash + nonce_le)
    seed_head = U32.from_le_bytes(seed_hash[:4])

    rows = dataset_size // 128  # number of MIX_BYTES-wide rows in the dataset
    mix = le_bytes_to_uint32_sequence(seed_hash) * (MIX_BYTES // HASH_BYTES)

    for i in range(HASHIMOTO_ACCESSES):
        new_data: Tuple[U32, ...] = ()
        parent = fnv(i ^ seed_head, mix[i % len(mix)]) % rows
        for j in range(MIX_BYTES // HASH_BYTES):
            # Typecasting `parent` from U32 to Uint as `2 * parent + j` may
            # overflow U32.
            new_data += fetch_dataset_item(2 * Uint(parent) + j)

        mix = fnv_hash(mix, new_data)

    compressed_mix = []
    for i in range(0, len(mix), 4):
        compressed_mix.append(
            fnv(fnv(fnv(mix[i], mix[i + 1]), mix[i + 2]), mix[i + 3])
        )

    mix_digest = le_uint32_sequence_to_bytes(compressed_mix)
    result = keccak256(seed_hash + mix_digest)

    return mix_digest, result
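
For context, the "leading zeros" check on result is conventionally expressed as a bound on its big-endian integer value. A sketch of that check (the threshold form is standard Ethash validation, but this helper is not part of this module):

def meets_difficulty(result: bytes, difficulty: int) -> bool:
    # Hypothetical helper: the proof is valid when the result, read as a
    # big-endian integer, does not exceed 2**256 // difficulty.
    return int.from_bytes(result, "big") <= 2**256 // difficulty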

hashimoto_light

Run the hashimoto algorithm by generating dataset items using the cache instead of loading the full dataset into main memory.

Parameters

  • header_hash is a valid RLP hash of a block header.

  • nonce is the propagated nonce for the given block.

  • cache is the cache generated by generate_cache.

  • dataset_size is the size of the dataset. See dataset_size.

Returns

  • The mix digest generated from the header hash and propagated nonce.

  • The final result, which will be checked for leading zeros (in byte representation) in accordance with the block difficulty.

def hashimoto_light(
    header_hash: Hash32,
    nonce: Bytes8,
    cache: Tuple[Tuple[U32, ...], ...],
    dataset_size: Uint,
) -> Tuple[bytes, Hash32]:
    """
    Run the [`hashimoto`] algorithm by generating dataset items using the
    cache instead of loading the full dataset into main memory.

    #### Parameters

    - `header_hash` is a valid [RLP hash] of a block header.
    - `nonce` is the propagated nonce for the given block.
    - `cache` is the cache generated by [`generate_cache`].
    - `dataset_size` is the size of the dataset. See [`dataset_size`].

    #### Returns

    - The mix digest generated from the header hash and propagated nonce.
    - The final result, which will be checked for leading zeros (in byte
      representation) in accordance with the block difficulty.

    [RLP hash]: ref:ethereum.rlp.rlp_hash
    [`dataset_size`]: ref:ethereum.ethash.dataset_size
    [`generate_cache`]: ref:ethereum.ethash.generate_cache
    [`hashimoto`]: ref:ethereum.ethash.hashimoto
    """

    def fetch_dataset_item(index: Uint) -> Tuple[U32, ...]:
        item: Hash64 = generate_dataset_item(cache, index)
        return le_bytes_to_uint32_sequence(item)

    return hashimoto(header_hash, nonce, dataset_size, fetch_dataset_item)
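
Finally, a plain-integer look at why the light path is cheap: each hashimoto run touches only HASHIMOTO_ACCESSES * (MIX_BYTES // HASH_BYTES) dataset items, each regenerated from the cache on demand:

HASHIMOTO_ACCESSES, MIX_BYTES, HASH_BYTES = 64, 128, 64
items_touched = HASHIMOTO_ACCESSES * (MIX_BYTES // HASH_BYTES)
assert items_touched == 128  # versus ~16.8 million items in the full dataset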