In this notebook we explore data from the Medalla testnet, looking at the first 388,001 slots.
We present the main datasets below:
Each row in this dataset corresponds to an aggregate attestation included in a block.
We cast the dataset above into a long format, such that each row corresponds to an individual attestation included in a block. Note that when an individual attestation is included in multiple aggregates, it appears multiple times in the dataset.
exploded_ats is the “disaggregated” version of the aggregate attestations. To assess validator performance, however, we often don’t need every inclusion of an individual attestation.
individual_ats contains these unique, individual attestations, tagged with some extra data such as their earliest inclusion and whether they attested correctly for the target checkpoint and the head.
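A minimal sketch of the disaggregation step, using toy data and assumed column names (slot, inclusion_slot, attesting_indices): explode each aggregate into individual attestations, then keep only the earliest inclusion of each one.

```python
import pandas as pd

# Toy stand-in for the aggregates dataset: one row per aggregate attestation.
# `attesting_indices` holds the validator indices covered by the aggregation bits
# (column names are assumptions for this sketch).
aggregate_ats = pd.DataFrame({
    "slot": [10, 10, 11],
    "inclusion_slot": [11, 12, 12],
    "attesting_indices": [[0, 1], [1, 2], [3]],
})

# "Explode" each aggregate into one row per individual attestation.
exploded_ats = aggregate_ats.explode("attesting_indices").rename(
    columns={"attesting_indices": "validator_index"}
)

# Keep only the earliest inclusion of each (slot, validator) pair.
individual_ats = (
    exploded_ats.sort_values("inclusion_slot")
    .drop_duplicates(subset=["slot", "validator_index"], keep="first")
)
```

Here validator 1's attestation for slot 10 appears in two aggregates (included at slots 11 and 12); only the earliest inclusion survives in individual_ats.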
In a previous version of this notebook, we made use of Jim McDonald’s treasure trove of data, posted on the #medalla-data-challenge channel of the EthStaker Discord server. The data was obtained from his chaind tool, which listens to a beacon node and writes its data to a PostgreSQL database. It’s a great tool and the data is super useful.
We’ve since adopted a different approach, getting the “stateful” data (committees, validator balances…) from requests to our node. By setting Lighthouse to record the state often enough, these requests are fast to execute.
Committees are groups of validators asked to produce an attestation for a specific slot. An active validator is a member of exactly one committee per epoch.
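We can sanity-check the one-committee-per-epoch property directly on the committees dataset. A sketch with toy data (slot, committee_index, validators are assumed column names):

```python
import pandas as pd

SLOTS_PER_EPOCH = 32

# Toy committees dataset: one row per (slot, committee), with the list of
# validator indices assigned to it (column names are assumptions).
committees = pd.DataFrame({
    "slot": [0, 0, 1],
    "committee_index": [0, 1, 0],
    "validators": [[0, 1], [2, 3], [4, 5]],
})

# One row per committee member.
members = committees.explode("validators").rename(columns={"validators": "validator_index"})
members["epoch"] = members["slot"] // SLOTS_PER_EPOCH

# Each active validator should appear in exactly one committee per epoch.
per_epoch = members.groupby(["epoch", "validator_index"]).size()
assert (per_epoch == 1).all()
```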
This dataset gives us validator state balances at the beginning of each epoch. Note that the state balance (balance), the actual ETH amount a validator holds, is different from the effective balance (effective_balance), which measures the principal on which validators earn interest.
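A sketch of how the effective balance relates to the state balance, following the beacon chain spec's initial computation (amounts in Gwei) and ignoring the hysteresis applied to later updates:

```python
# Spec constants: effective balance moves in 1 ETH increments, capped at 32 ETH.
EFFECTIVE_BALANCE_INCREMENT = 10**9   # 1 ETH in Gwei
MAX_EFFECTIVE_BALANCE = 32 * 10**9    # 32 ETH in Gwei

def effective_balance(balance: int) -> int:
    """Round the state balance down to 1 ETH increments, capped at 32 ETH."""
    return min(balance - balance % EFFECTIVE_BALANCE_INCREMENT, MAX_EFFECTIVE_BALANCE)

# A validator holding 31.7 ETH has an effective balance of 31 ETH;
# one holding 32.5 ETH is capped at 32 ETH.
```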
To ease the computational demands of this notebook, we record two datasets from which much of the analysis can be derived.
For each validator, we compute a bunch of statistics, including:
included_ats: The number of times their attestations were included
first_att, last_att: The attesting slots of their earliest and latest attestations (used by pintail to build validator types)
correct_targets, correct_heads: How many times they correctly attested for the target checkpoint or the head
avg_delay: Their average inclusion delay
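The per-validator statistics above can be sketched as a single groupby over the individual attestations, using toy data and assumed column names:

```python
import pandas as pd

# Toy individual attestations: one row per unique individual attestation
# (column names are assumptions for this sketch).
individual_ats = pd.DataFrame({
    "validator_index": [0, 0, 1],
    "slot": [10, 42, 10],
    "inclusion_slot": [11, 44, 13],
    "correct_head": [True, False, True],
})
individual_ats["delay"] = individual_ats["inclusion_slot"] - individual_ats["slot"]

# One row per validator, with the statistics described above.
validator_stats = individual_ats.groupby("validator_index").agg(
    included_ats=("slot", "size"),
    last_att=("slot", "max"),
    correct_heads=("correct_head", "sum"),
    avg_delay=("delay", "mean"),
)
```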
We also record summary statistics for each slot. With 388,000 slots in our dataset, this remains manageable to query. We have the following fields:
included_ats: How many attestations were received for the slot.
expected_ats: How many attestations were expected for the slot.
correct_targets, correct_heads: The number of correct target/head attestations for that slot.
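The per-slot summary can be sketched similarly: included attestations come from counting individual attestations per slot, while expected attestations come from the committee sizes (toy data, assumed column names):

```python
import pandas as pd

# Toy data: committee sizes give the expected attestations per slot,
# individual attestations give the included ones (column names assumed).
committees = pd.DataFrame({"slot": [10, 10, 11], "committee_size": [3, 2, 4]})
individual_ats = pd.DataFrame({
    "slot": [10, 10, 10, 11],
    "correct_head": [True, True, False, True],
})

# One row per slot: included vs expected attestations, plus head correctness.
slot_stats = (
    individual_ats.groupby("slot")
    .agg(included_ats=("correct_head", "size"), correct_heads=("correct_head", "sum"))
    .join(committees.groupby("slot")["committee_size"].sum().rename("expected_ats"))
)
```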
We compare the number of included attestations with the number of expected attestations.
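For the comparison it helps to aggregate slots into epochs and look at the participation ratio; a sketch assuming the slot_stats fields above:

```python
import pandas as pd

SLOTS_PER_EPOCH = 32

# Toy per-slot statistics (column names are assumptions).
slot_stats = pd.DataFrame({
    "slot": [0, 1, 32, 33],
    "included_ats": [90, 80, 10, 20],
    "expected_ats": [100, 100, 100, 100],
})
slot_stats["epoch"] = slot_stats["slot"] // SLOTS_PER_EPOCH

# Participation per epoch: included attestations over expected attestations.
epoch_stats = slot_stats.groupby("epoch")[["included_ats", "expected_ats"]].sum()
epoch_stats["participation"] = epoch_stats["included_ats"] / epoch_stats["expected_ats"]
```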
Clearly something went very wrong circa epoch 2,500. This is now known as the Roughtime incident, an issue affecting Prysm, at the time the most widely used validator client. It took time for the network to recover, demonstrating in the process how the quadratic inactivity leak mechanism works. Client diversity FTW!
How many blocks are there in the canonical chain?
Again, the same trough during the roughtime incident.
Attestations vouch for a target checkpoint to be justified. We can check whether they vouched for the correct one by comparing their target_block_root with the latest known block root as of the start of the attestation epoch (that’s a mouthful). How many individual attestations correctly attest for the target?
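A sketch of the target-correctness check, with toy data and assumed column names: look up the latest known block root at the start of each epoch, then compare it with each attestation's target root.

```python
import pandas as pd

# Toy individual attestations (column names are assumptions).
individual_ats = pd.DataFrame({
    "epoch": [100, 100, 101],
    "target_block_root": ["0xaa", "0xbb", "0xcc"],
})

# Latest known block root as of the start of each epoch.
epoch_roots = pd.Series({100: "0xaa", 101: "0xcc"}, name="expected_root")

# Tag each attestation with whether it vouched for the correct target.
checked = individual_ats.join(epoch_roots, on="epoch")
checked["correct_target"] = checked["target_block_root"] == checked["expected_root"]
```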
How does the correctness evolve over time?
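One way to look at this is the per-epoch correctness rate, averaging the per-attestation flags (toy data, assumed column names):

```python
import pandas as pd

SLOTS_PER_EPOCH = 32

# Toy individual attestations with a target-correctness flag
# (column names are assumptions for this sketch).
individual_ats = pd.DataFrame({
    "slot": [0, 1, 32, 33],
    "correct_target": [True, True, True, False],
})
individual_ats["epoch"] = individual_ats["slot"] // SLOTS_PER_EPOCH

# Share of correct target attestations per epoch.
correct_by_epoch = individual_ats.groupby("epoch")["correct_target"].mean()
```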