Exploring Medalla data

Initial exploration.

Barnabé Monnot (https://twitter.com/barnabemonnot), Robust Incentives Group, Ethereum Foundation (https://github.com/ethereum/rig)
2020-11-06


In this notebook we explore data from the Medalla testnet. We look at the first 388001 slots (slot 0 through slot 388000).

Data sources

Lighthouse block export

We use a fork of Lakshman Sankar’s Lighthouse block exporter to export attestations and blocks from the finalised chain until slot 388000.

We present the main datasets below:

all_ats

Each row in this dataset corresponds to an aggregate attestation included in a block.

exploded_ats

We cast the dataset above into a long format, such that each row corresponds to an individual attestation included in a block. Note that when this individual attestation is included multiple times over multiple aggregates, it appears multiple times in the dataset.
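The reshaping from aggregates to individual attestations can be sketched as follows. This is a minimal illustration and not the notebook's own code: the column names (slot, att_slot, committee_index, attesting_indices) and the sample rows are assumptions made for the example.

```python
# Hypothetical row layout: the real all_ats columns may differ.
all_ats = [
    {"slot": 9, "att_slot": 8, "committee_index": 0, "attesting_indices": [11, 42]},
    {"slot": 10, "att_slot": 8, "committee_index": 0, "attesting_indices": [42, 77]},
]


def explode(aggregates):
    """Return one row per (aggregate, attesting validator) pair."""
    rows = []
    for agg in aggregates:
        for validator_index in agg["attesting_indices"]:
            rows.append({
                "slot": agg["slot"],          # slot of the including block
                "att_slot": agg["att_slot"],  # slot the attestation was made for
                "committee_index": agg["committee_index"],
                "validator_index": validator_index,
            })
    return rows


exploded_ats = explode(all_ats)
```

Validator 42 is part of both aggregates here, so it shows up twice in exploded_ats, which is the repetition described above.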

individual_ats

exploded_ats is the “disaggregated” version of the aggregate attestations. To check for validator performance, we often don’t need to check for every inclusion of their individual attestations. individual_ats contains these unique, individual attestations, tagged with some extra data such as their earliest inclusion and whether they attested correctly for the target checkpoint and the head.
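One way to collapse repeated inclusions into unique attestations is to keep the earliest including slot per validator and attested slot. The sketch below assumes hypothetical column names and treats a (validator_index, att_slot) pair as identifying one attestation, which ignores the case of a validator signing two conflicting votes for the same slot.

```python
# Hypothetical exploded rows: "slot" is the including block's slot.
exploded_ats = [
    {"validator_index": 42, "att_slot": 8, "slot": 10},
    {"validator_index": 42, "att_slot": 8, "slot": 9},
    {"validator_index": 77, "att_slot": 8, "slot": 10},
]


def to_individual(rows):
    """Keep one row per (validator, attested slot) with its earliest inclusion."""
    earliest = {}
    for row in rows:
        key = (row["validator_index"], row["att_slot"])
        if key not in earliest or row["slot"] < earliest[key]:
            earliest[key] = row["slot"]
    return [
        {
            "validator_index": validator_index,
            "att_slot": att_slot,
            "min_inclusion_slot": inclusion_slot,
            "inclusion_delay": inclusion_slot - att_slot,
        }
        for (validator_index, att_slot), inclusion_slot in sorted(earliest.items())
    ]


individual_ats = to_individual(exploded_ats)
```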

Stateful data

In a previous version of this notebook, we made use of Jim McDonald’s treasure trove of data, posted on the #medalla-data-challenge channel of the EthStaker Discord server. The data was obtained from his chaind tool, which listens to a beacon node and outputs its data to a PostgreSQL database. It’s a great tool and the data is super useful.

We’ve since adopted a different approach, getting the “stateful” data (committees, validator balances…) from requests to our node. By setting Lighthouse to record the state often enough, these requests are fast to execute.

all_cms

Committees are groups of validators asked to produce an attestation for a specific slot. An active validator is a member of exactly one committee per epoch.
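The "exactly one committee per epoch" property is easy to sanity-check on a committee table. The snippet below is a sketch on made-up rows; the column names (epoch, att_slot, committee_index, validator_index) are assumptions and may not match all_cms.

```python
from collections import Counter

# Hypothetical committee rows, one per (committee, member).
all_cms = [
    {"epoch": 0, "att_slot": 0, "committee_index": 0, "validator_index": 1},
    {"epoch": 0, "att_slot": 1, "committee_index": 0, "validator_index": 2},
    {"epoch": 1, "att_slot": 32, "committee_index": 0, "validator_index": 2},
    {"epoch": 1, "att_slot": 33, "committee_index": 0, "validator_index": 1},
]


def duplicate_assignments(committees):
    """Return (epoch, validator_index) pairs that appear in more than one committee."""
    counts = Counter((row["epoch"], row["validator_index"]) for row in committees)
    return sorted(key for key, n in counts.items() if n > 1)


duplicates = duplicate_assignments(all_cms)
```

An empty result means no validator was assigned twice within an epoch in the sample.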

val_balances

This dataset gives us validator state balances at the beginning of each epoch. Note that the state balance (balance), the true ETH amount a validator holds on the beacon chain, is different from the effective balance (effective_balance), which measures the principal on which validators receive interest.
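To make the difference concrete, here is a sketch of how the effective balance tracks the state balance, written from our reading of the phase 0 beacon chain specification. The constant values and the hysteresis rule are assumptions to verify against the spec version Medalla ran; the function name is ours.

```python
GWEI_PER_ETH = 10**9

# Assumed spec constants (phase 0); check against the version in use.
EFFECTIVE_BALANCE_INCREMENT = 1 * GWEI_PER_ETH
MAX_EFFECTIVE_BALANCE = 32 * GWEI_PER_ETH
HYSTERESIS_QUOTIENT = 4
HYSTERESIS_DOWNWARD_MULTIPLIER = 1
HYSTERESIS_UPWARD_MULTIPLIER = 5


def updated_effective_balance(balance, effective_balance):
    """Return the effective balance after an epoch update, all amounts in Gwei."""
    step = EFFECTIVE_BALANCE_INCREMENT // HYSTERESIS_QUOTIENT
    downward_threshold = step * HYSTERESIS_DOWNWARD_MULTIPLIER
    upward_threshold = step * HYSTERESIS_UPWARD_MULTIPLIER
    moved_enough = (
        balance + downward_threshold < effective_balance
        or effective_balance + upward_threshold < balance
    )
    if moved_enough:
        # Round down to a whole increment and cap at the maximum.
        rounded = balance - balance % EFFECTIVE_BALANCE_INCREMENT
        return min(rounded, MAX_EFFECTIVE_BALANCE)
    return effective_balance


# A validator at 31.7 ETH with a 32 ETH effective balance drops to 31 ETH.
new_effective = updated_effective_balance(31_700_000_000, 32 * GWEI_PER_ETH)
```

Under these assumptions the effective balance moves in whole-ETH steps and never exceeds 32 ETH, while the state balance changes by small amounts every epoch.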

Computed datasets

To ease the computational demands of this notebook, we record two datasets from which much of the analysis can be derived.

stats_per_val

For each validator, we compute a bunch of summary statistics.

stats_per_slot

We also record summary statistics for each slot. At 388000 slots in our dataset, this remains manageable to query.

Performance of duties

Attester duties

We compare the number of included attestations with the number of expected attestations.
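A simple version of this comparison counts committee memberships (expected attestations) and unique included attestations per epoch, then takes the ratio. The sketch below uses made-up rows and assumed column names; it relies on epochs being 32 slots long.

```python
from collections import Counter

SLOTS_PER_EPOCH = 32

# Hypothetical rows: committees give expected attestations,
# individual_ats gives the unique attestations that were included.
all_cms = [
    {"att_slot": 0, "validator_index": 1},
    {"att_slot": 1, "validator_index": 2},
    {"att_slot": 32, "validator_index": 1},
    {"att_slot": 33, "validator_index": 2},
]
individual_ats = [
    {"att_slot": 0, "validator_index": 1},
    {"att_slot": 1, "validator_index": 2},
    {"att_slot": 32, "validator_index": 1},
]


def inclusion_rate_per_epoch(committees, attestations):
    """Return {epoch: included / expected} for every epoch with committees."""
    expected = Counter(row["att_slot"] // SLOTS_PER_EPOCH for row in committees)
    included = Counter(row["att_slot"] // SLOTS_PER_EPOCH for row in attestations)
    return {epoch: included[epoch] / expected[epoch] for epoch in sorted(expected)}


rates = inclusion_rate_per_epoch(all_cms, individual_ats)
```

In the sample, every expected attestation of epoch 0 is included, while only one of the two expected in epoch 1 made it on-chain.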

Clearly something went very wrong circa epoch 2,500. This is now known as the roughtime incident, an issue affecting the major validator client, Prysm. It took time for the network to recover, in the process demonstrating how the quadratic inactivity leak mechanism works. Client diversity FTW!

Proposer duties

How many blocks are there in the canonical chain?

Again, we see the same trough during the roughtime incident.

Correctness of attestations

Target checkpoint

Attestations vouch for some target checkpoint to justify. We can check whether they vouched for the correct one by comparing their target_block_root with the latest known block root as of the start of the attestation epoch (that’s a mouthful). How many individual attestations correctly attest for the target?
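The check can be sketched as: find the start slot of the attestation's epoch, walk back to the latest slot that has a block, and compare that block's root with the attestation's target_block_root. Apart from target_block_root, the names, block roots, and sample data below are assumptions made for illustration.

```python
SLOTS_PER_EPOCH = 32

# Hypothetical canonical chain: slot -> block root, with slot 96 left empty.
block_roots = {62: "0xaa", 64: "0xbb", 95: "0xcc"}

individual_ats = [
    {"att_slot": 70, "target_block_root": "0xbb"},
    {"att_slot": 100, "target_block_root": "0xcc"},
    {"att_slot": 101, "target_block_root": "0xbb"},
]


def expected_target_root(att_slot, roots):
    """Latest known block root as of the first slot of the attestation's epoch."""
    epoch_start = (att_slot // SLOTS_PER_EPOCH) * SLOTS_PER_EPOCH
    for slot in range(epoch_start, -1, -1):
        if slot in roots:
            return roots[slot]
    return None  # no block at or before the epoch start


def share_correct_targets(attestations, roots):
    """Fraction of attestations whose target matches the expected root."""
    correct = sum(
        1 for att in attestations
        if att["target_block_root"] == expected_target_root(att["att_slot"], roots)
    )
    return correct / len(attestations)


share = share_correct_targets(individual_ats, block_roots)
```

The attestation for slot 100 is correct even though slot 96 has no block, because the latest known root at the epoch boundary is the one from slot 95.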

How does the correctness evolve over time?