Writing a New Test

Test Functions

Every test case is defined as a Python function that implements a single StateTest or BlockchainTest using the state_test or blockchain_test objects made available by the framework (learn how to decide on a test type). Test cases, and the respective test modules, must fulfill the following requirements:

Requirement | When
Be decorated with validity markers | If the test case is not valid for all forks
Use one of state_test or blockchain_test in its function arguments | Always
Call the state_test or blockchain_test in its test body | Always
Add a reference version of the EIP spec under test | Test path contains eip

Specifying which Forks Tests are Valid For

Test cases can (and in most cases should) be decorated with one or more "validity markers" that define which forks the test is valid for. This is achieved by applying:

  • pytest.mark.valid_from(FORK) and/or pytest.mark.valid_until(FORK)

or

  • pytest.mark.valid_at_transition_to(FORK)

markers on either the test function, test class or test module level:

On a test function:

import pytest

@pytest.mark.valid_from("Berlin")
@pytest.mark.valid_until("London")
def test_access_list(state_test: StateTestFiller, fork: Fork):
    ...

On a test class:

import pytest

@pytest.mark.valid_from("Shanghai")
class TestMultipleWithdrawalsSameAddress:
    ...

On a test module:

import pytest

pytestmark = pytest.mark.valid_from("Shanghai")

The ethereum_test_forks package defines the available forks and provides helpers that return all forks within a specified range.
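As a rough illustration of what a validity range selects, the sketch below filters an ordered list of fork names. The abbreviated FORK_ORDER list and the forks_in_range function are hypothetical stand-ins written for this example, not the helpers provided by ethereum_test_forks, and the sketch assumes that both bounds are inclusive.

```python
from typing import List, Optional

# Abbreviated, illustrative fork ordering (assumption: not the complete list).
FORK_ORDER: List[str] = ["Istanbul", "Berlin", "London", "Paris", "Shanghai", "Cancun"]


def forks_in_range(
    valid_from: Optional[str] = None, valid_until: Optional[str] = None
) -> List[str]:
    """Return the forks between valid_from and valid_until, both inclusive."""
    start = FORK_ORDER.index(valid_from) if valid_from else 0
    end = FORK_ORDER.index(valid_until) + 1 if valid_until else len(FORK_ORDER)
    return FORK_ORDER[start:end]


# Mirrors valid_from("Berlin") combined with valid_until("London").
berlin_to_london = forks_in_range("Berlin", "London")
# Mirrors valid_from("Shanghai") with no upper bound.
shanghai_onwards = forks_in_range(valid_from="Shanghai")
```

With only a lower bound, the range in this sketch extends to the last fork in the list.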

The state_test and blockchain_test Test Function Arguments

The test function's signature must contain exactly one of either a state_test or blockchain_test argument.

For example, for state tests:

def test_access_list(state_test: StateTestFiller):
    ...

and for blockchain tests:

def test_contract_creating_tx(
    blockchain_test: BlockchainTestFiller, fork: Fork, initcode: Initcode
):
    ...

The state_test and blockchain_test objects are wrapper classes around the StateTest and BlockchainTest objects, respectively. When called, they instantiate a new instance of the corresponding object and fill the test case using the evm tool, according to the pre and post states and the transactions defined within the test.
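The wrapper pattern described above can be pictured with the following minimal sketch. StateTestSketch and FillerSketch are hypothetical classes invented for this example; they only imitate the "call the wrapper, get a new instance" behaviour and do not run the evm tool or reflect the framework's actual classes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StateTestSketch:
    """Hypothetical stand-in for a test spec holding env, pre, post and txs."""

    env: Dict[str, Any]
    pre: Dict[str, Any]
    post: Dict[str, Any]
    txs: List[Dict[str, Any]]


class FillerSketch:
    """Hypothetical wrapper: calling it builds a new spec and records it."""

    def __init__(self, fork: str) -> None:
        self.fork = fork
        self.filled: List[StateTestSketch] = []

    def __call__(self, **kwargs: Any) -> StateTestSketch:
        spec = StateTestSketch(**kwargs)  # a new instance on every call
        self.filled.append(spec)  # the real framework would fill the fixture here
        return spec


def example_test_body(state_test: FillerSketch) -> None:
    # The test body only describes the scenario and calls the wrapper once.
    state_test(env={}, pre={"0xaa": {"balance": 1}}, post={"0xaa": {"balance": 1}}, txs=[])


filler = FillerSketch(fork="Berlin")
example_test_body(filler)
```

The point of the design is that the test function never builds fixtures itself: it hands a description of the scenario to the object it received as an argument.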

If a blockchain-type test should only generate a test fixture in the Engine format (EngineFixture), the blockchain_test_engine object can be specified. This object is a wrapper for the BlockchainTestEngine class.

StateTest Object

The StateTest object represents a single test vector, and contains the following attributes:

  • env: Environment object which describes the global state of the blockchain before the test starts.
  • pre: Pre-State containing the information of all Ethereum accounts that exist before any transaction is executed.
  • post: Post-State containing the information of all Ethereum accounts that are created or modified after all transactions are executed.
  • txs: All transactions to be executed during test execution.

BlockchainTest Object

The BlockchainTest object represents a single test vector that evaluates the Ethereum VM by attempting to append multiple blocks to the chain:

  • pre: Pre-State containing the information of all Ethereum accounts that exist before any block is executed.
  • post: Post-State containing the information of all Ethereum accounts that are created or modified after all blocks are executed.
  • blocks: All blocks to be appended to the blockchain during the test.

BlockchainTestEngine Object

The BlockchainTestEngine object has the same properties as the BlockchainTest but it's used to only generate a blockchain test in the Engine format.

Pre/Post State of the Test

The pre and post states are essential to set up and then verify the outcome of the test.

Both pre and post are mappings of account addresses to account structures (see more info).

A single test vector can contain as many accounts in the pre and post states as required, and they can also be filled dynamically.

The storage of an account is a key/value dictionary whose values are integers within the range [0, 2**256 - 1].
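To make the value range concrete, here is a small sketch that checks a storage dictionary against it. The validate_storage function is a hypothetical helper written for this example, not part of the framework, which may handle out-of-range values differently.

```python
from typing import Dict

MAX_WORD = 2**256 - 1  # largest value a 256-bit storage slot can hold


def validate_storage(storage: Dict[int, int]) -> Dict[int, int]:
    """Return the storage unchanged, or raise if a value is out of range."""
    for key, value in storage.items():
        if not 0 <= value <= MAX_WORD:
            raise ValueError(f"storage[{key}] = {value} is outside [0, 2**256 - 1]")
    return storage


valid_storage = validate_storage({0: 1, 1: MAX_WORD})
```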

txs are the steps which transform the pre-state into the post-state and must perform specific actions within the accounts (smart contracts) that result in verifiable changes to the balance, nonce, and/or storage in each of them.

post is compared against the outcome of the client after the execution of each transaction, and any differences are considered a failure.

When designing a test, all changes should ideally be saved to the contract's storage so that they can be verified in the post-state.

Test Transactions

Transactions can be crafted by sending them with specific data or to a specific account, which contains the code to be executed.

Transactions can also create more accounts, by setting the to field to an empty string.

Transactions can be designed to fail, and a verification must be made that the transaction fails with the specific error that matches what is expected by the test.

They can also contain a TransactionReceipt object in the expected_receipt field, which allows checking for an exact gas_used value.

Writing code for the accounts in the test

Account bytecode can be embedded in the test accounts by adding it to the code field of the account object, or the data field of the tx object if the bytecode is meant to be treated as init code or call data.

The code can be in either of the following formats:

  • bytes object, representing the raw opcodes in binary format.
  • str, representing the opcodes in hexadecimal format.
  • Code compilable object.

Currently supported built-in compilable objects are:

Code objects can be concatenated together by using the + operator.
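The bytes and str formats, and the effect of concatenation, can be illustrated with plain Python bytes. The to_bytecode helper below is a hypothetical function written for this example; it is not the framework's conversion logic.

```python
from typing import Union


def to_bytecode(code: Union[bytes, str]) -> bytes:
    """Normalize raw bytes or a hex string (with or without 0x) to bytes."""
    if isinstance(code, bytes):
        return code
    return bytes.fromhex(code[2:] if code.startswith("0x") else code)


push_one = to_bytecode("0x6001")  # PUSH1 0x01 (the value to store)
push_zero = to_bytecode("6000")  # PUSH1 0x00 (the storage key)
sstore = to_bytecode(b"\x55")  # SSTORE

# Concatenating the pieces yields code that stores 1 in storage slot 0.
store_one_in_slot_zero = push_one + push_zero + sstore
```

Building code from small pieces like this keeps each fragment readable and lets a test reuse fragments across accounts.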

Verifying the Accounts' Post States

Checking the state of the accounts after all blocks/transactions have been executed is how a test verifies that the execution client behaves as expected.

During their filling process, all tests automatically verify that the accounts specified in their post property actually match what was returned by the transition tool.

Within the post dictionary object, an account address can be:

  • None: The account will not be checked for absence or existence in the result returned by the transition tool.
  • Account object: The test expects that this account exists and also has properties equal to the properties specified by the Account object.
  • Account.NONEXISTENT: The test expects that this account does not exist in the result returned by the transition tool, and if the account exists, it results in error. E.g. when the transaction creating a contract is expected to fail and the test wants to verify that the address where the contract was supposed to be created is indeed empty.
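The three cases above can be summarized in a short sketch. It uses plain dictionaries in place of Account objects, and the NONEXISTENT sentinel and verify_post function are hypothetical names for this example; the framework's own verification is more thorough.

```python
from typing import Any, Dict, List

NONEXISTENT = object()  # hypothetical sentinel standing in for Account.NONEXISTENT


def verify_post(post: Dict[str, Any], result: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return mismatch descriptions; an empty list means the post state matches."""
    errors: List[str] = []
    for address, expected in post.items():
        if expected is None:
            continue  # no check at all for this address
        if expected is NONEXISTENT:
            if address in result:
                errors.append(f"{address}: expected no account, but one exists")
            continue
        if address not in result:
            errors.append(f"{address}: expected account is missing")
            continue
        for name, value in expected.items():
            if result[address].get(name) != value:
                errors.append(f"{address}: {name} mismatch")
    return errors


result = {"0xaa": {"nonce": 1, "balance": 5}}
no_errors = verify_post({"0xaa": {"nonce": 1}, "0xbb": NONEXISTENT, "0xcc": None}, result)
one_error = verify_post({"0xaa": NONEXISTENT}, result)
```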

The Account object

The Account object is used to specify the properties of an account to be verified in the post state.

The python representation can be found in src/ethereum_test_types/types.py.

It can verify the following properties of an account:

  • nonce: the scalar value equal to a) the number of transactions sent by an Externally Owned Account, or b) the number of contracts created by a contract.

  • balance: the amount of Wei (10**-18 Eth) the account has.

  • code: Bytecode contained by the account. To verify that an account contains no code, this property needs to be set to "0x" or "".

It is not recommended to verify Yul compiled code in the output account, because the bytecode can change from version to version.

  • storage: Storage within the account represented as a dict object. All storage keys that are expected to be set must be specified, and if a key is skipped, it is implied that its expected value is zero. Setting this property to {} (empty dict), means that all the keys in the account must be unset (equal to zero).

All of an account's properties are optional: each can be omitted or set to None, which means that no check is performed on that specific property.
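The storage rules (a skipped key implies zero, {} means everything is unset, None disables the check) can be sketched as follows. The storage_matches function is a hypothetical helper for this example and only approximates the comparison described above.

```python
from typing import Dict, Optional


def storage_matches(expected: Optional[Dict[int, int]], actual: Dict[int, int]) -> bool:
    """Compare storage, treating a key missing on either side as zero."""
    if expected is None:
        return True  # property skipped: no check is performed
    keys = set(expected) | set(actual)
    return all(expected.get(key, 0) == actual.get(key, 0) for key in keys)


skipped = storage_matches(None, {1: 5})  # no check at all
empty_vs_set = storage_matches({}, {1: 5})  # {} requires every slot to be zero
explicit_zero = storage_matches({1: 5}, {1: 5, 2: 0})  # slot 2 is zero either way
```

Note that under these rules an unexpected non-zero slot in the result is a mismatch, even if the test never mentions that key.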

Verifying correctness of the new test

A well written test performs a single verification output at a time.

A verification output can be a single storage slot, the balance of an account, or a newly created contract.

It is not recommended to use balance changes to verify test correctness, as it can be easily affected by gas cost changes in future EIPs.

The best way to verify a transaction/block execution outcome is to check its storage.

A test can be written as a negative verification. E.g. a contract is not created, or a transaction fails to execute or runs out of gas.

This kind of verification must be carefully crafted because it is possible to end up having a false positive result, which means that the test passed but the intended verification was never made.

To avoid these scenarios, it is important to have a separate verification to check that the test is effective. E.g. when a transaction is supposed to fail, it is necessary to check that the failure error is actually the one expected by the test.
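The difference between a weak and an effective negative check can be shown with a small sketch. The error names used here are illustrative strings and both functions are hypothetical; the example only demonstrates why matching the specific error matters.

```python
from typing import Optional


def weak_check(actual_error: Optional[str]) -> bool:
    """Pass on any failure at all, which can hide a false positive."""
    return actual_error is not None


def strict_check(expected_error: str, actual_error: Optional[str]) -> bool:
    """Pass only if the transaction failed with exactly the expected error."""
    return actual_error == expected_error


# The test intends to trigger INSUFFICIENT_FUNDS, but the transaction
# actually ran out of gas before reaching the code under test.
actual = "OUT_OF_GAS"
weak_result = weak_check(actual)
strict_result = strict_check("INSUFFICIENT_FUNDS", actual)
```

Here the weak check passes although the intended condition was never exercised, while the strict check exposes the problem.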

Failing or invalid transactions

Transactions included in a StateTest are expected to be intrinsically valid, i.e. the account sending the transaction must have enough funds to cover the gas costs, the max fee of the transaction must be equal to or higher than the base fee of the block, etc.

An intrinsically valid transaction can still revert during its execution.

Blocks in a BlockchainTest can contain intrinsically invalid transactions but in this case the block is expected to be completely rejected, along with all transactions in it, including other valid transactions.
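The following sketch models the two intrinsic-validity conditions named above and the all-or-nothing rule for blocks. TxSketch and both functions are hypothetical simplifications for this example: real clients check more conditions, and the sketch ignores that earlier transactions in a block reduce the sender's balance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxSketch:
    sender_balance: int
    gas_limit: int
    max_fee_per_gas: int
    value: int = 0


def is_intrinsically_valid(tx: TxSketch, base_fee: int) -> bool:
    """Check only the two conditions named in the text, not the full rule set."""
    covers_cost = tx.sender_balance >= tx.gas_limit * tx.max_fee_per_gas + tx.value
    fee_ok = tx.max_fee_per_gas >= base_fee
    return covers_cost and fee_ok


def block_is_accepted(txs: List[TxSketch], base_fee: int) -> bool:
    """A single intrinsically invalid transaction rejects the whole block."""
    return all(is_intrinsically_valid(tx, base_fee) for tx in txs)


good = TxSketch(sender_balance=10**18, gas_limit=21_000, max_fee_per_gas=10)
underpriced = TxSketch(sender_balance=10**18, gas_limit=21_000, max_fee_per_gas=5)

accepted = block_is_accepted([good], base_fee=7)
rejected = block_is_accepted([good, underpriced], base_fee=7)
```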

Parametrizing tests

Tests can be parametrized by using the @pytest.mark.parametrize decorator.

Example:

import pytest

@pytest.mark.parametrize(
    "tx_value,expected_balance",
    [
        pytest.param(0, 0, id="zero-value"),
        pytest.param(100, 100, id="non-zero-value"),
    ],
)
def test_contract_creating_tx(
    blockchain_test: BlockchainTestFiller, fork: Fork, tx_value: int, expected_balance: int
):
    ...

This will run the test twice, once with tx_value set to 0 and expected_balance set to 0, and once with tx_value set to 100 and expected_balance set to 100.

The fork fixture is automatically provided by the framework and contains the current fork under test, and does not need to be parametrized.

Tests can also be automatically parametrized with appropriate fork covariant values using the with_all_* markers listed in the Test Markers page.

The extend_with_defaults Utility

Extend test cases with default parameter values.

This utility function extends test case parameters by adding default values from the defaults dictionary to each case in the cases list. If a case already specifies a value for a parameter, its default is ignored.

This function is particularly useful in scenarios where you want to define a common set of default values but allow individual test cases to override them as needed.

The function returns a dictionary that can be directly unpacked and passed to the @pytest.mark.parametrize decorator.

Parameters:

Name | Type | Description | Default
defaults | Dict[str, Any] | A dictionary of default parameter names and their values. These values will be added to each case unless the case already defines a value for each parameter. | required
cases | List[ParameterSet] | A list of pytest.param objects representing different test cases. Its first argument must be a dictionary defining parameter names and values. | required
parametrize_kwargs | Any | Additional keyword arguments to be passed to @pytest.mark.parametrize. These arguments are not modified by this function and are passed through unchanged. | {}

Returns:

Type | Description
Dict[str, Any] | A dictionary with the following structure: argnames: a list of parameter names; argvalues: a list of test cases with modified parameter values; parametrize_kwargs: additional keyword arguments passed through unchanged.

Example
@pytest.mark.parametrize(**extend_with_defaults(
    defaults=dict(
        min_value=0,  # default minimum value is 0
        max_value=100,  # default maximum value is 100
        average=50,  # default average value is 50
    ),
    cases=[
        pytest.param(
            dict(),  # use default values
            id='default_case',
        ),
        pytest.param(
            dict(min_value=10),  # override with min_value=10
            id='min_value_10',
        ),
        pytest.param(
            dict(max_value=200),  # override with max_value=200
            id='max_value_200',
        ),
        pytest.param(
            dict(min_value=-10, max_value=50),  # override both min_value
            # and max_value
            id='min_-10_max_50',
        ),
        pytest.param(
            dict(min_value=20, max_value=80, average=50),  # all defaults
            # are overridden
            id="min_20_max_80_avg_50",
        ),
        pytest.param(
            dict(min_value=100, max_value=0),  # invalid range
            id='invalid_range',
            marks=pytest.mark.xfail(reason='invalid range'),
        )
    ],
))
def test_range(min_value, max_value, average):
    assert min_value <= max_value
    assert min_value <= average <= max_value

The above test will execute with the following sets of parameters:

"default_case": {"min_value": 0, "max_value": 100, "average": 50}
"min_value_10": {"min_value": 10, "max_value": 100, "average": 50}
"max_value_200": {"min_value": 0, "max_value": 200, "average": 50}
"min_-10_max_50": {"min_value": -10, "max_value": 50, "average": 50}
"min_20_max_80_avg_50": {"min_value": 20, "max_value": 80, "average": 50}
"invalid_range": {"min_value": 100, "max_value": 0, "average": 50}  # expected to fail

Notes

  • Each case in cases must contain exactly one value, which is a dictionary of parameter values.
  • The function performs an in-place update of the cases list, so the original cases list is modified.
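The merging rule itself can be tried out without pytest. The sketch below uses plain dictionaries; merge_case is a hypothetical helper for this example, and it raises KeyError where the actual utility raises its own error type.

```python
from typing import Any, Dict, List


def merge_case(defaults: Dict[str, Any], case: Dict[str, Any]) -> List[Any]:
    """Return argument values in the order of `defaults`, with `case` overrides."""
    unknown = set(case) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    merged = {**defaults, **case}  # case values win; key order follows defaults
    return list(merged.values())


defaults = dict(min_value=0, max_value=100, average=50)
argnames = list(defaults)
overridden = merge_case(defaults, dict(max_value=200))
```

Because every key in a case must already exist in defaults, the merged dictionary keeps the key order of defaults, so the values line up with argnames.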
Source code in src/ethereum_test_tools/utility/pytest.py
def extend_with_defaults(
    defaults: Dict[str, Any], cases: List[ParameterSet], **parametrize_kwargs: Any
) -> Dict[str, Any]:
    """
    Extend test cases with default parameter values.

    This utility function extends test case parameters by adding default values
    from the `defaults` dictionary to each case in the `cases` list. If a case
    already specifies a value for a parameter, its default is ignored.

    This function is particularly useful in scenarios where you want to define
    a common set of default values but allow individual test cases to override
    them as needed.

    The function returns a dictionary that can be directly unpacked and passed
    to the `@pytest.mark.parametrize` decorator.

    Args:
        defaults (Dict[str, Any]): A dictionary of default parameter names and
            their values. These values will be added to each case unless the case
            already defines a value for each parameter.
        cases (List[ParameterSet]): A list of `pytest.param` objects representing
            different test cases. Its first argument must be a dictionary defining
            parameter names and values.
        parametrize_kwargs (Any): Additional keyword arguments to be passed to
            `@pytest.mark.parametrize`. These arguments are not modified by this
            function and are passed through unchanged.

    Returns:
        Dict[str, Any]: A dictionary with the following structure:
            `argnames`: A list of parameter names.
            `argvalues`: A list of test cases with modified parameter values.
            `parametrize_kwargs`: Additional keyword arguments passed through unchanged.


    Example:
        ```python
        @pytest.mark.parametrize(**extend_with_defaults(
            defaults=dict(
                min_value=0,  # default minimum value is 0
                max_value=100,  # default maximum value is 100
                average=50,  # default average value is 50
            ),
            cases=[
                pytest.param(
                    dict(),  # use default values
                    id='default_case',
                ),
                pytest.param(
                    dict(min_value=10),  # override with min_value=10
                    id='min_value_10',
                ),
                pytest.param(
                    dict(max_value=200),  # override with max_value=200
                    id='max_value_200',
                ),
                pytest.param(
                    dict(min_value=-10, max_value=50),  # override both min_value
                    # and max_value
                    id='min_-10_max_50',
                ),
                pytest.param(
                    dict(min_value=20, max_value=80, average=50),  # all defaults
                    # are overridden
                    id="min_20_max_80_avg_50",
                ),
                pytest.param(
                    dict(min_value=100, max_value=0),  # invalid range
                    id='invalid_range',
                    marks=pytest.mark.xfail(reason='invalid range'),
                )
            ],
        ))
        def test_range(min_value, max_value, average):
            assert min_value <= max_value
            assert min_value <= average <= max_value
        ```

    The above test will execute with the following sets of parameters:

    ```python
    "default_case": {"min_value": 0, "max_value": 100, "average": 50}
    "min_value_10": {"min_value": 10, "max_value": 100, "average": 50}
    "max_value_200": {"min_value": 0, "max_value": 200, "average": 50}
    "min_-10_max_50": {"min_value": -10, "max_value": 50, "average": 50}
    "min_20_max_80_avg_50": {"min_value": 20, "max_value": 80, "average": 50}
    "invalid_range": {"min_value": 100, "max_value": 0, "average": 50}  # expected to fail
    ```

    Notes:
        - Each case in `cases` must contain exactly one value, which is a dictionary
          of parameter values.
        - The function performs an in-place update of the `cases` list, so the
          original `cases` list is modified.

    """
    for i, case in enumerate(cases):
        if not (len(case.values) == 1 and isinstance(case.values[0], dict)):
            raise ValueError(
                "each case must contain exactly one value; a dict of parameter values"
            )
        if set(case.values[0].keys()) - set(defaults.keys()):
            raise UnknownParameterInCasesError()
        # Overwrite values in defaults if the parameter is present in the test case values
        merged_params = {**defaults, **case.values[0]}  # type: ignore
        cases[i] = pytest.param(*merged_params.values(), id=case.id, marks=case.marks)

    return {"argnames": list(defaults), "argvalues": cases, **parametrize_kwargs}