Overview of Testing in Tezos

Testing is important to ensure the quality of the Tezos codebase by detecting bugs and preventing regressions. Tezos and its components use a variety of tools and frameworks for testing. The goal of this document is to give an overview of how testing is done in Tezos, and to help Tezos contributors use the test suite and write tests by pointing them towards the most appropriate testing framework for their use case. Finally, this guide explains how tests can be run automatically in the Tezos CI and how to measure test coverage.

The frameworks used in Tezos can be categorized along two axes: the type of component they test, and the type of test they perform. We distinguish the following components:

  • Node

    • Protocol

      • Michelson interpreter

      • Stitching

  • Networked nodes

  • Client

  • Ledger application

  • Endorser

  • Baker

Secondly, these components can be tested at different levels of granularity. Additionally, tests can verify functionality, but also other non-functional properties such as performance (execution time, memory and disk usage). We distinguish:

Unit testing

Unit testing tests software units, typically functions, in isolation.
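As an illustration (in Python rather than OCaml, with a hypothetical `median` function that is not part of the Tezos codebase), a unit test exercises a single function in isolation against hand-picked inputs:

```python
# Illustrative sketch (hypothetical code): a unit test checks one
# function in isolation against a hard-coded set of input-output pairs.

def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def test_median():
    assert median([3, 1, 2]) == 2
    assert median([1, 2, 3, 4]) == 2.5

test_median()
```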

Integration testing

Integration testing tests compositions of smaller units.
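Sketched in Python with two hypothetical units (a parser and an evaluator), an integration test exercises their composition rather than each unit alone:

```python
# Illustrative sketch (hypothetical code): an integration test checks
# that two separately developed units work correctly when composed.

def parse(expr):
    """Parse an expression of the form 'a+b' into a pair of integers."""
    left, right = expr.split("+")
    return int(left), int(right)

def evaluate(pair):
    """Evaluate a parsed pair by summing its components."""
    return pair[0] + pair[1]

def test_parse_and_evaluate():
    # The composition of both units is tested end to end.
    assert evaluate(parse("2+40")) == 42

test_parse_and_evaluate()
```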

System testing

System testing tests the final binaries directly.

Regression testing

In general, regression testing aims to detect the re-introduction of previously identified bugs. It can also refer to a coarse-grained type of testing where the output of a test execution is compared to a pre-recorded log of the expected output. Tezos uses tests of both kinds, but in this document we use regression testing to refer to the second meaning.
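The second meaning can be sketched as follows (a hypothetical Python example, not taken from the Tezos test suite): the output of a run is compared against a pre-recorded "golden" log, so any difference signals an unintended change in behavior.

```python
# Illustrative sketch (hypothetical code): output-comparison regression
# testing compares current output against a pre-recorded expected log.

def render_report():
    # Stand-in for the component under test.
    return "balance: 100\nstatus: ok\n"

# Output captured on a known-good run and stored alongside the test.
RECORDED_OUTPUT = "balance: 100\nstatus: ok\n"

def test_report_unchanged():
    assert render_report() == RECORDED_OUTPUT

test_report_unchanged()
```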

Property testing / Fuzzing

Both property testing and fuzzing test code with automatically generated inputs. Property testing is typically used to ensure functional correctness, and gives the user more control over generated input and the expected output. Fuzzing is typically used to search for security weaknesses and often guides input generation with the goal of increasing test coverage.
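The idea behind property testing can be sketched without any framework (a hypothetical Python example with made-up `encode`/`decode` functions): randomly generated inputs are fed to the code under test, and an invariant, here a round-trip property, is checked for each of them.

```python
import random

# Illustrative sketch (hypothetical code): a hand-rolled property test.
# Real property-testing libraries (such as Crowbar for OCaml) additionally
# shrink failing inputs and control generation more finely.

def encode(n):
    return str(n)

def decode(s):
    return int(s)

def test_roundtrip(trials=1000):
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        n = rng.randint(-10**9, 10**9)
        # Invariant: decoding an encoded value yields the original value.
        assert decode(encode(n)) == n

test_roundtrip()
```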

Performance testing

Testing of non-functional aspects such as run-time, memory and disk usage.

Acceptance testing

Testing of the software in real conditions. It is usually slower, more costly and less amenable to automation than integration or system testing. It is often the final step in the testing process and is performed before a release. In Tezos, acceptance testing is done by running a test net.

We obtain the following matrix. Each cell contains the frameworks appropriate for the corresponding component and testing type. Each framework is linked to a sub-section of this page where it is presented in more detail, with pointers to further information.

Testing frameworks and their applications in Tezos. PT: Python testing and execution framework, AT: Alcotest, CB: Crowbar, FT: Flextesa. (Matrix: one row per component — Node, Protocol, Michelson interpreter, Stitching, Networked nodes, Client, Ledger application, Endorser, Baker — and one column per testing type, each cell listing the applicable frameworks.)

Testing frameworks


Alcotest is a library for unit and integration testing in OCaml. It is the primary tool in Tezos for unit and integration testing of OCaml code.

Typical use cases:
  • Verifying simple input-output specifications for functions with a hard-coded set of input-output pairs.

  • OCaml integration tests.

Example tests:
  • Unit tests for src/lib_requester, in src/lib_requester/test/test_requester.ml. To execute them locally, run dune build @src/lib_requester/runtest in the Tezos root. To execute them on your own machine using the GitLab CI system, run gitlab-runner exec docker unit:requester.

  • Integration tests for P2P in the shell. For instance src/lib_p2p/test/test_p2p_pool.ml. This test forks a set of processes that exercise large parts of the P2P layer. To execute it locally, run dune build @runtest_p2p_pool in the Tezos root. To execute all P2P tests on your own machine using the GitLab CI system, run gitlab-runner exec docker unit:p2p.



Crowbar is a library for property-based testing in OCaml. It also interfaces with afl to enable fuzzing.

Typical use cases:
  • Verifying input-output invariants for functions with randomized inputs.

Example test:

Python testing and execution framework

The Tezos project uses pytest, a Python testing framework, combined with tezos-launchers, a Python wrapper around tezos-node and tezos-client, to perform integration testing of the node, the client, networks of nodes, and daemons such as the baker and endorser.

We also use pytest-regtest, a pytest plugin that enables regression testing.

Typical use cases:
  • Testing the commands of tezos-client. This tests the full chain: from the client, through the node's RPC interface, to the implementation of the economic protocol.

  • Test networks of nodes, with daemons.

  • Detecting unintended changes in the output of a component, using pytest-regtest.

Example tests:
  • Testing the node’s script interpreter through tezos-client run script (in tests_python/tests/test_contract_opcodes.py). To execute these tests locally, run pytest tests_python/tests/test_contract_opcodes.py in the Tezos root. To execute them on your own machine using the GitLab CI system, run gitlab-runner exec docker integration:contract_opcodes.

  • Setting up networks of nodes and ensuring their connection (in tests_python/tests/test_p2p.py). To execute these tests locally, run pytest tests_python/tests/test_p2p.py in the Tezos root. To execute them on your own machine using the GitLab CI system, run gitlab-runner exec docker integration:p2p.

  • Detecting unintended changes in the behavior of the Michelson interpreter (in tests_python/tests/test_contract_opcodes.py). To execute these tests locally, run pytest tests_python/tests/test_contract_opcodes.py in the Tezos root. To execute them on your own machine using the GitLab CI system, run gitlab-runner exec docker integration:contract_opcodes.



Flextesa (Flexible Test Sandboxes) is an OCaml library for setting up configurable and scriptable sandboxes to meet specific testing needs. Flextesa can also be used for interactive tests. This is used, for instance, in some tests that require the user to interact with the Ledger application.

Typical use cases:
Example test:

Executing tests

Executing tests locally

Whereas executing the tests through the CI, as described below, is the standard and most convenient way of running the full test suite, the tests can also be executed locally.

Flextesa and Alcotest tests are run with make test in the project root.

The Python tests are run with make all in the directory tests_python.

Executing tests through the GitLab CI

All tests are executed on all branches for each commit. For instance, to see the latest runs of the CI on the master branch, visit https://gitlab.com/tezos/tezos/-/commits/master. Each commit is annotated with a green checkmark icon if the CI passed, and a red cross icon if not. You can click the icon for more details.

Note that the CI does not simply execute make test and make all in the directory tests_python. Instead, it runs the tests as a set of independent jobs, to better exploit GitLab runner parallelism: one job per pytest test file and one job for each OCaml package containing tests.

When adding a new test that should be run in the CI (which should be the case for most automatic tests), you need to make sure that it is properly specified in the .gitlab-ci.yml file. The procedure for doing this depends on the type of test you’ve added:

Python integration and regression tests

Run ./scripts/update_integration_test.sh in Tezos home. This will include your new test in .gitlab-ci.yml.

Tests executed through Dune (Alcotest, Flextesa)

Run ./scripts/update_unit_test.sh in Tezos home. This will include your new test in .gitlab-ci.yml.


For other types of tests, you need to manually modify the .gitlab-ci.yml. Please refer to the GitLab CI Pipeline Reference. A helpful tool for this task is the CI linter, and gitlab-runner, introduced in the next section.

Executing the GitLab CI locally

GitLab offers the ability to run jobs defined in the .gitlab-ci.yml file on your own machine. This is helpful for debugging the CI pipeline. For this, you need to set up gitlab-runner on your machine. To avoid using outdated versions of the binary, it is recommended to install a release from the development repository.

gitlab-runner works with the concept of executors. We recommend using the docker executor to sandbox the environment in which the job is executed. This requires docker to be installed on your machine.

For example, if you want to run the job check_python_linting which checks the Python syntax, you can use:

gitlab-runner exec docker check_python_linting

Note that the first time you execute a job, it may take a long time because the required docker image has to be downloaded, and gitlab-runner is not verbose on this subject. The same can happen if the opam repository used by Tezos has changed, requiring a refresh of the locally cached docker image.

Local changes must be committed (but not necessarily pushed remotely) before executing the job locally. Indeed, gitlab-runner clones the HEAD of the current local branch to execute the job.

Another limitation is that only single jobs can be executed using gitlab-runner. For instance, there is no direct way of executing all jobs in the stage test.

Measuring test coverage

We measure test coverage with bisect_ppx. This tool is used to see which lines of the source code are actually executed when running one or several tests. Importantly, it tells us which parts of the code aren’t tested.

We describe here how bisect_ppx can be used locally (code coverage isn’t integrated in the CI yet).

To install bisect_ppx, run the following command from the root of the project directory:

make build-dev-deps

In order to generate coverage data, the OCaml code must be instrumented. This is specified in dune files (or dune.inc for protocols) on a per-package basis, by adding the following line to the library or executable stanza:

(preprocess (pps bisect_ppx -- --bisect-file /path/to/tezos.git/_coverage_output))

The --bisect-file argument tells bisect_ppx to generate coverage data in the _coverage_output directory. The convenience script ./scripts/instrument_dune_bisect.sh does this automatically. For instance,

./scripts/instrument_dune_bisect.sh src/lib_p2p/dune src/proto_alpha/lib_protocol/dune.inc

enables code coverage analysis for lib_p2p and proto_alpha. To instrument all the code in src/, use:

./scripts/instrument_dune_bisect.sh src/ --except "src/proto_0*"

Previous protocols (proto_0*) have to be excluded because they contain code that is not well instrumented by bisect_ppx and cannot be changed.

Then, compile the code using make, ignoring warnings such as .merlin generated is inaccurate., which are expected. Finally, run any number of tests and generate the HTML report from the coverage files using

make coverage-report

The generated report is available in _coverage_report/index.html. It shows for each file, which lines have been executed at least once, by at least one of the tests.

Clean up coverage data (output and report) with:

make coverage-clean

Reset the updated dune files using git. For instance:

git checkout -- src/lib_p2p/dune src/proto_alpha/lib_protocol/dune.inc

Alternatively, use the instrumentation script:

./scripts/instrument_dune_bisect.sh --remove src/

Known issues

The instrumentation by bisect_ppx of OCaml code containing partial applications with missing optional arguments generates code that fails typechecking. For these rather rare cases, either:

  • change the order of arguments (fun ?x y z rather than fun y ?x z),

  • add an explicit parameter at the call site (f y ?x:None rather than f y), or

  • add a wrapper function (fun z -> f y z).


Regardless of the framework, each new test must include a comment (typically a file header comment) briefly explaining what it tests and how.