E2E Test Writing Tutorial

In this tutorial we will test the first part of the harvest: we harvest a record from arXiv and then assert that a holdingpen entry for the harvested record appears.

Fixtures

Let’s create a test file tests/e2e/test_arxiv_in_hp.py in INSPIRE-Next. To run our tests we will need to import a few things and set up some fixtures:

import os
import pytest
import time

from inspirehep.testlib.api import InspireApiClient
from inspirehep.testlib.api.mitm_client import MITMClient, with_mitmproxy

@pytest.fixture
def inspire_client():
    # INSPIRE_API_URL is set by k8s when running the test in Jenkins
    inspire_url = os.environ.get('INSPIRE_API_URL', 'http://test-web-e2e.local:5000')
    return InspireApiClient(base_url=inspire_url)


@pytest.fixture
def mitm_client():
    mitmproxy_url = os.environ.get('MITMPROXY_HOST', 'http://mitm-manager.local')
    return MITMClient(mitmproxy_url)

InspireApiClient is used to interact with INSPIRE through the API. With it we can, for example, trigger a harvest or request holdingpen entries. MITMClient is a similar client for the proxy: with it we can swap scenarios, enable recording of interactions, or make assertions based on what happened during the test. with_mitmproxy is a helper decorator that automatically sets up the scenario for you (the scenario name will match the test name) and optionally, if you specify should_record=True, enables recording for the duration of the test.

We will also need the following fixture to set up all of the dummy fixtures and records in the test instance of INSPIRE. Most likely when writing a real test this fixture will already be present, as it is needed for virtually any test:

@pytest.fixture(autouse=True, scope='function')
def init_environment(inspire_client):
    inspire_client.e2e.init_db()
    inspire_client.e2e.init_es()
    inspire_client.e2e.init_fixtures()
    # refresh login session, giving a bit of time
    time.sleep(1)
    inspire_client.login_local()

Interaction Recording

Now that all of the necessary fixtures are set up, we can start writing our test. For now we add a wait at the end to give INSPIRE time to harvest, pull the PDF and the e-print, and so on. Without it, the test would finish immediately after scheduling the crawl, which would deregister the scenario and disable recording. Later in the tutorial we will replace the wait with actual polling to see whether the articles were harvested.

@with_mitmproxy(should_record=True)
def test_arxiv_in_hp(inspire_client, mitm_client):
    inspire_client.e2e.schedule_crawl(
        spider='arXiv_single',
        workflow='article',
        url='http://export.arxiv.org/oai2',
        identifier='oai:arXiv.org:1806.04664',  # Non-core, will halt
    )

    time.sleep(60)  # Let's wait for INSPIRE to harvest the records

Let us now run this “test” and see what happens:

docker-compose -f docker-compose.test.yml run --rm e2e pytest tests/e2e/test_arxiv_in_hp.py

Proxy Web UI

Once the test has started running, we can use the proxy’s web interface to look at the requests made during the test session. The proxy exposes its web interface on port 8081, so open your browser and navigate to http://127.0.0.1:8081.

There you will see the initial requests to RT, Elasticsearch and so on, as well as the login to INSPIRE. These are followed by requests to mitm-manager.local that set up the test scenario (PUT /config) and enable recording (POST /record).

After this, all requests (until recording is disabled and/or the scenario is switched) belong to the current test session. Many of them (test-indexer, test-web-e2e.local) are whitelisted and not recorded. You might notice a few requests to arXiv, such as:

  • GET http://export.arxiv.org/oai2?verb=GetRecord&metadataPrefix=arXiv&identifier=oai...
  • GET http://export.arxiv.org/pdf/1806.04664
  • GET http://export.arxiv.org/e-print/1806.04664

These are live interactions that are being recorded; you can find them in tests/e2e/scenarios/arxiv_in_hp/ArxivService/. If you need to re-record an interaction, simply remove the file you want to overwrite, or rename it so that it no longer has a yaml extension.
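The rename approach can also be scripted. The following is a minimal sketch, assuming Python 3; the helper name disable_recording and the .bak suffix are hypothetical choices, and any name that does not end in yaml should work equally well:

```python
from pathlib import Path


def disable_recording(interaction_file):
    """Rename a recorded interaction so that it no longer ends in .yaml."""
    path = Path(interaction_file)
    # Appending a suffix keeps the old recording around for comparison.
    backup = path.with_name(path.name + ".bak")
    path.rename(backup)
    return backup
```

Calling it on a file inside tests/e2e/scenarios/arxiv_in_hp/ArxivService/ should leave that interaction free to be recorded again on the next run with recording enabled.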

Tip

Since the responses from arXiv come compressed, they are also stored compressed in order to preserve the original test data. If you need to look inside, copy the body from the yaml file and, assuming you pasted it into a file called gzip.txt, run:

cat gzip.txt | base64 -di | gzip -d > plain.txt

Similarly, to compress it back:

cat plain.txt | gzip | base64 > gzip.txt
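The same round trip can be done from Python, which may be handy where the GNU options of base64 are not available. This is a minimal sketch, assuming Python 3 and assuming the body is base64 text wrapping a gzip stream, as the shell pipelines above imply; decode_body and encode_body are hypothetical helper names:

```python
import base64
import gzip


def decode_body(encoded):
    """Turn a base64-encoded, gzip-compressed body into plain bytes."""
    # Drop whitespace and line breaks that copying from YAML may introduce.
    compact = "".join(encoded.split())
    return gzip.decompress(base64.b64decode(compact))


def encode_body(plain):
    """Inverse of decode_body: gzip the bytes, then base64-encode them."""
    return base64.b64encode(gzip.compress(plain)).decode("ascii")


if __name__ == "__main__":
    sample = b"<record>1806.04664</record>"
    # A body survives the encode/decode round trip unchanged.
    assert decode_body(encode_body(sample)) == sample
```

Note that encode_body returns a single unwrapped line, whereas the base64 command wraps its output by default.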

Querying the Holdingpen

Now that our interactions are recorded we can go ahead and finish our test, by making assertions on the holdingpen records. We can also remove the should_record=True option from the @with_mitmproxy decorator, as our interactions are now recorded.

To make assertions we can use the inspire_client and more precisely its holdingpen module:

@with_mitmproxy
def test_arxiv_in_hp(inspire_client, mitm_client):
    inspire_client.e2e.schedule_crawl(
        spider='arXiv_single',
        workflow='article',
        url='http://export.arxiv.org/oai2',
        identifier='oai:arXiv.org:1806.04664',
    )

    time.sleep(60)

    holdingpen_entries = inspire_client.holdingpen.get_list_entries()

    assert len(holdingpen_entries) == 1

    holdingpen_entry = holdingpen_entries[0]

    assert holdingpen_entry.status == 'HALTED'
    assert holdingpen_entry.core is None
    assert holdingpen_entry.arxiv_eprint == '1806.04664'

The test should already work, although it still needs to be refactored to use actual polling instead of a “simple” time.sleep.

Further Improvements

As mentioned before, we can introduce a helper that polls until the harvest has finished, instead of relying on a fixed time.sleep (snippet taken from tests/e2e/test_arxiv_harvest.py; it needs the backoff package to be imported):

import backoff


def wait_for(func, *args, **kwargs):
    max_time = kwargs.pop('max_time', 200)
    interval = kwargs.pop('interval', 2)

    decorator = backoff.on_exception(
        backoff.constant,
        AssertionError,
        interval=interval,
        max_time=max_time,
    )
    decorated = decorator(func)
    return decorated(*args, **kwargs)

We can then use this helper in our test:

@with_mitmproxy
def test_arxiv_in_hp(inspire_client, mitm_client):
    inspire_client.e2e.schedule_crawl(
        spider='arXiv_single',
        workflow='article',
        url='http://export.arxiv.org/oai2',
        identifier='oai:arXiv.org:1806.04664',
    )

    def _in_holdingpen():
        holdingpen_entries = inspire_client.holdingpen.get_list_entries()
        assert len(holdingpen_entries) > 0
        assert holdingpen_entries[0].status == 'HALTED'
        return holdingpen_entries

    # Will poll every two seconds and timeout after 200 seconds
    holdingpen_entries = wait_for(_in_holdingpen)

    assert len(holdingpen_entries) == 1

    holdingpen_entry = holdingpen_entries[0]

    assert holdingpen_entry.core is None
    assert holdingpen_entry.arxiv_eprint == '1806.04664'
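To make the polling behaviour more concrete, here is a rough standard-library approximation of what wait_for does. It is a sketch and not the project's helper: the name poll_until_passes is hypothetical, and it assumes the intended semantics are "retry on AssertionError at a fixed interval, then re-raise the last failure once the time budget is spent":

```python
import time


def poll_until_passes(check, max_time=200, interval=2):
    """Call `check` until it stops raising AssertionError, then return its result."""
    deadline = time.monotonic() + max_time
    while True:
        try:
            return check()
        except AssertionError:
            # Give up once another sleep would take us past the deadline.
            if time.monotonic() + interval > deadline:
                raise
            time.sleep(interval)


if __name__ == "__main__":
    attempts = []

    def flaky_check():
        # Fails on the first two calls, succeeds on the third.
        attempts.append(1)
        assert len(attempts) >= 3, "not ready yet"
        return len(attempts)

    print(poll_until_passes(flaky_check, max_time=5, interval=0.01))
```

Because only AssertionError is retried, any other exception raised by the check (for example a connection error) surfaces immediately instead of being hidden by the polling loop.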

We can also use the mitmproxy client to make assertions on the interactions with external services that happened during our test:

@with_mitmproxy
def test_arxiv_in_hp(inspire_client, mitm_client):
    # ... ...
    mitm_client.assert_interaction_used('ArxivService', 'interaction_0', times=1)

The assertion above fails unless the interaction scenarios/arxiv_in_hp/ArxivService/interaction_0.yaml has been used exactly once. Leave off the times parameter if you only want to assert that the interaction happened at least once. The names of interactions are not important, so you can rename them if you like. Naming only matters when two interactions can match the same request: in that case the lexicographically first one is chosen, for consistency.
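Lexicographic order is not the same as numeric order, which matters once interaction names carry numbers with different digit counts. The sketch below only illustrates string ordering; it assumes names are compared as plain strings, and pick_interaction and the candidate names are hypothetical, not the proxy's code:

```python
def pick_interaction(matching_names):
    """Return the name that sorts first under plain string comparison."""
    return min(matching_names)


if __name__ == "__main__":
    candidates = ["interaction_2", "interaction_10", "interaction_1"]
    print(pick_interaction(candidates))
    print(sorted(candidates))
```

Under string comparison interaction_10 sorts before interaction_2, so if you rely on the ordering, zero-padded names such as interaction_02 are one way to keep it predictable.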

Troubleshooting/Tips

Accessing web node in browser

If for any reason you need to access the web interface of INSPIRE, you can add an entry to your /etc/hosts file with the IP of the web container:

$ docker inspect inspirenext_test-web-e2e.local_1 | grep '"IPAddress"'

            "IPAddress": "",
                "IPAddress": "172.20.0.9",

$ sudo vim /etc/hosts

And add a line at the bottom:

172.20.0.9 test-web-e2e.local

Now you can visit http://test-web-e2e.local:5000 in your browser, provided the container is running.
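The grep above prints two matches because docker inspect reports a top-level IPAddress as well as one per attached network. As an alternative to reading the output by eye, the following sketch extracts the non-empty per-network addresses from the JSON that docker inspect emits. The helper name container_ips and the network name in the sample are hypothetical; the sample is a trimmed stand-in for real output:

```python
import json

# Trimmed stand-in for `docker inspect <container>` output (a JSON array).
SAMPLE = """
[{"NetworkSettings": {"IPAddress": "",
                      "Networks": {"example_net": {"IPAddress": "172.20.0.9"}}}}]
"""


def container_ips(inspect_output):
    """Extract the non-empty per-network IP addresses from `docker inspect` JSON."""
    ips = []
    for container in json.loads(inspect_output):
        networks = container.get("NetworkSettings", {}).get("Networks") or {}
        for settings in networks.values():
            if settings.get("IPAddress"):
                ips.append(settings["IPAddress"])
    return ips


if __name__ == "__main__":
    # Print lines in the format expected by /etc/hosts.
    for ip in container_ips(SAMPLE):
        print("{} test-web-e2e.local".format(ip))
```

In practice you would pass in the output of docker inspect inspirenext_test-web-e2e.local_1 instead of the sample string.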

Docker cheatsheet

In order to start the web container (don’t forget the .local at the end!):

docker-compose -f docker-compose.test.yml up test-web-e2e.local

For any other container, replace test-web-e2e.local with the appropriate name. Other containers don’t end in .local; the suffix is needed only for the inspire-next node, as its name has to be a domain name.

Similarly, substitute stop or kill for up to bring the container down, and rm to remove it (e.g. so that a new, updated image can be used).

To view the logs of a container:

docker-compose -f docker-compose.test.yml logs test-worker-e2e

In order to run a shell in an already running container (e.g. to investigate errors):

# E.g. for INSPIRE
docker-compose -f docker-compose.test.yml exec test-web-e2e.local bash

# For MITM-Proxy we use `ash`, as it runs on Alpine Linux base, which doesn't ship with `bash`
docker-compose -f docker-compose.test.yml exec mitm-proxy ash