*************************
E2E Test Writing Tutorial
*************************

In this tutorial we will test the first part of the harvest: we will harvest a
record from arXiv and then assert that a holdingpen entry for the harvested
record appears.

Fixtures
++++++++

Let's create a test file ``tests/e2e/test_arxiv_in_hp.py`` in INSPIRE-Next. To
run our tests we will need to import a few things and set up some fixtures:

.. code-block:: python

    import os
    import time

    import pytest

    from inspirehep.testlib.api import InspireApiClient
    from inspirehep.testlib.api.mitm_client import MITMClient, with_mitmproxy


    @pytest.fixture
    def inspire_client():
        # INSPIRE_API_URL is set by k8s when running the test in Jenkins
        inspire_url = os.environ.get('INSPIRE_API_URL', 'http://test-web-e2e.local:5000')
        return InspireApiClient(base_url=inspire_url)


    @pytest.fixture
    def mitm_client():
        mitmproxy_url = os.environ.get('MITMPROXY_HOST', 'http://mitm-manager.local')
        return MITMClient(mitmproxy_url)

``InspireApiClient`` is used to interact with INSPIRE through the API. Using it
we can, for example, trigger a harvest or request holdingpen entries.

``MITMClient`` is a similar client for the proxy. With it we can swap
scenarios, enable recording of interactions, or make assertions based on what
happened during the test.

``with_mitmproxy`` is a helper decorator that automatically sets up the
scenario for you (the scenario name will match the test name) and optionally,
if you specify ``should_record=True``, enables recording for the duration of
the test.

We will also need the following fixture to set up all of the dummy fixtures and
records in the test instance of INSPIRE. When writing a real test this fixture
will most likely already be present, as it is needed for virtually any test:

.. code-block:: python

    @pytest.fixture(autouse=True, scope='function')
    def init_environment(inspire_client):
        inspire_client.e2e.init_db()
        inspire_client.e2e.init_es()
        inspire_client.e2e.init_fixtures()
        # refresh login session, giving a bit of time
        time.sleep(1)
        inspire_client.login_local()

Interaction Recording
+++++++++++++++++++++

Now that we have set up all of the necessary fixtures, we can start writing our
test. For now we add a wait at the end to give INSPIRE time to harvest, pull
the PDF and the e-print, etc. Without it, the test would finish immediately
after scheduling the crawl, which would deregister the scenario and disable
recording. Later in the tutorial we will replace the wait with actual polling
to see if the articles were harvested.

.. code-block:: python

    @with_mitmproxy(should_record=True)
    def test_arxiv_in_hp(inspire_client, mitm_client):
        inspire_client.e2e.schedule_crawl(
            spider='arXiv_single',
            workflow='article',
            url='http://export.arxiv.org/oai2',
            identifier='oai:arXiv.org:1806.04664',  # Non-core, will halt
        )

        time.sleep(60)  # Let's wait for INSPIRE to harvest the records

Let us now run this "test" and see what happens:

.. code-block:: bash

    docker-compose -f docker-compose.test.yml run --rm e2e pytest tests/e2e/test_arxiv_in_hp.py

Proxy Web UI
++++++++++++

Once the test has started running, we can use the proxy's web interface to look
at the requests that happen during the test session. The proxy exposes its web
interface on port 8081, so open your browser and navigate to
``http://127.0.0.1:8081``.

There you will see the initial requests to RT, Elasticsearch and so on, and the
login to INSPIRE. These are followed by requests to ``mitm-manager.local`` that
set up the test scenario (``PUT /config``) and enable recording
(``POST /record``). After this, all the requests (until recording is disabled
and/or the scenario is switched) belong to the current test session.
Many of them (``test-indexer``, ``test-web-e2e.local``) are whitelisted and not
recorded. You might notice a few requests to arXiv like so:

* ``GET http://export.arxiv.org/oai2?verb=GetRecord&metadataPrefix=arXiv&identifier=oai...``
* ``GET http://export.arxiv.org/pdf/1806.04664``
* ``GET http://export.arxiv.org/e-print/1806.04664``

These are live interactions that are recorded; you can find them in
``tests/e2e/scenarios/arxiv_in_hp/ArxivService/``. If you need to re-record an
interaction, simply remove the file you want to overwrite, or rename it so that
it no longer has a ``yaml`` extension.

.. tip::

    Since the responses from arXiv come compressed, they are also stored
    compressed in order to preserve the original test data. If you need to look
    inside, you can copy the body from the yaml and, assuming it is pasted into
    another file called ``gzip.txt``, run:

    ``cat gzip.txt | base64 -di | gzip -d > plain.txt``

    Similarly, to compress it back:

    ``cat plain.txt | gzip | base64 > gzip.txt``

Querying the Holdingpen
+++++++++++++++++++++++

Now that our interactions are recorded we can go ahead and finish our test by
making assertions on the holdingpen records. We can also remove the
``should_record=True`` option from the ``@with_mitmproxy`` decorator, as our
interactions are now recorded.

To make assertions we can use the ``inspire_client``, and more precisely its
``holdingpen`` module:

.. code-block:: python

    @with_mitmproxy
    def test_arxiv_in_hp(inspire_client, mitm_client):
        inspire_client.e2e.schedule_crawl(
            spider='arXiv_single',
            workflow='article',
            url='http://export.arxiv.org/oai2',
            identifier='oai:arXiv.org:1806.04664',
        )

        time.sleep(60)

        holdingpen_entries = inspire_client.holdingpen.get_list_entries()
        assert len(holdingpen_entries) == 1

        holdingpen_entry = holdingpen_entries[0]
        assert holdingpen_entry.status == 'HALTED'
        assert holdingpen_entry.core is None
        assert holdingpen_entry.arxiv_eprint == '1806.04664'

This test still needs to be refactored to use actual polling instead of a
simple ``time.sleep``, but it should already work.

Further Improvements
++++++++++++++++++++

As mentioned before, instead of a simple ``time.sleep`` we can introduce a
helper function that polls until the harvest has finished (snippet taken from
``tests/e2e/test_arxiv_harvest.py``):

.. code-block:: python

    import backoff


    def wait_for(func, *args, **kwargs):
        max_time = kwargs.pop('max_time', 200)
        interval = kwargs.pop('interval', 2)

        # Retry `func` at a constant interval for as long as it raises
        # AssertionError, giving up after `max_time` seconds
        decorator = backoff.on_exception(
            backoff.constant,
            AssertionError,
            interval=interval,
            max_time=max_time,
        )
        decorated = decorator(func)
        return decorated(*args, **kwargs)

We can then use the helper in our test:

.. code-block:: python

    @with_mitmproxy
    def test_arxiv_in_hp(inspire_client, mitm_client):
        inspire_client.e2e.schedule_crawl(
            spider='arXiv_single',
            workflow='article',
            url='http://export.arxiv.org/oai2',
            identifier='oai:arXiv.org:1806.04664',
        )

        def _in_holdingpen():
            holdingpen_entries = inspire_client.holdingpen.get_list_entries()
            assert len(holdingpen_entries) > 0
            assert holdingpen_entries[0].status == 'HALTED'
            return holdingpen_entries

        # Will poll every two seconds and time out after 200 seconds
        holdingpen_entries = wait_for(_in_holdingpen)
        assert len(holdingpen_entries) == 1

        holdingpen_entry = holdingpen_entries[0]
        assert holdingpen_entry.core is None
        assert holdingpen_entry.arxiv_eprint == '1806.04664'

We can also use the mitmproxy client to make assertions on the interactions
with external services that happened during our test:

.. code-block:: python

    @with_mitmproxy
    def test_arxiv_in_hp(inspire_client, mitm_client):
        # ...
        mitm_client.assert_interaction_used('ArxivService', 'interaction_0', times=1)

The above will fail if the interaction
``scenarios/arxiv_in_hp/ArxivService/interaction_0.yaml`` has not been used
exactly once. You can leave off the ``times`` parameter if you want to assert
that the interaction happened at least once, instead of an exact number of
times.

The names of interactions are not important, so you can rename them if you
like. Naming only matters if two interactions can match the same request: in
that case the lexicographically first one is chosen for consistency.

Troubleshooting/Tips
++++++++++++++++++++

Accessing web node in browser
-----------------------------

If for any reason you need to access the web interface of INSPIRE, you can add
an entry to your ``/etc/hosts`` file with the IP of the web container:

.. code-block:: bash

    $ docker inspect inspirenext_test-web-e2e.local_1 | grep '"IPAddress"'
                "IPAddress": "",
                        "IPAddress": "172.20.0.9",
    $ sudo vim /etc/hosts

And add a line at the bottom:

.. code-block:: text

    172.20.0.9    test-web-e2e.local

Now you can visit http://test-web-e2e.local:5000 in your browser, provided the
container is running.

Docker cheatsheet
-----------------

In order to start the web container (don't forget the ``.local`` at the end!):

.. code-block:: bash

    docker-compose -f docker-compose.test.yml up test-web-e2e.local

For any other container, change ``test-web-e2e.local`` to the suitable name.
Other containers don't end in ``.local``; this is needed only for the
inspire-next node, as it has to be a domain name. Similarly, substitute ``up``
with ``stop`` or ``kill`` to bring the container down, and with ``rm`` to
remove it (e.g. so that a new, updated image can be used).

To view the logs of a container:

.. code-block:: bash

    docker-compose -f docker-compose.test.yml logs test-worker-e2e

In order to run a shell in an already running container (e.g. to investigate
errors):

.. code-block:: bash

    # E.g. for INSPIRE
    docker-compose -f docker-compose.test.yml exec test-web-e2e.local bash

    # For MITM-Proxy we use `ash`, as it runs on an Alpine Linux base, which doesn't ship with `bash`
    docker-compose -f docker-compose.test.yml exec mitm-proxy ash
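
Decoding recorded bodies in Python
----------------------------------

The tip in the Proxy Web UI section uses shell pipelines to look inside a
recorded arXiv response body. If you would rather do this from Python, the same
round trip can be sketched with the standard library alone. This is only a
sketch: it assumes Python 3 and that the body copied from the yaml is the
base64 encoding of a gzip stream, as the shell commands in the tip imply. The
names ``decode_body`` and ``encode_body`` are hypothetical and are not part of
``inspirehep.testlib``.

```python
import base64
import gzip


def decode_body(encoded):
    """Return the plain bytes of a base64-encoded, gzip-compressed body."""
    # Mirrors: cat gzip.txt | base64 -di | gzip -d
    return gzip.decompress(base64.b64decode(encoded))


def encode_body(plain):
    """Gzip-compress plain bytes and base64-encode the result."""
    # Mirrors: cat plain.txt | gzip | base64
    return base64.b64encode(gzip.compress(plain))


# Round trip with a made-up payload (not a real arXiv response)
sample = b'<record>example</record>'
assert decode_body(encode_body(sample)) == sample
```

Note that the re-encoded text will generally not be byte-identical to the
originally recorded body, since gzip output depends on the compression settings
and header fields; only the decoded content is expected to match.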