inspirehep.modules.workflows.utils package

Module contents

Workflows utils.

inspirehep.modules.workflows.utils.convert(xml, xslt_filename)

Convert XML using given XSLT stylesheet.

inspirehep.modules.workflows.utils.copy_file_to_workflow(*args, **kwargs)

inspirehep.modules.workflows.utils.do_not_repeat(step_id)

Decorator used to skip workflow steps when a workflow is re-run.

Stores the result of running the workflow step in source_data.persistent_data after the first run, and skips the step on subsequent runs, instead applying the previously recorded ‘changes’ to extra_data.

The decorated function has to conform to the following signature:

def decorated_step(obj: WorkflowObject, eng: WorkflowEngine) -> Dict[str, Any]: ...

Here obj and eng are the usual arguments following the protocol of all workflow steps. The value returned by decorated_step is used as a patch to be applied to the workflow object’s source data (which ‘replays’ the changes made by the workflow step).

Parameters:step_id (str) – name of the workflow step, to be used as key in persistent_data
Returns:the decorator
Return type:callable
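The skip-and-replay behaviour described above can be sketched with stand-in objects. This is a simplified illustration, not the module's implementation: FakeWorkflowObject, do_not_repeat_sketch, the location of persistent_data inside extra_data, and the use of dict.update to apply the patch are all assumptions made for the example.

```python
from functools import wraps


class FakeWorkflowObject:
    """Hypothetical stand-in for a workflow object (not the real class)."""

    def __init__(self):
        self.extra_data = {"source_data": {}}


def do_not_repeat_sketch(step_id):
    """Simplified sketch: run the step once, replay its recorded result afterwards."""
    def decorator(func):
        @wraps(func)
        def wrapper(obj, eng):
            source_data = obj.extra_data["source_data"]
            persistent = source_data.setdefault("persistent_data", {})
            if step_id in persistent:
                # Re-run: skip the step and replay the changes recorded earlier.
                obj.extra_data.update(persistent[step_id])
                return None
            result = func(obj, eng)
            # First run: record the returned patch under the step's key.
            persistent[step_id] = result
            return result
        return wrapper
    return decorator


calls = []


@do_not_repeat_sketch("expensive_step")
def expensive_step(obj, eng):
    calls.append(1)
    return {"score": 0.9}


obj = FakeWorkflowObject()
expensive_step(obj, None)  # runs the step and records its result
expensive_step(obj, None)  # skipped; the recorded patch is applied to extra_data
```

In this sketch the step body executes only once, while the second call still leaves the recorded values in extra_data.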

inspirehep.modules.workflows.utils.download_file_to_workflow(*args, **kwargs)

Download a file to a specified workflow.

The workflow.files property is actually a method, which returns a WorkflowFilesIterator. This class inherits a custom __setitem__ method from its parent, FilesIterator, which ends up calling save on an invenio_files_rest.storage.pyfs.PyFSFileStorage instance through ObjectVersion and FileObject. This method consumes the stream passed to it and saves in its place a FileObject with the details of the downloaded file.

Consuming the stream might raise a ProtocolError because the server might terminate the connection before sending any data. In this case we retry 5 times with exponential backoff before giving up.
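The retry policy can be illustrated with a small standalone helper. This is only a sketch under stated assumptions: retry_with_backoff, the local ProtocolError class, the one-second base delay, and the reading of "5 times" as five attempts in total are hypothetical and are not taken from the module.

```python
import time


class ProtocolError(Exception):
    """Stand-in for the connection error mentioned above (hypothetical class)."""


def retry_with_backoff(func, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on ProtocolError with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except ProtocolError:
            if attempt == attempts - 1:
                raise  # out of attempts: give up and propagate the error
            sleep(base_delay * 2 ** attempt)


# Demo: the first two calls fail, the third succeeds.
delays = []
outcomes = iter([ProtocolError(), ProtocolError(), "file contents"])


def flaky_download():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


# Delays are recorded instead of slept so the demo finishes immediately.
result = retry_with_backoff(flaky_download, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the helper easy to exercise without actually waiting.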

inspirehep.modules.workflows.utils.get_document_in_workflow(*args, **kwds)

Context manager giving the path to the document attached to a workflow object.

Parameters:obj – workflow object
Returns:The path to a local copy of the document. If no documents are present, it returns None. If several documents are present, it prioritizes the fulltext. If several documents with the same priority are present, it takes the first one and logs an error.
Return type:Optional[str]
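The selection rules above can be sketched in isolation. The snippet covers only the choice of document, not the download or the context manager, and it assumes that documents are dictionaries carrying a key and an optional boolean fulltext flag; pick_document is a hypothetical helper name.

```python
import logging

logger = logging.getLogger(__name__)


def pick_document(documents):
    """Return the preferred document, or None when there are none."""
    if not documents:
        return None
    # Fulltext documents take priority over the others.
    fulltexts = [doc for doc in documents if doc.get("fulltext")]
    candidates = fulltexts or documents
    if len(candidates) > 1:
        logger.error("Several candidate documents found, using the first one.")
    return candidates[0]


chosen = pick_document([
    {"key": "figure.png"},
    {"key": "paper.pdf", "fulltext": True},
])
```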

inspirehep.modules.workflows.utils.get_resolve_edit_article_callback_url()

Resolve edit_article workflow letting it continue.

Note

It’s using inspire_workflows.callback_resolve_edit_article route.

inspirehep.modules.workflows.utils.get_resolve_merge_conflicts_callback_url()

Resolve merge conflicts callback.

Returns the callback url for resolving the merge conflicts.

Note

It’s using inspire_workflows.callback_resolve_merge_conflicts route.

inspirehep.modules.workflows.utils.get_resolve_validation_callback_url()

Resolve validation callback.

Returns the callback url for resolving the validation errors.

Note

It’s using inspire_workflows.callback_resolve_validation route.

inspirehep.modules.workflows.utils.get_source_for_root(source)

Source for the root workflow object.

Parameters:source (str) – the record source.
Returns:the source for the root workflow object.
Return type:(str)

Note

For the time being, any workflow whose acquisition_source.source is anything other than arxiv or submitter will be stored as publisher.
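The mapping in the note can be written as a one-line rule. The sketch below uses a hypothetical function name, and the case-insensitive comparison is an assumption rather than something stated in this reference.

```python
def source_for_root_sketch(source):
    """Map a record source to the source used for the root workflow object."""
    normalized = source.lower()  # assumption: comparison ignores case
    return normalized if normalized in ("arxiv", "submitter") else "publisher"


examples = {name: source_for_root_sketch(name) for name in ("arXiv", "submitter", "Elsevier")}
```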

inspirehep.modules.workflows.utils.get_validation_errors(data, schema)

Creates a validation_errors dictionary.

Parameters:
  • data (dict) – the object to validate.
  • schema (str) – the name of the schema.
Returns:

validation_errors formatted dict.

Return type:

dict

inspirehep.modules.workflows.utils.ignore_timeout_error(return_value=None)

Ignore the TimeoutError, returning return_value when it happens.

Quick fix for refextract and plotextract tasks only. It shouldn’t be used for others!
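A decorator with this behaviour could look roughly like the following. It is a sketch, not the module's code: it catches Python's built-in TimeoutError, whereas the real function may catch a different exception class, and the names ending in _sketch or used in the demo are hypothetical.

```python
from functools import wraps


def ignore_timeout_error_sketch(return_value=None):
    """Sketch: swallow TimeoutError and return a fallback value instead."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except TimeoutError:
                # The task took too long: report the fallback instead of failing.
                return return_value
        return wrapper
    return decorator


@ignore_timeout_error_sketch(return_value=[])
def extract_references_slowly():
    raise TimeoutError("took too long")


references = extract_references_slowly()
```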

inspirehep.modules.workflows.utils.insert_wf_record_source(json, record_uuid, source)

Stores a record in the WorkflowRecordSource table in the db.

Parameters:
  • json (dict) – the record’s content to store
  • record_uuid (uuid) – the record’s uuid
  • source (string) – the source of the record

inspirehep.modules.workflows.utils.json_api_request(*args, **kwargs)

Make JSON API request and return JSON response.

inspirehep.modules.workflows.utils.log_workflows_action(action, relevance_prediction, object_id, user_id, source, user_action='')

Log the action taken by user compared to a prediction.

inspirehep.modules.workflows.utils.read_all_wf_record_sources(record_uuid)

Retrieve all WorkflowRecordSource for a given record id.

Parameters:record_uuid (uuid) – the uuid of the record
Returns:the WorkflowRecordSource entries related to record_uuid
Return type:(list)

inspirehep.modules.workflows.utils.read_wf_record_source(record_uuid, source)

Retrieve a record from the WorkflowRecordSource table.

Parameters:
  • record_uuid (uuid) – the uuid of the record
  • source (string) – the acquisition source value of the record
Returns:

the matching record if there is one, otherwise None

Return type:

(dict)

inspirehep.modules.workflows.utils.timeout_with_config(config_key)

Decorator to set a configurable timeout on a function.

Parameters:config_key (str) – config key with an integer value representing the time in seconds after which the decorated function will abort, raising a TimeoutError. If the key is not present in the config, a KeyError is raised.

Note

This function is needed because it’s impossible to pass a value read from the config as an argument to a decorator, as it gets evaluated before the application context is set up.
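The point of the note, deferring the config lookup until the decorated function is called, can be sketched as follows. Everything here is an assumption for illustration: CONFIG is a plain dictionary standing in for the application config, the timeout is approximated by waiting on a worker thread (which stops waiting rather than truly aborting the work), and the demo uses a fractional value only to stay fast.

```python
import threading
import time
from functools import wraps

CONFIG = {}  # hypothetical stand-in for the application config


def timeout_with_config_sketch(config_key):
    """Sketch: read the timeout from CONFIG when the function is called."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Looked up on every call, so the config may be populated after
            # decoration time; a missing key raises KeyError here.
            seconds = CONFIG[config_key]
            outcome = {}

            def target():
                try:
                    outcome["value"] = func(*args, **kwargs)
                except Exception as exc:
                    outcome["error"] = exc

            worker = threading.Thread(target=target, daemon=True)
            worker.start()
            worker.join(seconds)
            if worker.is_alive():
                raise TimeoutError("exceeded %s seconds" % seconds)
            if "error" in outcome:
                raise outcome["error"]
            return outcome["value"]
        return wrapper
    return decorator


@timeout_with_config_sketch("DEMO_TIMEOUT")
def quick_task():
    return "done"


@timeout_with_config_sketch("DEMO_SHORT_TIMEOUT")
def slow_task():
    time.sleep(0.5)
    return "too late"


# The config is filled in only after the functions were decorated.
CONFIG["DEMO_TIMEOUT"] = 2
CONFIG["DEMO_SHORT_TIMEOUT"] = 0.05
```

Because the key is resolved inside the wrapper, decorating a function never touches the config; only calling it does.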

inspirehep.modules.workflows.utils.with_debug_logging(func)

Generate a debug log with info on what’s going to run.

It tries its best to use the logging facilities of the object passed or of the application context before falling back to the standard Python logging facility.
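The fallback order can be sketched with a simplified decorator. It assumes the passed object exposes its logger as a log attribute and leaves out the application-context step entirely; RecordingLog, FakeObj and with_debug_logging_sketch are hypothetical names used only for this example.

```python
import logging
from functools import wraps


def with_debug_logging_sketch(func):
    """Sketch: emit a debug message before running the wrapped function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        first = args[0] if args else None
        # Prefer a logger carried by the first argument, else use stdlib logging.
        logger = getattr(first, "log", None) or logging.getLogger(func.__module__)
        logger.debug("Going to run %s", func.__name__)
        return func(*args, **kwargs)
    return wrapper


class RecordingLog:
    """Minimal logger stand-in that remembers the messages it receives."""

    def __init__(self):
        self.messages = []

    def debug(self, msg, *args):
        self.messages.append(msg % args)


class FakeObj:
    def __init__(self):
        self.log = RecordingLog()


@with_debug_logging_sketch
def my_step(obj, eng):
    return "ok"


obj = FakeObj()
value = my_step(obj, None)
```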