inspirehep.modules.records package

Subpackages

Submodules

inspirehep.modules.records.api module

Inspire Records

class inspirehep.modules.records.api.ESRecord(data, model=None)[source]

Bases: inspirehep.modules.records.api.InspireRecord

Record class that fetches records from ElasticSearch.

classmethod get_record(object_uuid, with_deleted=False)[source]

Get record instance from ElasticSearch.

updated

Get last updated timestamp.

class inspirehep.modules.records.api.InspireRecord(data, model=None)[source]

Bases: invenio_records_files.api.Record

Record class that fetches records from DataBase.

add_document_or_figure(metadata, stream=None, is_document=True, file_name=None, key=None)[source]

Add a document or figure to the record.

Parameters:
  • metadata (dict) – metadata of the document or figure, see the schemas for more details, will be validated.
  • stream (file like object) – if passed, will extract the file contents from it.
  • is_document (bool) – if the given information is for a document, set to `False` for a figure.
  • file_name (str) – Name of the file, used as a basis of the key for the files store.
  • key (str) – if passed, will use this as the key for the files store and ignore file_name, use it to overwrite existing keys.
Returns:

metadata of the added document or figure.

Return type:

dict

Raises:

TypeError – if neither file_name nor key is passed (one of them is required).

classmethod create(data, id_=None, **kwargs)[source]

Override the default create.

To also handle the retrieval of documents and figures.

Note

Might create an extra revision in the record if it had to download any documents or figures.

Keyword Arguments:
 
  • id (uuid) – an optional uuid to assign to the created record object.
  • files_src_records (List[InspireRecord]) – if passed, it will try to get the files for the documents and figures from the first record in the list that has it in its files iterator before downloading them, for example to merge existing records.
  • skip_files (bool) – if True it will skip the files retrieval described above. Note also that, if not passed, it will fall back to the value of the RECORDS_SKIP_FILES configuration variable.

Examples

>>> record = {
...     '$schema': 'hep.json',
... }
>>> record = InspireRecord.create(record)
>>> record.commit()
classmethod create_or_update(data, **kwargs)[source]

Create or update a record.

It will check if there is any record registered with the same control_number and pid_type. If there is one, it will update that record; otherwise it will create a new one.

Keyword Arguments:
 
  • files_src_records (List[InspireRecord]) – if passed, it will try to get the files for the documents and figures from the first record in the list that has it in its files iterator before downloading them, for example to merge existing records.
  • skip_files (bool) – if True it will skip the files retrieval described above. Note also that, if not passed, it will fall back to the value of the RECORDS_SKIP_FILES configuration variable.

Examples

>>> record = {
...     '$schema': 'hep.json',
... }
>>> record = InspireRecord.create_or_update(record)
>>> record.commit()
delete()[source]

Mark as deleted all pidstores for a specific record.

download_documents_and_figures(only_new=False, src_records=())[source]

Gets all the documents and figures of the record, and downloads them to the files property.

If the record does not have a control number yet, this function will do nothing, and it is left to the caller to call it again once the control number is set.

When iterating through the documents and figures, the following happens:

  • if url field points to the files api:
    • and there’s no src_records:
      • and only_new is False: it will throw an error, as that would be the case that the record was created from scratch with a document that was already downloaded from another record, but that record was not passed, so we can’t get the file.
      • and only_new is True:
        • if key exists in the current record files: it will do nothing, as the file is already there.
        • if key does not exist in the current record files: An exception will be thrown, as the file can’t be retrieved.
    • and there’s a src_records:
      • and only_new is False:
        • if key exists in the src_records files: it will download the file from the local path derived from the src_records files.
        • if key does not exist in the src_records files: An exception will be thrown, as the file can’t be retrieved.
      • and only_new is True:
        • if key exists in the current record files: it will do nothing, as the file is already there.
        • if key does not exist in the current record files:
          • if key exists in the src_records files: it will download the file from the local path derived from the src_records files.
          • if key does not exist in the src_records files: An exception will be thrown, as the file can’t be retrieved.
  • if url field does not point to the files api: it will try to download the new file.
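The branching above can be condensed into a small decision function. This is a minimal sketch and not the method's actual code: `plan_file_action`, its arguments, and the returned labels are hypothetical names, and the files of the record and of the src_records are modelled as plain sets of keys.

```python
def plan_file_action(points_to_files_api, key, current_keys, src_keys,
                     only_new=False):
    """Decide what to do with one document or figure (hypothetical helper).

    current_keys: keys already present in the record's own files.
    src_keys: keys available in the src_records files (empty if none passed).
    """
    if not points_to_files_api:
        # External URL: always try to download the file.
        return 'download'
    if only_new and key in current_keys:
        # The file is already attached to this record, nothing to do.
        return 'skip'
    if key in src_keys:
        # Reuse the local file held by one of the source records.
        return 'copy_from_src_records'
    # Files API URL, but no known record holds the file.
    raise LookupError('cannot retrieve file for key %r' % key)


print(plan_file_action(True, 'a.pdf', {'a.pdf'}, set(), only_new=True))
```

Collapsing the cases this way shows that only_new matters only when the key is already in the current record, and that src_records is the only fallback for files API URLs.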

Parameters:
  • only_new (bool) – If True, will not re-download any files if the document[‘key’] matches an existing downloaded file.
  • src_records (List[InspireRecord]) – if passed, it will try to get the files from these records’ files iterators before downloading them, for example to merge existing records.
dumps()[source]

Returns a dict ‘representation’ of the record.

Note: this is not suitable to create a new record from it, as the representation will include some extra fields that should not be present in the record’s json; see the ‘to_dict’ method instead.
get_citations_count(show_duplicates=False)[source]

Returns citations count for this record.

get_citing_records_query
get_modified_references()[source]

Return the ids of the references that differ between the latest and the previous version.

The diff includes references added or deleted. Changes in a reference’s content won’t be detected.

Also, it detects if record was deleted/un-deleted compared to the previous version and, in such cases, returns the full list of references.

References not linked to any record will be ignored.

Note: record should be committed to DB in order to correctly get the previous version.

Returns: pids of references changed from the previous version.
Return type: Set[Tuple[str, int]]
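The diff described for get_modified_references() amounts to a symmetric difference between two sets of pids. The sketch below illustrates that idea only: `modified_reference_pids` and its arguments are hypothetical, and it assumes that the "full list of references" returned after a delete or un-delete is the one of the latest version.

```python
def modified_reference_pids(previous, latest, deleted_flag_changed=False):
    """Return the pids that differ between two versions (hypothetical helper).

    previous, latest: iterables of (pid_type, pid_value) tuples for the
    linked references of the previous and the latest version.
    """
    if deleted_flag_changed:
        # Record was deleted or un-deleted: every reference is affected.
        return set(latest)
    # Symmetric difference: references that were added or removed.
    return set(previous) ^ set(latest)


old = [('lit', 1), ('lit', 2)]
new = [('lit', 2), ('lit', 3)]
print(modified_reference_pids(old, new))
```

A reference whose content changed but whose pid stayed the same appears in both sets and therefore drops out of the result.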
merge(other)[source]

Redirect pidstore of current record to the other InspireRecord.

Parameters: other (InspireRecord) – The record that the current record is going to be redirected to.
static mint(id_, data)[source]

Mint the record.

to_dict()[source]

Gets a deep copy of the record’s json.

update(data, **kwargs)[source]

Override the default update.

To also handle the retrieval of documents and figures.

Keyword Arguments:
 
  • files_src_records (InspireRecord) – if passed, it will try to get the files for the documents and figures from this record’s files iterator before downloading them, for example to merge existing records.
  • skip_files (bool) – if True it will skip the files retrieval described above. Note also that, if not passed, it will fall back to the value of the RECORDS_SKIP_FILES configuration variable.
validate()[source]

Validate the record, also ensuring format compliance.

class inspirehep.modules.records.api.referenced_records(*args, **kwargs)[source]

Bases: sqlalchemy.sql.functions.GenericFunction

identifier = 'referenced_records'
name = 'referenced_records'
type = ARRAY(Text())

inspirehep.modules.records.checkers module

Records checkers.

inspirehep.modules.records.checkers.add_linked_ids(dois, arxiv_ids, linked_ids)[source]

Increase the number of times a paper with a specific doi has been cited by using its corresponding arxiv eprint, and vice versa.

double_count is used to count the times that a doi and an arxiv eprint appear in the same paper, so that we don’t count them twice in the final result.

inspirehep.modules.records.checkers.calculate_score_of_reference(counted_reference)[source]

Given a tuple of the number of times cited by a core record and a non core record, calculate a score associated with a reference.

The score is calculated giving five times more importance to core records
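One plausible reading of this weighting is a linear score. The sketch below is an assumption, not the function's confirmed formula: `score_of_reference` and `core_weight` are hypothetical names, and the tuple is taken to be ordered as (core count, non core count).

```python
def score_of_reference(counted_reference, core_weight=5):
    """Weighted citation score (hypothetical helper).

    counted_reference: (times cited by core records,
                        times cited by non core records).
    """
    times_cited_by_core, times_cited_by_non_core = counted_reference
    # Each core citation counts core_weight times as much as a non core one.
    return core_weight * times_cited_by_core + times_cited_by_non_core


print(score_of_reference((2, 3)))
```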

inspirehep.modules.records.checkers.check_unlinked_references()[source]

Return two lists with the unlinked references that have a doi or an arxiv id.

If the reference read has a doi or an arxiv id, it is stored in the data structure. Once all the data is read, it is ordered from most relevant to least relevant.

inspirehep.modules.records.checkers.get_all_unlinked_references()[source]

Return a list of dict, in which each dictionary corresponds to one reference object and the status of core or non core

inspirehep.modules.records.checkers.increase_cited_count(result, identifier, core)[source]

Increases the number of times a reference with the same identifier has appeared

inspirehep.modules.records.checkers.order_dictionary_into_list(result_dict)[source]

Return result_dict as an ordered list of tuples

inspirehep.modules.records.cli module

class inspirehep.modules.records.cli.MyThreadPool(processes=None, initializer=None, initargs=())[source]

Bases: multiprocessing.pool.ThreadPool

imap_unordered(func, iterable, second_argument, chunksize=1)[source]

Like imap() method but ordering of results is arbitrary

inspirehep.modules.records.cli.get_query_records_to_index(pid_types)[source]

Return a query for retrieving all non deleted records by pid_type

Parameters: pid_types (List[str]) – a list of pid types
Returns: SQLAlchemy query for non deleted records with pid type in pid_types
inspirehep.modules.records.cli.next_batch(iterator, batch_size)[source]

Get the first batch_size elements from the iterable, or the remaining ones if fewer are left.

Parameters:
  • iterator – the iterator for the iterable
  • batch_size – size of the requested batch
Returns:

batch (list)
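The batching behaviour described above can be reproduced with itertools.islice. This is a sketch with the same contract, not necessarily how the module implements it; `next_batch_sketch` is a hypothetical name.

```python
from itertools import islice


def next_batch_sketch(iterator, batch_size):
    """Consume up to batch_size items from iterator and return them as a list."""
    return list(islice(iterator, batch_size))


numbers = iter(range(5))
batch = next_batch_sketch(numbers, 2)
while batch:
    # Each pass consumes the next chunk; the last one may be shorter.
    print(batch)
    batch = next_batch_sketch(numbers, 2)
```

Because the same iterator object is passed on every call, each batch starts where the previous one stopped, and an empty list signals exhaustion.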

inspirehep.modules.records.errors module

exception inspirehep.modules.records.errors.MissingCitedRecordError[source]

Bases: invenio_records.errors.RecordsError

exception inspirehep.modules.records.errors.MissingInspireRecordError[source]

Bases: invenio_records.errors.RecordsError

inspirehep.modules.records.ext module

Records extension.

class inspirehep.modules.records.ext.InspireRecords(app=None)[source]

Bases: object

init_app(app)[source]

inspirehep.modules.records.facets module

inspirehep.modules.records.facets.must_match_all_filter(field)[source]

Bool filter containing a list of must matches.

inspirehep.modules.records.facets.range_author_count_filter(field)[source]

Range filter that returns only records with 1 <= authors <= 10.
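In Elasticsearch query DSL, such a constraint is expressed as a range query with gte and lte bounds. The sketch below builds that query as a plain dict; `author_count_range_query` is a hypothetical helper, and the actual filter may well be built with a query-builder library and wrapped differently.

```python
def author_count_range_query(field, lower=1, upper=10):
    """Build an Elasticsearch range query body as a dict (hypothetical helper)."""
    # gte / lte make both bounds inclusive: lower <= field <= upper.
    return {'range': {field: {'gte': lower, 'lte': upper}}}


print(author_count_range_query('author_count'))
```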

inspirehep.modules.records.json_ref_loader module

Resource-aware json reference loaders to be used with jsonref.

class inspirehep.modules.records.json_ref_loader.AbstractRecordLoader(store=(), cache_results=True)[source]

Bases: jsonref.JsonLoader

Base for resource-aware record loaders.

Resolves the referred resource by the given uri by first checking against local resources.

get_record(pid_type, recid)[source]
get_remote_json(uri, **kwargs)[source]
class inspirehep.modules.records.json_ref_loader.DatabaseJsonLoader(store=(), cache_results=True)[source]

Bases: inspirehep.modules.records.json_ref_loader.AbstractRecordLoader

get_record(pid_type, recid)[source]
class inspirehep.modules.records.json_ref_loader.ESJsonLoader(store=(), cache_results=True)[source]

Bases: inspirehep.modules.records.json_ref_loader.AbstractRecordLoader

Resolve resources by retrieving them from Elasticsearch.

get_record(pid_type, recid)[source]
inspirehep.modules.records.json_ref_loader.SCHEMA_LOADER_CLS

Used in invenio-jsonschemas to resolve relative $ref.

alias of JsonLoader

inspirehep.modules.records.json_ref_loader.load_resolved_schema(name)[source]

Load a JSON schema with all references resolved.

Parameters: name (str) – name of the schema to load.
Returns: the JSON schema with resolved references.
Return type: dict

Examples

>>> resolved_schema = load_resolved_schema('authors')
inspirehep.modules.records.json_ref_loader.replace_refs(obj, source='db')[source]

Replaces record refs in obj by bypassing HTTP requests.

Any reference URI that comes from the same server and references a resource will be resolved directly either from the database or from Elasticsearch.

Parameters:
  • obj – Dict-like object for which ‘$ref’ fields are recursively replaced.
  • source
    List of sources from which to resolve the references. It can be any of:
    • ‘db’ - resolve from Database
    • ‘es’ - resolve from Elasticsearch
    • ‘http’ - force using HTTP
Returns:

The same obj structure with the ‘$ref’ fields replaced with the object available at the given URI.

inspirehep.modules.records.permissions module

class inspirehep.modules.records.permissions.RecordPermission(record, func, user)[source]

Bases: invenio_access.permissions.Permission

Record permission.

  • Read access given if collection not restricted.
  • Update access given to admins and cataloguers.
  • All other actions are denied for the moment.
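The three rules above can be read as a small decision table. The following sketch is illustrative only: `is_allowed`, the role names 'admin' and 'cataloguer', and the boolean `collection_restricted` flag are assumptions and do not reflect how the class consults the access system.

```python
def is_allowed(action, collection_restricted, roles):
    """Apply the three documented rules (hypothetical helper)."""
    if action == 'read':
        # Read access is given if the collection is not restricted.
        return not collection_restricted
    if action == 'update':
        # Update access is given to admins and cataloguers.
        return bool({'admin', 'cataloguer'} & set(roles))
    # All other actions are denied for the moment.
    return False


print(is_allowed('update', False, ['cataloguer']))
```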
can()[source]

Determine access.

classmethod create(record, action, user=None)[source]

Create a record permission.

read_actions = ['read']
update_actions = ['update']
inspirehep.modules.records.permissions.deny(user, record)[source]

Deny access.

inspirehep.modules.records.permissions.get_user_collections()[source]

Get user restricted collections.

inspirehep.modules.records.permissions.has_admin_permission(user, record)[source]

Check if user has admin access to record.

inspirehep.modules.records.permissions.has_read_permission(user, record)[source]

Check if user has read access to the record.

inspirehep.modules.records.permissions.has_update_permission(user, record)[source]

Check if user has update access to the record.

inspirehep.modules.records.permissions.load_restricted_collections()[source]
inspirehep.modules.records.permissions.load_user_collections(app, user)[source]

Load user restricted collections upon login.

Receiver for flask_login.user_logged_in

inspirehep.modules.records.permissions.record_read_permission_factory(record=None)[source]

Record permission factory.

inspirehep.modules.records.permissions.record_update_permission_factory(record=None)[source]

Record permission factory.

inspirehep.modules.records.receivers module

Records receivers.

inspirehep.modules.records.receivers.assign_phonetic_block(sender, record, *args, **kwargs)[source]

Assign a phonetic block to each signature of a Literature record.

Uses the NYSIIS algorithm to compute a phonetic block from each signature’s full name, skipping those that are not recognized as real names, but logging an error when that happens.

inspirehep.modules.records.receivers.assign_uuid(sender, record, *args, **kwargs)[source]

Assign a UUID to each signature of a Literature record.
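A minimal sketch of the idea, under stated assumptions: signatures are taken to live under the record's 'authors' list, each one gets a 'uuid' key, and an existing value is left untouched. `assign_uuids` is a hypothetical name and the real receiver may differ on all three points.

```python
import uuid


def assign_uuids(record):
    """Give every author signature a UUID, in place (hypothetical helper)."""
    for author in record.get('authors', []):
        if 'uuid' not in author:
            # Random (version 4) UUID, stored as a string so it stays JSON friendly.
            author['uuid'] = str(uuid.uuid4())
    return record


record = {'authors': [{'full_name': 'Doe, Jane'}]}
print(assign_uuids(record)['authors'][0]['uuid'])
```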

inspirehep.modules.records.receivers.enhance_before_index(record)[source]

Run all the receivers that enhance the record for ES in the right order.

Note

populate_recid_from_ref MUST come before populate_bookautocomplete because the latter puts a JSON reference in a completion _source, which would be expanded to an incorrect _source_recid by the former.

inspirehep.modules.records.receivers.enhance_record(sender, record, *args, **kwargs)[source]

Enhance the record for ES

inspirehep.modules.records.receivers.index_after_commit(sender, changes)[source]

Index a record in ES after it was committed to the DB.

This cannot happen in an after_record_commit receiver from Invenio-Records because, despite the name, at that point we are not yet sure whether the record has been really committed to the DB.

inspirehep.modules.records.receivers.push_to_orcid[source]

If needed, queue the push of the new changes to ORCID.

inspirehep.modules.records.tasks module

Records tasks.

(task)inspirehep.modules.records.tasks.batch_reindex[source]

Task for bulk reindexing records.

inspirehep.modules.records.tasks.get_merged_records()[source]
inspirehep.modules.records.tasks.get_records_to_update(old_ref)[source]
(task)inspirehep.modules.records.tasks.index_modified_citations_from_record[source]

Index records from the record’s citations.

This task retries itself in 2 scenarios:

  • A new record is saved but it is not yet visible by this task because the transaction is not finished yet (RecordGetterError).
  • When a record is updated, but new changes are not yet in DB, for the same reason as above (StaleDataError).

Parameters:
  • pid_type (String) – pid type of the record
  • pid_value (String) – pid value of the record
  • db_version (Int) – the correct version of the record that we expect to index. This prevents loading stale data from the DB.
Raises:

MissingCitedRecordError – in case cited records are not found.
(task)inspirehep.modules.records.tasks.merge_merged_records[source]

Merge all records that were marked as merged.

(task)inspirehep.modules.records.tasks.update_refs[source]

Update references in the entire database.

Replaces all occurrences of old_ref with new_ref, provided that they happen at one of the paths listed in INSPIRE_REF_UPDATER_WHITELISTS.

inspirehep.modules.records.utils module

Record related utils.

inspirehep.modules.records.utils.get_author_display_name(name)[source]

Returns the display name in format Firstnames Lastnames
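As an illustration of the target format, the sketch below reorders a name given as "Lastnames, Firstnames". That input format is an assumption, `display_name` is a hypothetical helper, and the real function may rely on a dedicated name parser to handle suffixes and other edge cases.

```python
def display_name(name):
    """Turn 'Lastnames, Firstnames' into 'Firstnames Lastnames' (hypothetical helper)."""
    # Split on the first comma only; names without a comma are kept as they are.
    last, _, first = name.partition(',')
    return ' '.join(part for part in (first.strip(), last.strip()) if part)


print(display_name('Ellis, John R.'))
```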

inspirehep.modules.records.utils.get_author_with_record_facet_author_name(author)[source]
inspirehep.modules.records.utils.get_endpoint_from_record(record)[source]

Return the endpoint corresponding to a record.

inspirehep.modules.records.utils.get_linked_records_in_field(record, field_path)[source]

Get all linked records in a given field.

Parameters:
  • record (dict) – the record containing the links
  • field_path (string) – a dotted field path specification understandable by get_value, containing a json reference to another record.
Returns:

an iterator on the linked record.

Return type:

Iterator[dict]

Warning

Currently, the order in which the linked records are yielded is different from the order in which they appear in the record.

Example

>>> record = {'references': [
...     {'record': {'$ref': 'https://labs.inspirehep.net/api/literature/1234'}},
...     {'record': {'$ref': 'https://labs.inspirehep.net/api/data/421'}},
... ]}
>>> get_linked_records_in_field(record, 'references.record')
[...]
inspirehep.modules.records.utils.get_pid_from_record_uri(record_uri)[source]

Transform a URI to a record into a (pid_type, pid_value) pair.
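A rough sketch of such a transformation, assuming the URI ends in /<endpoint>/<identifier>. `pid_from_uri` is a hypothetical helper; it returns the endpoint segment unchanged, whereas the real function may translate it into the corresponding pid type and convert the value.

```python
from urllib.parse import urlparse


def pid_from_uri(record_uri):
    """Return the last two path segments of a record URI (hypothetical helper)."""
    # Assumes the path ends with '<endpoint>/<identifier>'.
    segments = urlparse(record_uri).path.strip('/').split('/')
    return segments[-2], segments[-1]


print(pid_from_uri('https://labs.inspirehep.net/api/literature/1234'))
```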

inspirehep.modules.records.utils.is_author(record)[source]
inspirehep.modules.records.utils.is_book(record)[source]
inspirehep.modules.records.utils.is_data(record)[source]
inspirehep.modules.records.utils.is_experiment(record)[source]
inspirehep.modules.records.utils.is_hep(record)[source]
inspirehep.modules.records.utils.is_institution(record)[source]
inspirehep.modules.records.utils.is_journal(record)[source]
inspirehep.modules.records.utils.populate_abstract_source_suggest(record)[source]

Populate the abstract_source_suggest field in Literature records.

inspirehep.modules.records.utils.populate_affiliation_suggest(record)[source]

Populate the affiliation_suggest field of Institution records.

inspirehep.modules.records.utils.populate_author_count(record)[source]

Populate the author_count field of Literature records.

inspirehep.modules.records.utils.populate_author_suggest(record, *args, **kwargs)[source]

Populate the author_suggest field of Authors records.

inspirehep.modules.records.utils.populate_authors_full_name_unicode_normalized(record)[source]

Populate the authors.full_name_normalized field of Literature records.

inspirehep.modules.records.utils.populate_authors_name_variations(record)[source]

Generate name variations for an Author record.

inspirehep.modules.records.utils.populate_bookautocomplete(record)[source]

Populate the bookautocomplete field of Literature records.

inspirehep.modules.records.utils.populate_citations_count(record)[source]

Populate citations_count in ES from

inspirehep.modules.records.utils.populate_earliest_date(record)[source]

Populate the earliest_date field of Literature records.

inspirehep.modules.records.utils.populate_experiment_suggest(record)[source]

Populates experiment_suggest field of experiment records.

inspirehep.modules.records.utils.populate_facet_author_name(record)[source]

Populate the facet_author_name field of Literature records.

inspirehep.modules.records.utils.populate_inspire_document_type(record)[source]

Populate the facet_inspire_doc_type field of Literature records.

inspirehep.modules.records.utils.populate_name_variations(record)[source]

Generate name variations for each signature of a Literature record.

inspirehep.modules.records.utils.populate_number_of_references(record)[source]

Populate the number_of_references field of Literature records.

inspirehep.modules.records.utils.populate_recid_from_ref(record)[source]

Extract recids from all JSON reference fields and add them to ES.

For every field that has as a value a JSON reference, adds a sibling after extracting the record identifier. Siblings are named by removing record occurrences and appending _recid without doubling or prepending underscores to the original name.

Example:

{'record': {'$ref': 'http://x/y/2'}}

is transformed to:

{
    'recid': 2,
    'record': {'$ref': 'http://x/y/2'},
}

For every list of object references adds a new list with the corresponding recids, whose name is similarly computed.

Example:

{
    'records': [
        {'$ref': 'http://x/y/1'},
        {'$ref': 'http://x/y/2'},
    ],
}

is transformed to:

{
    'recids': [1, 2],
    'records': [
        {'$ref': 'http://x/y/1'},
        {'$ref': 'http://x/y/2'},
    ],
}
inspirehep.modules.records.utils.populate_title_suggest(record)[source]

Populate the title_suggest field of Journals records.

inspirehep.modules.records.views module

Data model package.

class inspirehep.modules.records.views.Facets(**kwargs)[source]

Bases: invenio_rest.views.ContentNegotiatedMethodView

get(*args, **kwargs)[source]
methods = ['GET']
view_name = '{0}_facets'
class inspirehep.modules.records.views.LiteratureCitationsResource(**kwargs)[source]

Bases: invenio_rest.views.ContentNegotiatedMethodView

get(pid_value, *args, **kwargs)[source]
methods = ['GET']
view_name = 'literature_citations'
inspirehep.modules.records.views.facets_view(*args, **kwargs)
inspirehep.modules.records.views.literature_citations_view(*args, **kwargs)

inspirehep.modules.records.wrappers module

class inspirehep.modules.records.wrappers.AdminToolsMixin[source]

Bases: object

admin_tools
class inspirehep.modules.records.wrappers.AuthorsRecord(data, model=None)[source]

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for author records.

title

Get preferred title.

class inspirehep.modules.records.wrappers.ConferencesRecord(data, model=None)[source]

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for conference records.

title

Get preferred title.

class inspirehep.modules.records.wrappers.ExperimentsRecord(data, model=None)[source]

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for experiment records.

title

Get preferred title.

class inspirehep.modules.records.wrappers.InstitutionsRecord(data, model=None)[source]

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for institution records.

title

Get preferred title.

class inspirehep.modules.records.wrappers.JobsRecord(data, model=None)[source]

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for job records.

similar
title

Get preferred title.

class inspirehep.modules.records.wrappers.JournalsRecord(data, model=None)[source]

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for journal records.

name_variants

Get name variations.

publisher

Get publisher.

title

Get preferred title.

urls

Get urls.

class inspirehep.modules.records.wrappers.LiteratureRecord(data, model=None)[source]

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for literature records.

conference_information

Conference information.

Returns a list with information about conferences related to the record.

external_system_identifiers

External system identification information.

Returns a list that contains information on the first of each kind of external_system_identifiers.

Urls and names for external system identifiers

Returns a dictionary with 2 key value pairs, the first of which is the name of the external_system_identifier and the second is a link to the record in that external_system_identifier.

publication_information

Publication information.

Returns a list with information about each publication note in the record.

title

Get preferred title.

Module contents

Data model package.