inspirehep.modules.records package¶

Subpackages¶

inspirehep.modules.records.mappings package
- Subpackages
  - inspirehep.modules.records.mappings.v5 package
    - Module contents
- Module contents
inspirehep.modules.records.serializers package

Submodules¶

inspirehep.modules.records.api module¶

Inspire Records

class inspirehep.modules.records.api.ESRecord(data, model=None)[source]¶

Bases: inspirehep.modules.records.api.InspireRecord

Record class that fetches records from ElasticSearch.

classmethod get_record(object_uuid, with_deleted=False)[source]¶: Get record instance from ElasticSearch.

updated¶: Get last updated timestamp.

class inspirehep.modules.records.api.InspireRecord(data, model=None)[source]¶

Bases: invenio_records_files.api.Record

Record class that fetches records from DataBase.

add_document_or_figure(metadata, stream=None, is_document=True, file_name=None, key=None)[source]¶

Add a document or figure to the record.

Parameters:

metadata (dict) – metadata of the document or figure, see the schemas for more details, will be validated.
stream (file like object) – if passed, will extract the file contents from it.
is_document (bool) – if the given information is for a document, set to `False` for a figure.
file_name (str) – Name of the file, used as a basis of the key for the files store.
key (str) – if passed, will use this as the key for the files store and ignore `file_name`, use it to overwrite existing keys.
Returns:	metadata of the added document or figure.
Return type:	dict
Raises:	`TypeError` – if neither `file_name` nor `key` is passed (one of them is required).

classmethod create(data, id_=None, **kwargs)[source]¶

Override the default create.

Also handles the retrieval of documents and figures.

Note

Might create an extra revision in the record if it had to download any documents or figures.

Keyword Arguments:

id (uuid) – an optional uuid to assign to the created record object.
files_src_records (List[InspireRecord]) – if passed, it will try to get the files for the documents and figures from the first record in the list that has them in its files iterator before downloading them, for example to merge existing records.
skip_files (bool) – if True it will skip the files retrieval described above. Note also that, if not passed, it will fall back to the value of the RECORDS_SKIP_FILES configuration variable.

Examples

>>> record = {
...     '$schema': 'hep.json',
... }
>>> record = InspireRecord.create(record)
>>> record.commit()
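
The fallback described for skip_files can be pictured with the sketch below. It is not the code of InspireRecord.create: the helper name resolve_skip_files and the plain dict standing in for the application configuration are assumptions, as is the default of False when the variable is unset.

```python
def resolve_skip_files(kwargs, config):
    """Return the effective ``skip_files`` flag (hypothetical helper)."""
    # An explicitly passed keyword argument always wins.
    if 'skip_files' in kwargs:
        return bool(kwargs['skip_files'])
    # Otherwise fall back to the configuration variable named in the docs.
    # Assumption: a missing variable is treated as False.
    return bool(config.get('RECORDS_SKIP_FILES', False))
```

The same fallback is documented for create_or_update and update, so a single helper of this kind could serve all three.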

classmethod create_or_update(data, **kwargs)[source]¶

Create or update a record.

It will check if there is any record registered with the same control_number and pid_type. If there is, it will update that record; otherwise it will create a new one.

Keyword Arguments:

files_src_records (List[InspireRecord]) – if passed, it will try to get the files for the documents and figures from the first record in the list that has them in its files iterator before downloading them, for example to merge existing records.
skip_files (bool) – if True it will skip the files retrieval described above. Note also that, if not passed, it will fall back to the value of the RECORDS_SKIP_FILES configuration variable.

Examples

>>> record = {
...     '$schema': 'hep.json',
... }
>>> record = InspireRecord.create_or_update(record)
>>> record.commit()

delete()[source]¶: Mark as deleted all pidstores for a specific record.

download_documents_and_figures(only_new=False, src_records=())[source]¶

Gets all the documents and figures of the record, and downloads them to the files property.

If the record does not have a control number yet, this function does nothing; it is left to the caller to call it again once the control number is set.

When iterating through the documents and figures, the following happens:

- if url field points to the files api:
  - and there’s no src_records:
    - and only_new is False: it will throw an error, as that would be the case that the record was created from scratch with a document that was already downloaded from another record, but that record was not passed, so we can’t get the file.
    - and only_new is True:
      - if key exists in the current record files: it will do nothing, as the file is already there.
      - if key does not exist in the current record files: An exception will be thrown, as the file can’t be retrieved.
  - and there’s a src_records:
    - and only_new is False:
      - if key exists in the src_records files: it will download the file from the local path derived from the src_records files.
      - if key does not exist in the src_records files: An exception will be thrown, as the file can’t be retrieved.
    - and only_new is True:
      - if key exists in the current record files: it will do nothing, as the file is already there.
      - if key does not exist in the current record files:
        - if key exists in the src_records files: it will download the file from the local path derived from the src_records files.
        - if key does not exist in the src_records files: An exception will be thrown, as the file can’t be retrieved.
- if url field does not point to the files api: it will try to download the new file.

Parameters:

only_new (bool) – If True, will not re-download any files if the document[‘key’] matches an existing downloaded file.
src_records (List[InspireRecord]) – if passed, it will try to get the files from this record files iterator before downloading them, for example to merge existing records.
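
The decision rules listed for download_documents_and_figures collapse into a small function. The sketch below only models the decision, not the download itself; the function name, the boolean inputs, the returned labels and the choice of LookupError are all assumptions made for illustration. "No src_records" is modelled as key_in_src_records being False.

```python
def plan_file_action(points_to_files_api, only_new,
                     key_in_record, key_in_src_records):
    """Decide what to do with one document or figure (hypothetical helper)."""
    if not points_to_files_api:
        # External url: always try to download the new file.
        return 'download'
    if only_new and key_in_record:
        # The file is already attached to the current record.
        return 'skip'
    if key_in_src_records:
        # Reuse the file from the local path of a source record.
        return 'copy_from_src_records'
    # Files-api url, but no record we know of holds the file.
    raise LookupError('the file cannot be retrieved')
```

Checking key_in_record before key_in_src_records matches the documented order: with only_new set, an existing file is never fetched again, even when a source record also has it.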

dumps()[source]¶

Returns a dict ‘representation’ of the record.

Note: this is not suitable for creating a new record, as the representation will include some extra fields that should not be present in the record’s json; see the ‘to_dict’ method instead.

get_citations_count(show_duplicates=False)[source]¶: Returns citations count for this record.

get_citing_records_query¶

get_modified_references()[source]¶

Return the ids of the references that differ between the latest and the previous version.

The diff includes references added or deleted. Changes in a reference’s content won’t be detected.

Also, it detects if record was deleted/un-deleted compared to the previous version and, in such cases, returns the full list of references.

References not linked to any record will be ignored.

Note: record should be committed to DB in order to correctly get the previous version.

Returns:	pids of references changed from the previous version.
Return type:	Set[Tuple[str, int]]
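
The diff described for get_modified_references amounts to a symmetric difference of two pid sets. The sketch below is a simplified, standalone illustration: the function name and parameters are hypothetical, and taking the "full list" to mean the latest version's references is an assumption the description does not settle.

```python
def modified_reference_pids(previous_pids, latest_pids,
                            was_deleted=False, is_deleted=False):
    """Return the (pid_type, pid_value) pairs that changed (hypothetical)."""
    previous_pids = set(previous_pids)
    latest_pids = set(latest_pids)
    if was_deleted != is_deleted:
        # Deleted or un-deleted: report every reference.
        # Assumption: "full list" means the latest version's references.
        return latest_pids
    # Added or removed references; edits inside a reference are not seen.
    return previous_pids ^ latest_pids
```

Because only pids are compared, a reference whose content changed but still points to the same record does not appear in the result, which matches the limitation stated above.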

merge(other)[source]¶

Redirect pidstore of current record to the other InspireRecord.

Parameters:	other (InspireRecord) – The record to which this record (self) is going to be redirected.

static mint(id_, data)[source]¶: Mint the record.

to_dict()[source]¶: Gets a deep copy of the record’s json.

update(data, **kwargs)[source]¶

Override the default update.

Also handles the retrieval of documents and figures.

Keyword Arguments:

files_src_records (InspireRecord) – if passed, it will try to get the files for the documents and figures from this record’s files iterator before downloading them, for example to merge existing records.
skip_files (bool) – if True it will skip the files retrieval described above. Note also that, if not passed, it will fall back to the value of the RECORDS_SKIP_FILES configuration variable.

validate()[source]¶: Validate the record, also ensuring format compliance.

class inspirehep.modules.records.api.referenced_records(*args, **kwargs)[source]¶

Bases: sqlalchemy.sql.functions.GenericFunction

identifier = 'referenced_records'¶

name = 'referenced_records'¶

type = ARRAY(Text())¶

inspirehep.modules.records.checkers module¶

Records checkers.

inspirehep.modules.records.checkers.add_linked_ids(dois, arxiv_ids, linked_ids)[source]¶

Increase the number of times a paper with a specific DOI has been cited by using its corresponding arXiv eprint, and vice versa.

double_count is used to count the times that a DOI and an arXiv eprint appear in the same paper, so that we don’t count them twice in the final result.

inspirehep.modules.records.checkers.calculate_score_of_reference(counted_reference)[source]¶

Given a tuple of the number of times cited by a core record and a non core record, calculate a score associated with a reference.

The score is calculated giving five times more importance to core records

inspirehep.modules.records.checkers.check_unlinked_references()[source]¶

Return two lists with the unlinked references that have a doi or an arxiv id.

If the reference read has a doi or an arxiv id, it is stored in the data structure. Once all the data is read, it is ordered from most relevant to least relevant.

inspirehep.modules.records.checkers.get_all_unlinked_references()[source]¶: Return a list of dict, in which each dictionary corresponds to one reference object and the status of core or non core

inspirehep.modules.records.checkers.increase_cited_count(result, identifier, core)[source]¶: Increases the number of times a reference with the same identifier has appeared

inspirehep.modules.records.checkers.order_dictionary_into_list(result_dict)[source]¶: Return result_dict as an ordered list of tuples
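
The checker helpers above can be read as a small pipeline: tally citations per identifier, score each tally, then order by score. The sketch below is a standalone illustration, not the module's code. The function names, the (core, non_core) tuple layout and the exact formula are assumptions; only the factor of five comes from the description of calculate_score_of_reference.

```python
CORE_WEIGHT = 5  # core records count five times as much as non-core ones


def tally_citation(result, identifier, core):
    """Count one more citation of ``identifier`` (hypothetical helper)."""
    core_count, non_core_count = result.get(identifier, (0, 0))
    if core:
        core_count += 1
    else:
        non_core_count += 1
    result[identifier] = (core_count, non_core_count)


def score(counted_reference):
    """Score a (core_count, non_core_count) tuple; the formula is assumed."""
    core_count, non_core_count = counted_reference
    return CORE_WEIGHT * core_count + non_core_count


def order_by_score(result):
    """Return the tallies as a list of tuples, most relevant first."""
    return sorted(result.items(), key=lambda item: score(item[1]), reverse=True)
```

With this weighting, a reference cited once by a core record and once by a non-core record scores 6 and ranks above one cited three times by non-core records only.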

inspirehep.modules.records.cli module¶

class inspirehep.modules.records.cli.MyThreadPool(processes=None, initializer=None, initargs=())[source]¶

Bases: multiprocessing.pool.ThreadPool

imap_unordered(func, iterable, second_argument, chunksize=1)[source]¶: Like imap() method but ordering of results is arbitrary

inspirehep.modules.records.cli.get_query_records_to_index(pid_types)[source]¶

Return a query for retrieving all non deleted records by pid_type

Parameters:	pid_types (List[str]) – a list of pid types
Returns:	SQLAlchemy query for non deleted record with pid type in pid_types

inspirehep.modules.records.cli.next_batch(iterator, batch_size)[source]¶

Get first batch_size elements from the iterable, or remaining if less.

Parameters:

iterator – the iterator for the iterable
batch_size – size of the requested batch
Returns:	batch (list)
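
The batching behaviour described for next_batch can be reproduced with itertools.islice. This is a sketch of the technique under a different name (take_batch), not necessarily how the module implements it.

```python
from itertools import islice


def take_batch(iterator, batch_size):
    """Consume up to ``batch_size`` items from ``iterator`` into a list."""
    # islice stops early when the iterator runs out, so the last batch
    # may be shorter, and an exhausted iterator yields an empty list.
    return list(islice(iterator, batch_size))
```

Passing an iterator rather than a list matters here: each call resumes where the previous one stopped, so repeated calls walk through the whole iterable until an empty batch signals the end.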

inspirehep.modules.records.errors module¶

exception inspirehep.modules.records.errors.MissingCitedRecordError[source]¶: Bases: invenio_records.errors.RecordsError

exception inspirehep.modules.records.errors.MissingInspireRecordError[source]¶: Bases: invenio_records.errors.RecordsError

inspirehep.modules.records.ext module¶

Records extension.

class inspirehep.modules.records.ext.InspireRecords(app=None)[source]¶

Bases: object

init_app(app)[source]¶

inspirehep.modules.records.json_ref_loader module¶

Resource-aware json reference loaders to be used with jsonref.

class inspirehep.modules.records.json_ref_loader.AbstractRecordLoader(store=(), cache_results=True)[source]¶

Bases: jsonref.JsonLoader

Base for resource-aware record loaders.

Resolves the resource referred to by the given URI, checking local resources first.

get_record(pid_type, recid)[source]¶

get_remote_json(uri, **kwargs)[source]¶

class inspirehep.modules.records.json_ref_loader.DatabaseJsonLoader(store=(), cache_results=True)[source]¶

Bases: inspirehep.modules.records.json_ref_loader.AbstractRecordLoader

get_record(pid_type, recid)[source]¶

class inspirehep.modules.records.json_ref_loader.ESJsonLoader(store=(), cache_results=True)[source]¶

Bases: inspirehep.modules.records.json_ref_loader.AbstractRecordLoader

Resolve resources by retrieving them from Elasticsearch.

get_record(pid_type, recid)[source]¶

inspirehep.modules.records.json_ref_loader.SCHEMA_LOADER_CLS¶

Used in invenio-jsonschemas to resolve relative $ref.

alias of JsonLoader

inspirehep.modules.records.json_ref_loader.load_resolved_schema(name)[source]¶

Load a JSON schema with all references resolved.

Parameters:	name (str) – name of the schema to load.
Returns:	the JSON schema with resolved references.
Return type:	dict

Examples

>>> resolved_schema = load_resolved_schema('authors')

inspirehep.modules.records.json_ref_loader.replace_refs(obj, source='db')[source]¶

Replaces record refs in obj by bypassing HTTP requests.

Any reference URI that comes from the same server and references a resource will be resolved directly either from the database or from Elasticsearch.

Parameters:

obj – Dict-like object for which ‘$ref’ fields are recursively replaced.
source – List of sources from which to resolve the references. It can be any of:
- ‘db’ - resolve from Database
- ‘es’ - resolve from Elasticsearch
- ‘http’ - force using HTTP
Returns:	The same obj structure with the ‘$ref’ fields replaced with the object available at the given URI.
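
The idea of resolving ‘$ref’ fields without HTTP requests can be shown with a plain dictionary standing in for the database or Elasticsearch. The sketch below does not use jsonref and is not the implementation of replace_refs: the function name and the store mapping are hypothetical, and it resolves eagerly and only one level deep.

```python
import copy


def resolve_local_refs(obj, store):
    """Return a copy of ``obj`` with known '$ref' objects replaced.

    ``store`` maps a reference URI to the already loaded resource
    (a stand-in for a database or Elasticsearch lookup).
    """
    if isinstance(obj, dict):
        ref = obj.get('$ref')
        if isinstance(ref, str) and ref in store:
            # Known local resource: substitute it without any HTTP request.
            return copy.deepcopy(store[ref])
        return {key: resolve_local_refs(value, store)
                for key, value in obj.items()}
    if isinstance(obj, list):
        return [resolve_local_refs(item, store) for item in obj]
    return obj
```

References that are not in the store are left untouched in this sketch, so a caller could still resolve them by another route.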

inspirehep.modules.records.permissions module¶

class inspirehep.modules.records.permissions.RecordPermission(record, func, user)[source]¶

Bases: invenio_access.permissions.Permission

Record permission.

Read access given if collection not restricted.
Update access given to admins and cataloguers.
All other actions are denied for the moment.

can()[source]¶: Determine access.

classmethod create(record, action, user=None)[source]¶: Create a record permission.

read_actions = ['read']¶

update_actions = ['update']¶
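
The access rules stated for RecordPermission can be summarised as a small decision function. This is a simplified sketch: the function name, its arguments and the role names 'admin' and 'cataloguer' are assumptions, and it ignores any per-user access to restricted collections.

```python
def is_allowed(action, collection_restricted, user_roles):
    """Apply the documented rules to one request (hypothetical helper)."""
    if action == 'read':
        # Read access is given if the collection is not restricted.
        return not collection_restricted
    if action == 'update':
        # Update access is given to admins and cataloguers.
        return bool({'admin', 'cataloguer'} & set(user_roles))
    # All other actions are denied for the moment.
    return False
```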

inspirehep.modules.records.permissions.deny(user, record)[source]¶: Deny access.

inspirehep.modules.records.permissions.get_user_collections()[source]¶: Get user restricted collections.

inspirehep.modules.records.permissions.has_admin_permission(user, record)[source]¶: Check if user has admin access to record.

inspirehep.modules.records.permissions.has_read_permission(user, record)[source]¶: Check if user has read access to the record.

inspirehep.modules.records.permissions.has_update_permission(user, record)[source]¶: Check if user has update access to the record.

inspirehep.modules.records.permissions.load_restricted_collections()[source]¶

inspirehep.modules.records.permissions.load_user_collections(app, user)[source]¶

Load user restricted collections upon login.

Receiver for flask_login.user_logged_in

inspirehep.modules.records.permissions.record_read_permission_factory(record=None)[source]¶: Record permission factory.

inspirehep.modules.records.permissions.record_update_permission_factory(record=None)[source]¶: Record permission factory.

inspirehep.modules.records.receivers module¶

Records receivers.

inspirehep.modules.records.receivers.assign_phonetic_block(sender, record, *args, **kwargs)[source]¶

Assign a phonetic block to each signature of a Literature record.

Uses the NYSIIS algorithm to compute a phonetic block from each signature’s full name, skipping those that are not recognized as real names, but logging an error when that happens.

inspirehep.modules.records.receivers.assign_uuid(sender, record, *args, **kwargs)[source]¶: Assign a UUID to each signature of a Literature record.

inspirehep.modules.records.receivers.enhance_before_index(record)[source]¶: Run all the receivers that enhance the record for ES in the right order.

Note

populate_recid_from_ref MUST come before populate_bookautocomplete because the latter puts a JSON reference in a completion _source, which would be expanded to an incorrect _source_recid by the former.

inspirehep.modules.records.receivers.enhance_record(sender, record, *args, **kwargs)[source]¶: Enhance the record for ES

inspirehep.modules.records.receivers.index_after_commit(sender, changes)[source]¶

Index a record in ES after it was committed to the DB.

This cannot happen in an after_record_commit receiver from Invenio-Records because, despite the name, at that point we are not yet sure whether the record has been really committed to the DB.

inspirehep.modules.records.receivers.push_to_orcid[source]¶: If needed, queue the push of the new changes to ORCID.

inspirehep.modules.records.tasks module¶

Records tasks.

(task)inspirehep.modules.records.tasks.batch_reindex[source]¶: Task for bulk reindexing records.

inspirehep.modules.records.tasks.get_merged_records()[source]¶

inspirehep.modules.records.tasks.get_records_to_update(old_ref)[source]¶

(task)inspirehep.modules.records.tasks.index_modified_citations_from_record[source]¶

Index records from the record’s citations.

This task retries itself in 2 scenarios:
- A new record is saved but it is not yet visible to this task because the transaction is not finished yet (RecordGetterError).
- A record is updated, but the new changes are not yet in the DB, for the same reason as above (StaleDataError).

Parameters:

pid_type (String) – pid type of the record
pid_value (String) – pid value of the record
db_version (Int) – the correct version of the record that we expect to index. This prevents loading stale data from the DB.

Raises:	MissingCitedRecordError – in case cited records are not found.

(task)inspirehep.modules.records.tasks.merge_merged_records[source]¶: Merge all records that were marked as merged.

inspirehep.modules.records.tasks.update_links(record, old_ref, new_ref)[source]¶

(task)inspirehep.modules.records.tasks.update_refs[source]¶

Update references in the entire database.

Replaces all occurrences of old_ref with new_ref, provided that they happen at one of the paths listed in INSPIRE_REF_UPDATER_WHITELISTS.
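
Restricting the replacement to whitelisted paths can be illustrated on a single record. The sketch below assumes the whitelist is a flat list of dotted paths; the actual structure of INSPIRE_REF_UPDATER_WHITELISTS may differ, and the function name is hypothetical.

```python
def replace_ref_at_paths(record, old_ref, new_ref, paths):
    """Replace ``old_ref`` by ``new_ref`` at the given dotted paths.

    Modifies ``record`` in place and returns the number of replacements.
    """
    def walk(node, parts):
        if isinstance(node, list):
            # Lists are transparent: apply the same path to every item.
            return sum(walk(item, parts) for item in node)
        if not isinstance(node, dict):
            return 0
        if not parts:
            # End of the path: this object is expected to hold the '$ref'.
            if node.get('$ref') == old_ref:
                node['$ref'] = new_ref
                return 1
            return 0
        return walk(node.get(parts[0]), parts[1:])

    return sum(walk(record, path.split('.')) for path in paths)
```

A reference to old_ref that sits at a path outside the whitelist is deliberately left alone.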

inspirehep.modules.records.utils module¶

Record related utils.

inspirehep.modules.records.utils.get_author_display_name(name)[source]¶: Returns the display name in format Firstnames Lastnames
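
As an illustration of the "Firstnames Lastnames" format, the sketch below assumes the stored name has the form 'Lastnames, Firstnames'. The function name is hypothetical, and the module's own function may handle suffixes and other edge cases that this simple split does not.

```python
def display_name(name):
    """Turn 'Lastnames, Firstnames' into 'Firstnames Lastnames' (sketch)."""
    last, separator, first = name.partition(',')
    if not separator:
        # No comma: nothing to reorder.
        return name.strip()
    return '{} {}'.format(first.strip(), last.strip()).strip()
```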

inspirehep.modules.records.utils.get_author_with_record_facet_author_name(author)[source]¶

inspirehep.modules.records.utils.get_endpoint_from_record(record)[source]¶: Return the endpoint corresponding to a record.

inspirehep.modules.records.utils.get_linked_records_in_field(record, field_path)[source]¶

Get all linked records in a given field.

Parameters:

record (dict) – the record containing the links
field_path (string) – a dotted field path specification understandable by `get_value`, containing a json reference to another record.
Returns:	an iterator on the linked records.
Return type:	Iterator[dict]

Warning

Currently, the order in which the linked records are yielded is different from the order in which they appear in the record.

Example

>>> record = {'references': [
...     {'record': {'$ref': 'https://labs.inspirehep.net/api/literature/1234'}},
...     {'record': {'$ref': 'https://labs.inspirehep.net/api/data/421'}},
... ]}
>>> get_linked_records_in_field(record, 'references.record')
[...]

inspirehep.modules.records.utils.get_pid_from_record_uri(record_uri)[source]¶: Transform a URI to a record into a (pid_type, pid_value) pair.
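
Splitting a record URI into its last two path segments gives the general shape of such a pair. The sketch below is a simplified illustration with a hypothetical name: it returns the endpoint segment as found in the URI and the value as a string, which is not necessarily the pid_type and pid_value that the module's function returns.

```python
from urllib.parse import urlsplit


def split_record_uri(record_uri):
    """Return the last two path segments of ``record_uri`` as a pair."""
    parts = [part for part in urlsplit(record_uri).path.split('/') if part]
    if len(parts) < 2:
        # Not enough path segments to form a pair.
        return None
    return parts[-2], parts[-1]
```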

inspirehep.modules.records.utils.is_author(record)[source]¶

inspirehep.modules.records.utils.is_book(record)[source]¶

inspirehep.modules.records.utils.is_data(record)[source]¶

inspirehep.modules.records.utils.is_experiment(record)[source]¶

inspirehep.modules.records.utils.is_hep(record)[source]¶

inspirehep.modules.records.utils.is_institution(record)[source]¶

inspirehep.modules.records.utils.is_journal(record)[source]¶

inspirehep.modules.records.utils.populate_abstract_source_suggest(record)[source]¶: Populate the abstract_source_suggest field in Literature records.

inspirehep.modules.records.utils.populate_affiliation_suggest(record)[source]¶: Populate the affiliation_suggest field of Institution records.

inspirehep.modules.records.utils.populate_author_count(record)[source]¶: Populate the author_count field of Literature records.

inspirehep.modules.records.utils.populate_author_suggest(record, *args, **kwargs)[source]¶: Populate the author_suggest field of Authors records.

inspirehep.modules.records.utils.populate_authors_full_name_unicode_normalized(record)[source]¶: Populate the authors.full_name_normalized field of Literature records.

inspirehep.modules.records.utils.populate_authors_name_variations(record)[source]¶: Generate name variations for an Author record.

inspirehep.modules.records.utils.populate_bookautocomplete(record)[source]¶: Populate the bookautocomplete field of Literature records.

inspirehep.modules.records.utils.populate_citations_count(record)[source]¶: Populate citations_count in ES from

inspirehep.modules.records.utils.populate_earliest_date(record)[source]¶: Populate the earliest_date field of Literature records.

inspirehep.modules.records.utils.populate_experiment_suggest(record)[source]¶: Populates experiment_suggest field of experiment records.

inspirehep.modules.records.utils.populate_facet_author_name(record)[source]¶: Populate the facet_author_name field of Literature records.

inspirehep.modules.records.utils.populate_inspire_document_type(record)[source]¶: Populate the facet_inspire_doc_type field of Literature records.

inspirehep.modules.records.utils.populate_name_variations(record)[source]¶: Generate name variations for each signature of a Literature record.

inspirehep.modules.records.utils.populate_number_of_references(record)[source]¶: Populate the number_of_references field of Literature records.

inspirehep.modules.records.utils.populate_recid_from_ref(record)[source]¶

Extract recids from all JSON reference fields and add them to ES.

For every field that has as a value a JSON reference, adds a sibling after extracting the record identifier. Siblings are named by removing record occurrences and appending _recid without doubling or prepending underscores to the original name.

Example:

{'record': {'$ref': 'http://x/y/2'}}

is transformed to:

{
    'recid': 2,
    'record': {'$ref': 'http://x/y/2'},
}

For every list of object references adds a new list with the corresponding recids, whose name is similarly computed.

Example:

{
    'records': [
        {'$ref': 'http://x/y/1'},
        {'$ref': 'http://x/y/2'},
    ],
}

is transformed to:

{
    'recids': [1, 2],
    'records': [
        {'$ref': 'http://x/y/1'},
        {'$ref': 'http://x/y/2'},
    ],
}
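
Both transformations shown above can be reproduced with a short recursive walk. The sketch below is an illustration under stated assumptions, not the module's implementation: the function names are hypothetical, the recid is taken to be the last path segment of the URI, and list fields are named by dropping a trailing 's', applying the single-field rule and adding the 's' back.

```python
def recid_from_ref(ref_obj):
    """Extract the trailing integer of a '$ref' URI."""
    return int(ref_obj['$ref'].rstrip('/').rsplit('/', 1)[-1])


def sibling_name(key):
    """Remove 'record' and append '_recid' without stray underscores."""
    base = key.replace('record', '').rstrip('_')
    return '{}_recid'.format(base).lstrip('_')


def add_recids(node):
    """Add recid siblings next to every JSON reference, in place."""
    if isinstance(node, list):
        for item in node:
            add_recids(item)
    elif isinstance(node, dict):
        # Copy the items first, since new keys are added while iterating.
        for key, value in list(node.items()):
            if isinstance(value, dict) and '$ref' in value:
                node[sibling_name(key)] = recid_from_ref(value)
            elif (isinstance(value, list) and value and
                  all(isinstance(v, dict) and '$ref' in v for v in value)):
                singular = key[:-1] if key.endswith('s') else key
                node[sibling_name(singular) + 's'] = [
                    recid_from_ref(v) for v in value]
            else:
                add_recids(value)
    return node
```

Under this naming rule a field such as conference_record gets the sibling conference_recid, while a bare record becomes recid.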

inspirehep.modules.records.utils.populate_title_suggest(record)[source]¶: Populate the title_suggest field of Journals records.

inspirehep.modules.records.views module¶

Data model package.

class inspirehep.modules.records.views.Facets(**kwargs)[source]¶

Bases: invenio_rest.views.ContentNegotiatedMethodView

get(*args, **kwargs)[source]¶

methods = ['GET']¶

view_name = '{0}_facets'¶

class inspirehep.modules.records.views.LiteratureCitationsResource(**kwargs)[source]¶

Bases: invenio_rest.views.ContentNegotiatedMethodView

get(pid_value, *args, **kwargs)[source]¶

methods = ['GET']¶

view_name = 'literature_citations'¶

inspirehep.modules.records.views.facets_view(*args, **kwargs)¶

inspirehep.modules.records.views.literature_citations_view(*args, **kwargs)¶

inspirehep.modules.records.wrappers module¶

class inspirehep.modules.records.wrappers.AdminToolsMixin[source]¶

Bases: object

admin_tools¶

class inspirehep.modules.records.wrappers.AuthorsRecord(data, model=None)[source]¶

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for author records.

title¶: Get preferred title.

class inspirehep.modules.records.wrappers.ConferencesRecord(data, model=None)[source]¶

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for conference records.

title¶: Get preferred title.

class inspirehep.modules.records.wrappers.ExperimentsRecord(data, model=None)[source]¶

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for experiment records.

title¶: Get preferred title.

class inspirehep.modules.records.wrappers.InstitutionsRecord(data, model=None)[source]¶

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for institution records.

title¶: Get preferred title.

class inspirehep.modules.records.wrappers.JobsRecord(data, model=None)[source]¶

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for job records.

similar¶

title¶: Get preferred title.

class inspirehep.modules.records.wrappers.JournalsRecord(data, model=None)[source]¶

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for journal records.

name_variants¶: Get name variations.

publisher¶: Get publisher.

title¶: Get preferred title.

urls¶: Get urls.

class inspirehep.modules.records.wrappers.LiteratureRecord(data, model=None)[source]¶

Bases: inspirehep.modules.records.api.ESRecord, inspirehep.modules.records.wrappers.AdminToolsMixin

Record class specialized for literature records.

conference_information¶

Conference information.

Returns a list with information about conferences related to the record.

external_system_identifiers¶

External system identification information.

Returns a list that contains information on the first of each kind of external_system_identifiers.

get_link_info_for_external_sys_identifiers(extid, ext_sys_id_info)[source]¶

Urls and names for external system identifiers

Returns a dictionary with 2 key value pairs, the first of which is the name of the external_system_identifier and the second is a link to the record in that external_system_identifier.

publication_information¶

Publication information.

Returns a list with information about each publication note in the record.

title¶: Get preferred title.

Module contents¶

Data model package.
