inspirehep.modules.records package¶
Subpackages¶
- inspirehep.modules.records.mappings package
- inspirehep.modules.records.serializers package
- Subpackages
- inspirehep.modules.records.serializers.fields package
- inspirehep.modules.records.serializers.schemas package
- Subpackages
- inspirehep.modules.records.serializers.schemas.json package
- Subpackages
- inspirehep.modules.records.serializers.schemas.json.authors package
- inspirehep.modules.records.serializers.schemas.json.literature package
- Subpackages
- inspirehep.modules.records.serializers.schemas.json.literature.common package
- Submodules
- inspirehep.modules.records.serializers.schemas.json.literature.common.accelerator_experiment module
- inspirehep.modules.records.serializers.schemas.json.literature.common.author module
- inspirehep.modules.records.serializers.schemas.json.literature.common.citation_item module
- inspirehep.modules.records.serializers.schemas.json.literature.common.collaboration module
- inspirehep.modules.records.serializers.schemas.json.literature.common.collaboration_with_suffix module
- inspirehep.modules.records.serializers.schemas.json.literature.common.conference_info_item module
- inspirehep.modules.records.serializers.schemas.json.literature.common.doi module
- inspirehep.modules.records.serializers.schemas.json.literature.common.external_system_identifier module
- inspirehep.modules.records.serializers.schemas.json.literature.common.isbn module
- inspirehep.modules.records.serializers.schemas.json.literature.common.publication_info_item module
- inspirehep.modules.records.serializers.schemas.json.literature.common.reference_item module
- inspirehep.modules.records.serializers.schemas.json.literature.common.supervisor module
- inspirehep.modules.records.serializers.schemas.json.literature.common.thesis_info module
- Module contents
- inspirehep.modules.records.serializers.schemas.json.literature.common package
- Module contents
- Subpackages
- Module contents
- Subpackages
- inspirehep.modules.records.serializers.schemas.latex package
- inspirehep.modules.records.serializers.schemas.json package
- Submodules
- inspirehep.modules.records.serializers.schemas.base module
- Module contents
- Subpackages
- inspirehep.modules.records.serializers.writers package
- Submodules
- inspirehep.modules.records.serializers.config module
- inspirehep.modules.records.serializers.fields_export module
- inspirehep.modules.records.serializers.json_literature module
- inspirehep.modules.records.serializers.latex module
- inspirehep.modules.records.serializers.marcxml module
- inspirehep.modules.records.serializers.pybtex_serializer_base module
- inspirehep.modules.records.serializers.response module
- Module contents
- Subpackages
Submodules¶
inspirehep.modules.records.api module¶
Inspire Records
-
class
inspirehep.modules.records.api.
ESRecord
(data, model=None)[source]¶ Bases:
inspirehep.modules.records.api.InspireRecord
Record class that fetches records from ElasticSearch.
-
classmethod
get_record
(object_uuid, with_deleted=False)[source]¶ Get record instance from ElasticSearch.
-
updated
¶ Get last updated timestamp.
-
class
inspirehep.modules.records.api.
InspireRecord
(data, model=None)[source]¶ Bases:
invenio_records_files.api.Record
Record class that fetches records from DataBase.
-
add_document_or_figure
(metadata, stream=None, is_document=True, file_name=None, key=None)[source]¶ Add a document or figure to the record.
Parameters:
- metadata (dict) – metadata of the document or figure; see the schemas for more details. It will be validated.
- stream (file like object) – if passed, the file contents will be extracted from it.
- is_document (bool) – whether the given information is for a document; set to False for a figure.
- file_name (str) – name of the file, used as a basis of the key for the files store.
- key (str) – if passed, it will be used as the key for the files store and file_name will be ignored; use it to overwrite existing keys.
Returns: metadata of the added document or figure.
Raises: TypeError – if neither file_name nor key is passed (one of them is required).
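The rule for file_name and key can be illustrated with a small standalone sketch. The helper name resolve_files_key is hypothetical and not part of inspirehep, and the sketch assumes that file_name is used unchanged as the key, whereas the real method only uses it as a basis for the key.

```python
def resolve_files_key(file_name=None, key=None):
    """Pick the files-store key following the rule described above.

    Hypothetical helper for illustration only.
    """
    if file_name is None and key is None:
        # One of the two arguments is required.
        raise TypeError("either file_name or key is required")
    # An explicit key always wins over file_name.
    # Assumption: file_name is used unchanged when no key is given.
    return key if key is not None else file_name
```

Passing both arguments returns the key, which matches the documented way to overwrite an existing entry.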
-
classmethod
create
(data, id_=None, **kwargs)[source]¶ Override the default
create
. It also handles the retrieval of documents and figures.
Note
Might create an extra revision in the record if it had to download any documents or figures.
Keyword Arguments:
- id (uuid) – an optional uuid to assign to the created record object.
- files_src_records (List[InspireRecord]) – if passed, it will try to get the files for the documents and figures from the first record in the list that has them in its files iterator before downloading them, for example to merge existing records.
- skip_files (bool) – if True, it will skip the files retrieval described above. If not passed, it falls back to the value of the RECORDS_SKIP_FILES configuration variable.
Examples
>>> record = {
...     '$schema': 'hep.json',
... }
>>> record = InspireRecord.create(record)
>>> record.commit()
-
classmethod
create_or_update
(data, **kwargs)[source]¶ Create or update a record.
It will check whether a record is already registered with the same control_number and pid_type. If there is one, it will update that record; otherwise it will create a new one.
Keyword Arguments:
- files_src_records (List[InspireRecord]) – if passed, it will try to get the files for the documents and figures from the first record in the list that has them in its files iterator before downloading them, for example to merge existing records.
- skip_files (bool) – if True, it will skip the files retrieval described above. If not passed, it falls back to the value of the RECORDS_SKIP_FILES configuration variable.
Examples
>>> record = {
...     '$schema': 'hep.json',
... }
>>> record = InspireRecord.create_or_update(record)
>>> record.commit()
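The lookup-then-branch behaviour of create_or_update can be sketched without a database. Everything below is an illustrative assumption: the registry is a plain dict, the function is a free function rather than a classmethod, the 'lit' pid type is a placeholder, and the data is assumed to already carry a control_number.

```python
def create_or_update(registry, pid_type, data):
    """Update the record registered under (pid_type, control_number), or create it.

    `registry` is a plain dict standing in for the persistent identifier store.
    """
    pid = (pid_type, data["control_number"])
    if pid in registry:
        # A record with the same identifier exists: replace its content.
        registry[pid].clear()
        registry[pid].update(data)
        return registry[pid], "updated"
    # No such record yet: register a new one.
    registry[pid] = dict(data)
    return registry[pid], "created"


registry = {}
_, first = create_or_update(registry, "lit", {"control_number": 1, "core": False})
_, second = create_or_update(registry, "lit", {"control_number": 1, "core": True})
```

In this sketch the second call finds the identifier already registered and updates the existing entry instead of adding a second one.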
-
download_documents_and_figures
(only_new=False, src_records=())[source]¶ Gets all the documents and figures of the record, and downloads them to the files property.
If the record does not have a control number yet, this function does nothing; the caller is responsible for calling it again once the control number is set.
When iterating through the documents and figures, the following happens:
- if url field points to the files api:
  - and there’s no src_records:
    - and only_new is False: it will throw an error, as that would be the case that the record was created from scratch with a document that was already downloaded from another record, but that record was not passed, so we can’t get the file.
    - and only_new is True:
      - if key exists in the current record files: it will do nothing, as the file is already there.
      - if key does not exist in the current record files: An exception will be thrown, as the file can’t be retrieved.
  - and there’s a src_records:
    - and only_new is False:
      - if key exists in the src_records files: it will download the file from the local path derived from the src_records files.
      - if key does not exist in the src_records files: An exception will be thrown, as the file can’t be retrieved.
    - and only_new is True:
      - if key exists in the current record files: it will do nothing, as the file is already there.
      - if key does not exist in the current record files:
        - if key exists in the src_records files: it will download the file from the local path derived from the src_records files.
        - if key does not exist in the src_records files: An exception will be thrown, as the file can’t be retrieved.
- if url field does not point to the files api: it will try to download the new file.
Parameters:
- only_new (bool) – If True, will not re-download any files if the document[‘key’] matches an existing downloaded file.
- src_records (List[InspireRecord]) – if passed, it will try to get the files from the files iterators of these records before downloading them, for example to merge existing records.
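The decision tree above collapses to a few checks, which the following sketch expresses as a pure function. The function name, its boolean arguments and the returned labels are hypothetical and only mirror the cases listed in the docstring; the real method performs the downloads itself.

```python
def plan_file_action(points_to_files_api, only_new, in_current_files, in_src_files):
    """Return the action the documented rules imply for one document or figure.

    Possible results: 'download', 'skip', 'copy_from_src' or 'error'.
    `in_src_files` must be False when no src_records were passed.
    """
    if not points_to_files_api:
        # External url: always try to download the file.
        return "download"
    if only_new and in_current_files:
        # The file is already attached to the current record.
        return "skip"
    if in_src_files:
        # Take the file from the local path of a source record.
        return "copy_from_src"
    # The url points to the files api but the file cannot be retrieved.
    return "error"
```

Written this way, the only case that ever looks at the current record files is only_new=True, and every remaining files-api case depends solely on whether a source record holds the key.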
-
dumps
()[source]¶ Returns a dict ‘representation’ of the record.
Note: this is not suitable to create a new record from it, as the representation will include some extra fields that should not be present in the record’s json, see the ‘to_dict’ method instead.
-
get_citing_records_query
¶
-
get_modified_references
()[source]¶ Return the ids of the references that differ between the latest and the previous version.
The diff includes references added or deleted. Changes in a reference’s content won’t be detected.
Also, it detects if record was deleted/un-deleted compared to the previous version and, in such cases, returns the full list of references.
References not linked to any record will be ignored.
Note: record should be committed to DB in order to correctly get the previous version.
Returns: pids of references changed from the previous version.
Return type: Set[Tuple[str, int]]
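The diff described above can be sketched on plain dicts. This is a minimal illustration under several assumptions: both versions are passed in explicitly, the pid type is taken from the URL path segment (the real pid types may differ), the deleted flag lives in a top-level 'deleted' key, and "the full list" is read as the references of the latest version.

```python
def linked_pids(record):
    """Collect (pid_type, pid_value) pairs for references linked to a record."""
    pids = set()
    for reference in record.get("references", []):
        ref = reference.get("record", {}).get("$ref")
        if ref is None:
            continue  # references not linked to any record are ignored
        endpoint, value = ref.rstrip("/").split("/")[-2:]
        pids.add((endpoint, int(value)))
    return pids


def modified_references(latest, previous):
    """Return the pids added or removed between two versions of a record."""
    latest_pids = linked_pids(latest)
    if latest.get("deleted", False) != previous.get("deleted", False):
        # Deleted or un-deleted: every reference counts as modified.
        return latest_pids
    # Symmetric difference: present in exactly one of the two versions.
    return latest_pids ^ linked_pids(previous)
```

Because only the identifiers are compared, a change inside a reference that keeps the same link is invisible to this diff, as the docstring notes.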
-
merge
(other)[source]¶ Redirect pidstore of current record to the other InspireRecord.
Parameters: other (InspireRecord) – The record that self is going to be redirected to.
-
update
(data, **kwargs)[source]¶ Override the default
update
. It also handles the retrieval of documents and figures.
Keyword Arguments:
- files_src_records (InspireRecord) – if passed, it will try to get the files for the documents and figures from this record’s files iterator before downloading them, for example to merge existing records.
- skip_files (bool) – if True, it will skip the files retrieval described above. If not passed, it falls back to the value of the RECORDS_SKIP_FILES configuration variable.
-
inspirehep.modules.records.checkers module¶
Records checkers.
-
inspirehep.modules.records.checkers.
add_linked_ids
(dois, arxiv_ids, linked_ids)[source]¶ Increase the number of times a paper with a specific DOI has been cited by using its corresponding arXiv eprint, and vice versa.
double_count
is used to count the times that a DOI and an arXiv eprint appear in the same paper, so that they are not counted twice in the final result.
-
inspirehep.modules.records.checkers.
calculate_score_of_reference
(counted_reference)[source]¶ Given a tuple of the number of times cited by a core record and a non core record, calculate a score associated with a reference.
The score is calculated giving five times more importance to citations from core records.
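One plausible reading of this weighting is a linear score, sketched below. The formula, the constant and the function name are assumptions made for illustration; the docstring only states that core records count five times more.

```python
CORE_WEIGHT = 5  # assumed weight of a citation from a core record


def score_reference(counted_reference):
    """Score a reference from its (core_count, non_core_count) tuple."""
    core_count, non_core_count = counted_reference
    # Each core citation is worth CORE_WEIGHT non-core citations.
    return CORE_WEIGHT * core_count + non_core_count
```

Under this reading, a reference cited by two core records and three non-core records scores 13.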
-
inspirehep.modules.records.checkers.
check_unlinked_references
()[source]¶ Return two lists with the unlinked references that have a doi or an arxiv id.
If the reference read has a doi or an arxiv id, it is stored in the data structure. Once all the data is read, it is ordered from most relevant to least relevant.
-
inspirehep.modules.records.checkers.
get_all_unlinked_references
()[source]¶ Return a list of dicts, each of which corresponds to one reference object together with its core or non-core status.
inspirehep.modules.records.cli module¶
-
class
inspirehep.modules.records.cli.
MyThreadPool
(processes=None, initializer=None, initargs=())[source]¶ Bases:
multiprocessing.pool.ThreadPool
inspirehep.modules.records.errors module¶
inspirehep.modules.records.ext module¶
Records extension.
inspirehep.modules.records.facets module¶
-
inspirehep.modules.records.facets.
must_match_all_filter
(field)[source]¶ Bool filter containing a list of must matches.
Range filter for returning record only with 1 <= authors <= 10.
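The two filters described above correspond to standard Elasticsearch query clauses. The sketch below builds them as plain dictionaries rather than with the query objects the module presumably uses; the function names, the use of match clauses and the author_count field name are assumptions.

```python
def must_match_all(field, values):
    """Bool query in which every value has to match the given field."""
    return {"bool": {"must": [{"match": {field: value}} for value in values]}}


def author_count_range(field="author_count", minimum=1, maximum=10):
    """Range query keeping records with minimum <= field <= maximum."""
    return {"range": {field: {"gte": minimum, "lte": maximum}}}
```

Putting every value under must is what turns a facet selection into an AND filter, so selecting two values narrows the result set instead of widening it.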
inspirehep.modules.records.json_ref_loader module¶
Resource-aware json reference loaders to be used with jsonref.
-
class
inspirehep.modules.records.json_ref_loader.
AbstractRecordLoader
(store=(), cache_results=True)[source]¶ Bases:
jsonref.JsonLoader
Base for resource-aware record loaders.
Resolves the referred resource by the given uri by first checking against local resources.
-
class
inspirehep.modules.records.json_ref_loader.
DatabaseJsonLoader
(store=(), cache_results=True)[source]¶ Bases:
inspirehep.modules.records.json_ref_loader.AbstractRecordLoader
-
class
inspirehep.modules.records.json_ref_loader.
ESJsonLoader
(store=(), cache_results=True)[source]¶ Bases:
inspirehep.modules.records.json_ref_loader.AbstractRecordLoader
Resolve resources by retrieving them from Elasticsearch.
-
inspirehep.modules.records.json_ref_loader.
SCHEMA_LOADER_CLS
¶ Used in invenio-jsonschemas to resolve relative $ref.
alias of
JsonLoader
-
inspirehep.modules.records.json_ref_loader.
load_resolved_schema
(name)[source]¶ Load a JSON schema with all references resolved.
Parameters: name (str) – name of the schema to load.
Returns: the JSON schema with resolved references.
Return type: dict
Examples
>>> resolved_schema = load_resolved_schema('authors')
-
inspirehep.modules.records.json_ref_loader.
replace_refs
(obj, source='db')[source]¶ Replaces record refs in obj by bypassing HTTP requests.
Any reference URI that comes from the same server and references a resource will be resolved directly either from the database or from Elasticsearch.
Parameters:
- obj – Dict-like object for which ‘$ref’ fields are recursively replaced.
- source – List of sources from which to resolve the references. It can be any of:
  - ‘db’ - resolve from Database
  - ‘es’ - resolve from Elasticsearch
  - ‘http’ - force using HTTP
Returns: The same obj structure with the ‘$ref’ fields replaced with the object available at the given URI.
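The recursive replacement can be sketched with a resolver callback standing in for the database or Elasticsearch lookup. This is an eager, simplified illustration: the function name and the resolve parameter are hypothetical, and the loaders in this module build on jsonref, which may resolve references differently (for example lazily).

```python
def replace_refs_sketch(obj, resolve):
    """Return a copy of obj with every {'$ref': uri} replaced by resolve(uri)."""
    if isinstance(obj, dict):
        if set(obj) == {"$ref"}:
            # A bare reference object: swap it for the referenced resource.
            return resolve(obj["$ref"])
        return {key: replace_refs_sketch(value, resolve) for key, value in obj.items()}
    if isinstance(obj, list):
        return [replace_refs_sketch(item, resolve) for item in obj]
    return obj  # scalars are returned unchanged


# In-memory stand-in for the 'db' source.
store = {"http://x/api/literature/1": {"title": "A"}}
record = {"references": [{"record": {"$ref": "http://x/api/literature/1"}}], "n": 3}
resolved = replace_refs_sketch(record, store.__getitem__)
```

Swapping the callback is the sketch's counterpart of choosing between the ‘db’, ‘es’ and ‘http’ sources.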
inspirehep.modules.records.permissions module¶
-
class
inspirehep.modules.records.permissions.
RecordPermission
(record, func, user)[source]¶ Bases:
invenio_access.permissions.Permission
Record permission.
- Read access given if collection not restricted.
- Update access given to admins and cataloguers.
- All other actions are denied for the moment.
-
read_actions
= ['read']¶
-
update_actions
= ['update']¶
-
inspirehep.modules.records.permissions.
get_user_collections
()[source]¶ Get user restricted collections.
-
inspirehep.modules.records.permissions.
has_admin_permission
(user, record)[source]¶ Check if user has admin access to record.
-
inspirehep.modules.records.permissions.
has_read_permission
(user, record)[source]¶ Check if user has read access to the record.
-
inspirehep.modules.records.permissions.
has_update_permission
(user, record)[source]¶ Check if user has update access to the record.
-
inspirehep.modules.records.permissions.
load_user_collections
(app, user)[source]¶ Load user restricted collections upon login.
Receiver for flask_login.user_logged_in
inspirehep.modules.records.receivers module¶
Records receivers.
-
inspirehep.modules.records.receivers.
assign_phonetic_block
(sender, record, *args, **kwargs)[source]¶ Assign a phonetic block to each signature of a Literature record.
Uses the NYSIIS algorithm to compute a phonetic block from each signature’s full name, skipping those that are not recognized as real names, but logging an error when that happens.
-
inspirehep.modules.records.receivers.
assign_uuid
(sender, record, *args, **kwargs)[source]¶ Assign a UUID to each signature of a Literature record.
-
inspirehep.modules.records.receivers.
enhance_before_index
(record)[source]¶ Run all the receivers that enhance the record for ES in the right order.
Note
populate_recid_from_ref
MUST come before populate_bookautocomplete
because the latter puts a JSON reference in a completion _source, which would be expanded to an incorrect _source_recid
by the former.
-
inspirehep.modules.records.receivers.
enhance_record
(sender, record, *args, **kwargs)[source]¶ Enhance the record for ES
-
inspirehep.modules.records.receivers.
index_after_commit
(sender, changes)[source]¶ Index a record in ES after it was committed to the DB.
This cannot happen in an
after_record_commit
receiver from Invenio-Records because, despite the name, at that point we are not yet sure whether the record has been really committed to the DB.
inspirehep.modules.records.tasks module¶
Records tasks.
-
(task)
inspirehep.modules.records.tasks.
index_modified_citations_from_record
[source]¶ Index records from the record’s citations.
This task retries itself in two scenarios:
- A new record is saved but is not yet visible to this task because the transaction has not finished yet (RecordGetterError).
- A record is updated but the new changes are not yet in the DB, for the same reason as above (StaleDataError).
Parameters:
- pid_type (String) – pid type of the record
- pid_value (String) – pid value of the record
- db_version (Int) – the correct version of the record that we expect to index. This prevents loading stale data from the DB.
Raises:
- MissingCitedRecordError in case cited records are not found
inspirehep.modules.records.utils module¶
Record related utils.
Returns the display name in format Firstnames Lastnames
-
inspirehep.modules.records.utils.
get_endpoint_from_record
(record)[source]¶ Return the endpoint corresponding to a record.
-
inspirehep.modules.records.utils.
get_linked_records_in_field
(record, field_path)[source]¶ Get all linked records in a given field.
Returns: an iterator on the linked records.
Return type: Iterator[dict]
Warning
Currently, the order in which the linked records are yielded is different from the order in which they appear in the record.
Example
>>> record = {'references': [
...     {'record': {'$ref': 'https://labs.inspirehep.net/api/literature/1234'}},
...     {'record': {'$ref': 'https://labs.inspirehep.net/api/data/421'}},
... ]}
>>> get_linked_records_in_field(record, 'references.record')
[...]
-
inspirehep.modules.records.utils.
get_pid_from_record_uri
(record_uri)[source]¶ Transform a URI to a record into a (pid_type, pid_value) pair.
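A minimal sketch of this transformation takes the last two path segments of the URI. The function name is hypothetical, and the sketch returns the endpoint segment as is, whereas the real function may map it to a different pid_type string and may return the value in another type.

```python
from urllib.parse import urlsplit


def pid_from_record_uri(record_uri):
    """Split a record URI into its last two path segments."""
    segments = [part for part in urlsplit(record_uri).path.split("/") if part]
    endpoint, pid_value = segments[-2:]
    # Assumption: the endpoint segment stands in for the pid_type.
    return endpoint, pid_value


pair = pid_from_record_uri("https://labs.inspirehep.net/api/literature/1234")
```

Here pair holds the endpoint and the identifier as strings, which is enough to look the record up in an identifier store.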
-
inspirehep.modules.records.utils.
populate_abstract_source_suggest
(record)[source]¶ Populate the
abstract_source_suggest
field in Literature records.
-
inspirehep.modules.records.utils.
populate_affiliation_suggest
(record)[source]¶ Populate the
affiliation_suggest
field of Institution records.
Populate the
author_count
field of Literature records.
Populate the
author_suggest
field of Authors records.
Populate the
authors.full_name_normalized
field of Literature records.
Generate name variations for an Author record.
-
inspirehep.modules.records.utils.
populate_bookautocomplete
(record)[source]¶ Populate the
bookautocomplete
field of Literature records.
-
inspirehep.modules.records.utils.
populate_citations_count
(record)[source]¶ Populate citations_count in ES from
-
inspirehep.modules.records.utils.
populate_earliest_date
(record)[source]¶ Populate the
earliest_date
field of Literature records.
-
inspirehep.modules.records.utils.
populate_experiment_suggest
(record)[source]¶ Populates experiment_suggest field of experiment records.
Populate the
facet_author_name
field of Literature records.
-
inspirehep.modules.records.utils.
populate_inspire_document_type
(record)[source]¶ Populate the
facet_inspire_doc_type
field of Literature records.
-
inspirehep.modules.records.utils.
populate_name_variations
(record)[source]¶ Generate name variations for each signature of a Literature record.
-
inspirehep.modules.records.utils.
populate_number_of_references
(record)[source]¶ Populate the number of references of Literature records.
-
inspirehep.modules.records.utils.
populate_recid_from_ref
(record)[source]¶ Extract recids from all JSON reference fields and add them to ES.
For every field whose value is a JSON reference, adds a sibling holding the extracted record identifier. Siblings are named by removing record occurrences and appending _recid without doubling or prepending underscores to the original name.
Example:
{'record': {'$ref': 'http://x/y/2'}}
is transformed to:
{
    'recid': 2,
    'record': {'$ref': 'http://x/y/2'},
}
For every list of object references adds a new list with the corresponding recids, whose name is similarly computed.
Example:
{
    'records': [
        {'$ref': 'http://x/y/1'},
        {'$ref': 'http://x/y/2'},
    ],
}
is transformed to:
{
    'recids': [1, 2],
    'records': [
        {'$ref': 'http://x/y/1'},
        {'$ref': 'http://x/y/2'},
    ],
}
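The naming and extraction rules above can be sketched as a small recursive function. All names below are hypothetical, and the handling of lists (strip a trailing s, compute the sibling name, add the s back) is an assumption chosen to reproduce the documented example; the real implementation may name list siblings differently.

```python
def is_ref(value):
    return isinstance(value, dict) and "$ref" in value


def recid_from_ref(ref):
    """Return the trailing integer of a '$ref' URI."""
    return int(ref["$ref"].rstrip("/").rsplit("/", 1)[-1])


def sibling_name(key):
    """Drop 'record' from key and append '_recid' without stray underscores."""
    base = key.replace("record", "").rstrip("_")
    return base + "_recid" if base else "recid"


def add_recids(node):
    """Add recid siblings in place, next to every JSON reference found in node."""
    if isinstance(node, list):
        for item in node:
            add_recids(item)
    elif isinstance(node, dict):
        # Snapshot the items, because the dict is modified while iterating.
        for key, value in list(node.items()):
            if is_ref(value):
                node[sibling_name(key)] = recid_from_ref(value)
            elif isinstance(value, list) and value and all(is_ref(v) for v in value):
                # Assumption: list keys are plural forms ending in 's'.
                singular = key[:-1] if key.endswith("s") else key
                node[sibling_name(singular) + "s"] = [recid_from_ref(v) for v in value]
            else:
                add_recids(value)
    return node
```

The function mutates its argument and also returns it, so it can be applied to a record just before indexing or used in an expression.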
inspirehep.modules.records.views module¶
Data model package.
-
class
inspirehep.modules.records.views.
Facets
(**kwargs)[source]¶ Bases:
invenio_rest.views.ContentNegotiatedMethodView
-
methods
= ['GET']¶
-
view_name
= '{0}_facets'¶
-
-
class
inspirehep.modules.records.views.
LiteratureCitationsResource
(**kwargs)[source]¶ Bases:
invenio_rest.views.ContentNegotiatedMethodView
-
methods
= ['GET']¶
-
view_name
= 'literature_citations'¶
-
-
inspirehep.modules.records.views.
facets_view
(*args, **kwargs)¶
-
inspirehep.modules.records.views.
literature_citations_view
(*args, **kwargs)¶
inspirehep.modules.records.wrappers module¶
-
class
inspirehep.modules.records.wrappers.
AuthorsRecord
(data, model=None)[source]¶ Bases:
inspirehep.modules.records.api.ESRecord
,inspirehep.modules.records.wrappers.AdminToolsMixin
Record class specialized for author records.
-
title
¶ Get preferred title.
-
-
class
inspirehep.modules.records.wrappers.
ConferencesRecord
(data, model=None)[source]¶ Bases:
inspirehep.modules.records.api.ESRecord
,inspirehep.modules.records.wrappers.AdminToolsMixin
Record class specialized for conference records.
-
title
¶ Get preferred title.
-
-
class
inspirehep.modules.records.wrappers.
ExperimentsRecord
(data, model=None)[source]¶ Bases:
inspirehep.modules.records.api.ESRecord
,inspirehep.modules.records.wrappers.AdminToolsMixin
Record class specialized for experiment records.
-
title
¶ Get preferred title.
-
-
class
inspirehep.modules.records.wrappers.
InstitutionsRecord
(data, model=None)[source]¶ Bases:
inspirehep.modules.records.api.ESRecord
,inspirehep.modules.records.wrappers.AdminToolsMixin
Record class specialized for institution records.
-
title
¶ Get preferred title.
-
-
class
inspirehep.modules.records.wrappers.
JobsRecord
(data, model=None)[source]¶ Bases:
inspirehep.modules.records.api.ESRecord
,inspirehep.modules.records.wrappers.AdminToolsMixin
Record class specialized for job records.
-
similar
¶
-
title
¶ Get preferred title.
-
-
class
inspirehep.modules.records.wrappers.
JournalsRecord
(data, model=None)[source]¶ Bases:
inspirehep.modules.records.api.ESRecord
,inspirehep.modules.records.wrappers.AdminToolsMixin
Record class specialized for journal records.
-
name_variants
¶ Get name variations.
-
publisher
¶ Get publisher.
-
title
¶ Get preferred title.
-
urls
¶ Get urls.
-
-
class
inspirehep.modules.records.wrappers.
LiteratureRecord
(data, model=None)[source]¶ Bases:
inspirehep.modules.records.api.ESRecord
,inspirehep.modules.records.wrappers.AdminToolsMixin
Record class specialized for literature records.
-
conference_information
¶ Conference information.
Returns a list with information about conferences related to the record.
-
external_system_identifiers
¶ External system identification information.
Returns a list that contains information on the first of each kind of external_system_identifiers.
-
get_link_info_for_external_sys_identifiers
(extid, ext_sys_id_info)[source]¶ Urls and names for external system identifiers.
Returns a dictionary with two key-value pairs: the first is the name of the external_system_identifier and the second is a link to the record in that external_system_identifier.
-
publication_information
¶ Publication information.
Returns a list with information about each publication note in the record.
-
title
¶ Get preferred title.
-
Module contents¶
Data model package.