inspirehep.modules.refextract package
Submodules
inspirehep.modules.refextract.config module
Refextract config.
inspirehep.modules.refextract.config.REFERENCE_MATCHER_DATA_CONFIG = {'doc_type': 'data', 'source': ['control_number'], 'algorithm': [{'queries': [{'path': 'reference.dois', 'type': 'exact', 'search_path': 'dois.value.raw'}]}], 'index': 'records-data'}
Configuration for matching data records. Note that the index and doc_type differ from those used for HEP records.
inspirehep.modules.refextract.config.REFERENCE_MATCHER_DEFAULT_PUBLICATION_INFO_CONFIG = {'doc_type': 'hep', 'collections': ['Literature'], 'source': ['control_number'], 'algorithm': [{'queries': [{'paths': ['reference.publication_info.journal_issue', 'reference.publication_info.journal_title', 'reference.publication_info.journal_volume', 'reference.publication_info.artid'], 'type': 'nested', 'search_paths': ['publication_info.journal_issue', 'publication_info.journal_title.raw', 'publication_info.journal_volume', 'publication_info.page_artid']}, {'paths': ['reference.publication_info.journal_issue', 'reference.publication_info.journal_title', 'reference.publication_info.journal_volume', 'reference.publication_info.page_start'], 'type': 'nested', 'search_paths': ['publication_info.journal_issue', 'publication_info.journal_title.raw', 'publication_info.journal_volume', 'publication_info.page_artid']}, {'paths': ['reference.publication_info.journal_title', 'reference.publication_info.journal_volume', 'reference.publication_info.artid'], 'type': 'nested', 'search_paths': ['publication_info.journal_title.raw', 'publication_info.journal_volume', 'publication_info.page_artid']}, {'paths': ['reference.publication_info.journal_title', 'reference.publication_info.journal_volume', 'reference.publication_info.page_start'], 'type': 'nested', 'search_paths': ['publication_info.journal_title.raw', 'publication_info.journal_volume', 'publication_info.page_artid']}]}], 'index': 'records-hep'}
Configuration for matching all HEP records using publication_info. These queries are kept separate from the unique-identifier queries because they can return multiple matches (particularly in the case of errata).
inspirehep.modules.refextract.config.REFERENCE_MATCHER_JHEP_AND_JCAP_PUBLICATION_INFO_CONFIG = {'doc_type': 'hep', 'collections': ['Literature'], 'source': ['control_number'], 'algorithm': [{'queries': [{'paths': ['reference.publication_info.journal_title', 'reference.publication_info.journal_volume', 'reference.publication_info.year', 'reference.publication_info.artid'], 'type': 'nested', 'search_paths': ['publication_info.journal_title.raw', 'publication_info.journal_volume', 'publication_info.year', 'publication_info.page_artid']}, {'paths': ['reference.publication_info.journal_title', 'reference.publication_info.journal_volume', 'reference.publication_info.year', 'reference.publication_info.page_start'], 'type': 'nested', 'search_paths': ['publication_info.journal_title.raw', 'publication_info.journal_volume', 'publication_info.year', 'publication_info.page_artid']}]}], 'index': 'records-hep'}
Configuration for matching JCAP and JHEP records using publication_info. For these journals the year must also be compared for accurate matching. These queries are kept separate from the unique-identifier queries because they can return multiple matches (particularly in the case of errata).
inspirehep.modules.refextract.config.REFERENCE_MATCHER_UNIQUE_IDENTIFIERS_CONFIG = {'doc_type': 'hep', 'collections': ['Literature'], 'source': ['control_number'], 'algorithm': [{'queries': [{'path': 'reference.arxiv_eprint', 'type': 'exact', 'search_path': 'arxiv_eprints.value.raw'}, {'path': 'reference.dois', 'type': 'exact', 'search_path': 'dois.value.raw'}, {'path': 'reference.isbn', 'type': 'exact', 'search_path': 'isbns.value.raw'}, {'path': 'reference.texkey', 'type': 'exact', 'search_path': 'texkeys.raw'}, {'path': 'reference.report_numbers', 'type': 'exact', 'search_path': 'report_numbers.value.fuzzy'}]}], 'index': 'records-hep'}
Configuration for matching all HEP records (including JHEP and JCAP records) using unique identifiers.
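The configuration values above share a common shape: an 'algorithm' list of steps, each holding 'queries' of type 'exact' (a single 'path' and 'search_path') or 'nested' (parallel 'paths' and 'search_paths' lists). The following is a minimal sketch for inspecting that shape. The helper name iter_query_paths is hypothetical and not part of inspirehep or inspire-matcher, and the positional pairing of 'paths' with 'search_paths' is an assumption suggested by the documented values.

```python
# Copied from the documented value of REFERENCE_MATCHER_DATA_CONFIG.
DATA_CONFIG = {
    'doc_type': 'data',
    'source': ['control_number'],
    'algorithm': [
        {'queries': [
            {'path': 'reference.dois', 'type': 'exact',
             'search_path': 'dois.value.raw'},
        ]},
    ],
    'index': 'records-data',
}


def iter_query_paths(config):
    """Yield (query_type, [(record_path, search_path), ...]) per query.

    Hypothetical helper for illustration only. For 'nested' queries the
    two lists are paired by position, which is an assumption.
    """
    for step in config['algorithm']:
        for query in step['queries']:
            if query['type'] == 'nested':
                pairs = list(zip(query['paths'], query['search_paths']))
            else:
                pairs = [(query['path'], query['search_path'])]
            yield query['type'], pairs


for query_type, pairs in iter_query_paths(DATA_CONFIG):
    print(query_type, pairs)
```

Listing the pairs this way makes it easier to see which field of the reference is compared against which indexed field, for example that both 'artid' and 'page_start' of a reference are looked up in 'publication_info.page_artid'.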
inspirehep.modules.refextract.matcher module
inspirehep.modules.refextract.matcher.match_reference(reference, previous_matched_recid=None)
Match a reference using inspire-matcher.
Returns: the matched reference.
inspirehep.modules.refextract.tasks module
Refextract tasks.
(task) inspirehep.modules.refextract.tasks.create_journal_kb_file
Populate refextract's journal KB from the database.
Uses two raw DB queries with PostgreSQL-specific syntax to generate a file in the format that refextract expects, that is, a list of lines like:
SOURCE---DESTINATION
which means that SOURCE is translated to DESTINATION when found.
Note that refextract expects SOURCE to be normalized, which means removing all non-alphanumeric characters, collapsing all contiguous whitespace to one space, and uppercasing the resulting string.
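The normalization and line format described above can be sketched in plain Python. This is an illustration only: the task itself is documented as doing its work in PostgreSQL queries, and the function names normalize_source and format_kb_line are hypothetical. The sketch reads "removing" literally (non-alphanumeric characters are deleted, not replaced by a space) and treats "alphanumeric" as ASCII letters and digits; both are assumptions.

```python
import re


def normalize_source(title):
    """Normalize a journal title as described for SOURCE (hypothetical helper)."""
    # Delete every character that is neither ASCII alphanumeric nor whitespace.
    # Whitespace is kept at this stage so that it can be collapsed next.
    cleaned = re.sub(r'[^A-Za-z0-9\s]', '', title)
    # Collapse each run of contiguous whitespace to a single space.
    collapsed = re.sub(r'\s+', ' ', cleaned)
    # Uppercase the resulting string.
    return collapsed.upper()


def format_kb_line(source, destination):
    """Build one KB line in the SOURCE---DESTINATION format."""
    return '{}---{}'.format(normalize_source(source), destination)


print(format_kb_line('Phys. Rev.  D', 'Phys.Rev.D'))
```

With this reading, a title written with varying punctuation or spacing maps to the same SOURCE key, which is what allows a single KB line to cover several spellings of a journal name.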
inspirehep.modules.refextract.utils module
Refextract utils.
Module contents
RefExtract integration.