inspirehep.modules.disambiguation package
Subpackages
Submodules
inspirehep.modules.disambiguation.api module
Disambiguation API.
inspirehep.modules.disambiguation.api.save_curated_signatures_and_input_clusters()
Save curated signatures and input clusters to disk.
Saves two files to disk called (by default) input_clusters.jsonl and curated_signatures.jsonl. The former contains one line per cluster initially present in INSPIRE, while the latter contains one line per curated signature that will be used as ground truth by BEARD.
inspirehep.modules.disambiguation.api.save_publications()
Save publications to disk.
Saves a file to disk called (by default) publications.jsonl, which contains one line per record in INSPIRE with information that will be useful for BEARD during training and prediction.
inspirehep.modules.disambiguation.api.save_sampled_pairs()
Save sampled signature pairs to disk.
Saves a file to disk called (by default) sampled_pairs.jsonl, which contains one line per pair of signatures sampled from INSPIRE that will be used by BEARD during training.
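The files described above all use the .jsonl extension and hold one entry per line, which suggests the JSON Lines convention (one JSON value per line). The following is a minimal sketch of how such a file could be read back, under that assumption. The helper names iter_jsonl and count_jsonl are hypothetical and not part of this package, and the sketch makes no assumption about which fields each line contains.

```python
import json


def iter_jsonl(path):
    """Yield one decoded JSON value per non-empty line of a JSON Lines file.

    Hypothetical helper, not part of inspirehep. Lines are read lazily,
    so the whole file never has to fit in memory at once.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing empty line
                yield json.loads(line)


def count_jsonl(path):
    """Return the number of entries (non-empty lines) in a JSON Lines file."""
    return sum(1 for _ in iter_jsonl(path))
```

Reading lazily is a deliberate choice here: a file such as sampled_pairs.jsonl may hold on the order of a million lines, so iterating is likely cheaper than loading everything into a list.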
inspirehep.modules.disambiguation.config module
Disambiguation configuration.
inspirehep.modules.disambiguation.config.DISAMBIGUATION_SAMPLED_PAIRS_SIZE = 1200000
The number of signature pairs we use during training.
Since INSPIRE has ~3M curated signatures, it would take too much time to train on all possible pairs, so we sample ~1M pairs in such a way that they are representative of the known cluster structure.
Note
It MUST be a multiple of 12 for the reason explained in inspirehep.modules.disambiguation.core.ml.sampling.
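The multiple-of-12 constraint can be checked before an overridden value is used. The following is a minimal sketch of such a check. The function check_sampled_pairs_size is hypothetical and not part of this package; it only verifies divisibility and does not reproduce the sampling logic that motivates the constraint.

```python
# Default value as documented for the configuration variable.
DISAMBIGUATION_SAMPLED_PAIRS_SIZE = 1200000


def check_sampled_pairs_size(size):
    """Return size // 12 if size is a positive multiple of 12.

    Hypothetical helper, not part of inspirehep. Raises ValueError
    for any value that is not a positive multiple of 12.
    """
    if size <= 0 or size % 12 != 0:
        raise ValueError(
            "DISAMBIGUATION_SAMPLED_PAIRS_SIZE must be a positive "
            "multiple of 12, got %r" % (size,)
        )
    return size // 12
```

With the documented default, check_sampled_pairs_size(DISAMBIGUATION_SAMPLED_PAIRS_SIZE) passes, whereas a round value such as 1000000 is rejected because it leaves a remainder of 4 when divided by 12.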
inspirehep.modules.disambiguation.ext module
Disambiguation extension.
inspirehep.modules.disambiguation.utils module
Disambiguation utils.
Module contents
Disambiguation module.