inspirehep.modules.disambiguation package
Subpackages
Submodules
inspirehep.modules.disambiguation.api module
Disambiguation API.
inspirehep.modules.disambiguation.api.save_curated_signatures_and_input_clusters()
Save curated signatures and input clusters to disk.
Saves two files to disk, called (by default) input_clusters.jsonl and curated_signatures.jsonl. The former contains one line per cluster initially present in INSPIRE, while the latter contains one line per curated signature that will be used as ground truth by BEARD.
inspirehep.modules.disambiguation.api.save_publications()
Save publications to disk.
Saves a file to disk called (by default) publications.jsonl, which contains one line per record in INSPIRE with information that will be useful for BEARD during training and prediction.
inspirehep.modules.disambiguation.api.save_sampled_pairs()
Save sampled signature pairs to disk.
Saves a file to disk called (by default) sampled_pairs.jsonl, which contains one line per pair of signatures sampled from INSPIRE that will be used by BEARD during training.
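The files described above are written with one record per line. As a minimal sketch of how such a file could be consumed, assuming each line is a standalone JSON document (as the .jsonl extension suggests), the hypothetical helpers below stream records without loading the whole file into memory. The names iter_jsonl and count_jsonl_records are illustrative only and are not part of inspirehep; the fields inside each record are not documented here, so the sketch does not assume any.

```python
import json
from pathlib import Path


def iter_jsonl(path):
    """Yield one decoded JSON value per non-empty line of a JSON Lines file.

    Hypothetical helper for illustration; not part of inspirehep.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def count_jsonl_records(path):
    """Count the records in a JSON Lines file by streaming it once."""
    return sum(1 for _ in iter_jsonl(path))
```

For example, count_jsonl_records("sampled_pairs.jsonl") would report how many pairs were written, assuming the default file name and that the file is in the current working directory.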
inspirehep.modules.disambiguation.config module
Disambiguation configuration.
inspirehep.modules.disambiguation.config.DISAMBIGUATION_SAMPLED_PAIRS_SIZE = 1200000
The number of signature pairs we use during training.
Since INSPIRE has ~3M curated signatures, it would take too much time to train on all possible pairs, so we sample ~1M pairs in such a way that they are representative of the known cluster structure.
Note
It MUST be a multiple of 12 for the reason explained in inspirehep.modules.disambiguation.core.ml.sampling.
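When overriding this value, the divisibility constraint can be checked up front. The following is a minimal sketch, not code from inspirehep: validate_sampled_pairs_size is a hypothetical helper, and the only facts taken from the documentation are the default value and the multiple-of-12 requirement.

```python
# Default documented above.
DISAMBIGUATION_SAMPLED_PAIRS_SIZE = 1200000


def validate_sampled_pairs_size(size):
    """Return size unchanged if it is a positive multiple of 12.

    Hypothetical helper for illustration; not part of inspirehep.
    """
    if size <= 0 or size % 12 != 0:
        raise ValueError(
            "DISAMBIGUATION_SAMPLED_PAIRS_SIZE must be a positive "
            "multiple of 12, got {}".format(size)
        )
    return size


# The documented default satisfies the constraint (1200000 = 12 * 100000).
validate_sampled_pairs_size(DISAMBIGUATION_SAMPLED_PAIRS_SIZE)
```

A round number such as 1000000 would be rejected by this check, since it leaves a remainder of 4 when divided by 12.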
inspirehep.modules.disambiguation.ext module
Disambiguation extension.
inspirehep.modules.disambiguation.utils module
Disambiguation utils.
Module contents
Disambiguation module.