inspirehep.modules.disambiguation package

Submodules

inspirehep.modules.disambiguation.api module

Disambiguation API.

inspirehep.modules.disambiguation.api.save_curated_signatures_and_input_clusters()

Save curated signatures and input clusters to disk.

Saves two files to disk, called (by default) input_clusters.jsonl and curated_signatures.jsonl. The former contains one line per cluster initially present in INSPIRE, while the latter contains one line per curated signature that BEARD will use as ground truth.

inspirehep.modules.disambiguation.api.save_publications()

Save publications to disk.

Saves a file to disk called (by default) publications.jsonl, which contains one line per record in INSPIRE with information that will be useful for BEARD during training and prediction.

inspirehep.modules.disambiguation.api.save_sampled_pairs()

Save sampled signature pairs to disk.

Saves a file to disk called (by default) sampled_pairs.jsonl, which contains one line per pair of signatures sampled from INSPIRE that BEARD will use during training.
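
The files written by the three functions above carry the .jsonl extension, that is, one JSON document per line. As a minimal sketch of how such a file could be consumed, the helper below reads it line by line. The name read_jsonl and the field example_field are hypothetical and chosen for illustration only; the actual record schema is not described in this reference.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one decoded JSON object per non-empty line of ``path``."""
    with open(path) as fd:
        for line in fd:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo on a throwaway file. The field name is made up for illustration
# and is NOT the real schema of sampled_pairs.jsonl.
demo_path = os.path.join(tempfile.mkdtemp(), 'sampled_pairs.jsonl')
with open(demo_path, 'w') as fd:
    fd.write(json.dumps({'example_field': 1}) + '\n')
    fd.write(json.dumps({'example_field': 2}) + '\n')

records = list(read_jsonl(demo_path))
```

Reading lazily, one line at a time, keeps memory use flat even for files with millions of lines.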

inspirehep.modules.disambiguation.api.train_and_save_distance_model()

Train the distance estimator model and save it to disk.

inspirehep.modules.disambiguation.api.train_and_save_ethnicity_model()

Train the ethnicity estimator model and save it to disk.

inspirehep.modules.disambiguation.config module

Disambiguation configuration.

inspirehep.modules.disambiguation.config.DISAMBIGUATION_SAMPLED_PAIRS_SIZE = 1200000

The number of signature pairs we use during training.

Since INSPIRE has ~3M curated signatures, training on all possible pairs would take too long, so we sample ~1.2M pairs in a way that is representative of the known cluster structure.

Note

It MUST be a multiple of 12 for the reason explained in inspirehep.modules.disambiguation.core.ml.sampling.
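
The constraint in the note can be checked before starting a long sampling run. The following is a minimal sketch, assuming a plain integer setting; the helper name check_sampled_pairs_size is hypothetical and not part of the package.

```python
# Value documented above for DISAMBIGUATION_SAMPLED_PAIRS_SIZE.
SAMPLED_PAIRS_SIZE = 1200000


def check_sampled_pairs_size(size):
    """Return ``size`` unchanged if it is a positive multiple of 12.

    Hypothetical helper: it only enforces the documented constraint.
    """
    if size <= 0 or size % 12 != 0:
        raise ValueError(
            'sampled pairs size must be a positive multiple of 12, '
            'got {}'.format(size)
        )
    return size


check_sampled_pairs_size(SAMPLED_PAIRS_SIZE)
```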

inspirehep.modules.disambiguation.ext module

Disambiguation extension.

class inspirehep.modules.disambiguation.ext.InspireDisambiguation(app=None)

Bases: object

init_app(app)
init_config(app)
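
The two methods are not described here. Their names and the app=None constructor argument suggest the common Flask extension pattern, in which init_app registers the extension on the application and init_config fills in configuration defaults. The sketch below illustrates that pattern under this assumption. FakeApp, DisambiguationExtensionSketch, the DEFAULTS mapping and the 'disambiguation-sketch' key are all hypothetical and do not reproduce the actual implementation.

```python
class FakeApp(object):
    """Stand-in for a Flask application: only the attributes used below."""

    def __init__(self):
        self.config = {}
        self.extensions = {}


class DisambiguationExtensionSketch(object):
    """Hypothetical extension following the usual init_app pattern."""

    # Assumed defaults; the real extension may load different keys.
    DEFAULTS = {'DISAMBIGUATION_SAMPLED_PAIRS_SIZE': 1200000}

    def __init__(self, app=None):
        # Allow both direct and deferred (application factory) setup.
        if app is not None:
            self.init_app(app)

    def init_app(self, app):
        self.init_config(app)
        app.extensions['disambiguation-sketch'] = self

    def init_config(self, app):
        # setdefault keeps any value the application already configured.
        for key, value in self.DEFAULTS.items():
            app.config.setdefault(key, value)


app = FakeApp()
app.config['DISAMBIGUATION_SAMPLED_PAIRS_SIZE'] = 12  # application override
ext = DisambiguationExtensionSketch(app)
```

Using setdefault means an application-level override survives initialization, which is the usual reason for separating init_config from init_app.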

inspirehep.modules.disambiguation.utils module

Disambiguation utils.

inspirehep.modules.disambiguation.utils.open_file_in_folder(*args, **kwds)

Open a file in a folder, creating the folder if it does not exist.
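
The generic (*args, **kwds) signature is typical of a function wrapped with contextlib.contextmanager, so the helper is presumably used in a with statement. The following is a minimal sketch of the described behavior under that assumption; the name open_file_in_folder_sketch and the (filename, mode) parameters are hypothetical, since the real parameters are not documented here.

```python
import errno
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_file_in_folder_sketch(filename, mode):
    """Open ``filename``, first creating its parent folder if needed."""
    folder = os.path.dirname(filename)
    if folder:
        try:
            os.makedirs(folder)
        except OSError as exc:
            # An already existing folder is fine; re-raise anything else.
            if exc.errno != errno.EEXIST:
                raise
    with open(filename, mode) as fd:
        yield fd


# Demo: neither 'new' nor 'nested' exists before the call.
target = os.path.join(tempfile.mkdtemp(), 'new', 'nested', 'out.jsonl')
with open_file_in_folder_sketch(target, 'w') as fd:
    fd.write('{}\n')
```

Catching EEXIST rather than testing with os.path.exists first avoids a race when two processes create the same folder at once.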

Module contents

Disambiguation module.