EMMAA Model (emmaa.model)

class emmaa.model.EmmaaModel(name, config, paper_ids=None)
Bases: object
Represents an EMMAA model.
Parameters:
- name (str) – The name of the model.
- config (dict) – A configuration dict that is typically loaded from a YAML file.
- paper_ids (list(str) or None) – A list of paper IDs used to get statements for the current state of the model. New paper IDs are added as new reading results arrive. If not provided, the initial set is derived from existing statements.

search_terms
A list of SearchTerm objects containing the search terms used in the model.
Type: list[emmaa.priors.SearchTerm]

add_paper_ids(initial_ids, id_type='pmid')
Convert if needed and save paper IDs.
Parameters:

add_statements(stmts)
Add a set of EMMAA Statements to the model.
Parameters: stmts (list[emmaa.EmmaaStatement]) – A list of EMMAA Statements to add to the model.

assemble_pybel(mode='local', bucket='emmaa')
Assemble the model into PyBEL and return the assembled model.

assemble_pysb(mode='local', bucket='emmaa')
Assemble the model into PySB and return the assembled model.

assemble_signed_graph(mode='local', bucket='emmaa')
Assemble the model into a signed graph and return the assembled graph.

assemble_unsigned_graph(**kwargs)
Assemble the model into an unsigned graph and return the assembled graph.

get_indra_stmts()
Return the INDRA Statements contained in the model.
Returns: The list of INDRA Statements that are extracted from the EMMAA Statements.
Return type: list[indra.statements.Statement]

get_paper_ids_from_stmts(stmts)
Get the initial set of paper IDs from a list of statements.
Parameters: stmts (list[emmaa.statements.EmmaaStatement]) – A list of EMMAA statements to create the mappings from.

classmethod load_from_s3(model_name, bucket='emmaa')
Load the latest model state from S3.
Parameters: model_name (str) – Name of the model to load. This function expects the latest model to be found on S3 in the emmaa bucket with key ‘models/{model_name}/model_{date_string}’, and the model config file at ‘models/{model_name}/config.json’.
Returns: Latest instance of EmmaaModel with the given name, loaded from S3.
Return type: emmaa.model.EmmaaModel

static search_biorxiv(collection_id, date_limit)
Search BioRxiv within date_limit.
Parameters:
Returns: terms_to_dois – A dict with the bioRxiv collection ID as key and the DOIs returned by the search as values.
Return type:

static search_elsevier(search_terms, date_limit)
Search Elsevier for given search terms.
Parameters:
- search_terms (list[emmaa.priors.SearchTerm]) – A list of SearchTerm objects to search Elsevier for.
- date_limit (int) – The number of days to search back from today.
Returns: terms_to_piis – A dict with the given search terms as keys and the PIIs returned by the searches as values.
Return type:

search_literature(lit_source, date_limit=None)
Search for the model’s search terms in the literature.
Parameters: date_limit (Optional[int]) – The number of days to search back from today.
Returns: ids_to_terms – A dict with all the literature source IDs (e.g., PMIDs or PIIs) returned by the searches as keys, and the search terms that produced each ID as values.
Return type: dict
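The per-source search methods return search terms as keys and IDs as values, while search_literature returns the opposite direction. A minimal sketch of that inversion follows. It assumes plain strings stand in for SearchTerm objects and that values are lists; the helper name invert_terms_to_ids is hypothetical and not part of emmaa.model, and the terms and IDs are made up.

```python
from collections import defaultdict


def invert_terms_to_ids(terms_to_ids):
    """Turn {term: [id, ...]} into {id: [term, ...]}.

    Hypothetical helper for illustration; not part of emmaa.model.
    """
    ids_to_terms = defaultdict(list)
    for term, ids in terms_to_ids.items():
        for paper_id in ids:
            ids_to_terms[paper_id].append(term)
    return dict(ids_to_terms)


# Made-up term names and IDs, for illustration only.
example = {"BRAF": ["111", "222"], "KRAS": ["222"]}
inverted = invert_terms_to_ids(example)
```

An ID found by more than one search term ends up with every such term in its list, which is what makes the inverted form useful for tracing why a paper was picked up.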

static search_pubmed(search_terms, date_limit)
Search PubMed for given search terms.
Parameters:
- search_terms (list[emmaa.priors.SearchTerm]) – A list of SearchTerm objects to search PubMed for.
- date_limit (int) – The number of days to search back from today.
Returns: terms_to_pmids – A dict with the given search terms as keys and the PMIDs returned by the searches as values.
Return type:
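date_limit is a number of days counted back from today. A minimal sketch of how such a limit translates into a start date, using only the standard library, is given here. The helper name start_date_for_limit is hypothetical, and how EMMAA actually passes the limit on to the search service is not covered by this page.

```python
from datetime import date, timedelta


def start_date_for_limit(date_limit, today=None):
    """Return the earliest date covered by a search going back date_limit days.

    Hypothetical helper for illustration; not part of emmaa.model.
    """
    today = today or date.today()
    return today - timedelta(days=date_limit)


# A fixed "today" keeps the example reproducible.
cutoff = start_date_for_limit(10, today=date(2020, 1, 15))
```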

update_from_disease_map(disease_map_config)
Update the model by processing a MINERVA Disease Map.
The relevant part of the reading config should look similar to:
    {"disease_map": {
        "map_name": "covid19map",
        "filenames": "all",  # or a list of filenames
        "metadata": {"internal": true}
    }}
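The "filenames" entry can be either the string "all" or a list of filenames. A minimal sketch of handling both forms follows, assuming the config has been parsed into a Python dict and that the inner "disease_map" dict is what gets passed around. The helper resolve_filenames and the available list are hypothetical, and the example filenames are made up.

```python
import json

# The fragment above as valid JSON (the inline comment is dropped).
config_text = """
{"disease_map": {
    "map_name": "covid19map",
    "filenames": "all",
    "metadata": {"internal": true}
}}
"""


def resolve_filenames(disease_map_config, available):
    """Return the filenames selected by the config.

    Hypothetical helper: "all" selects every available file, and a
    list selects only the files it names that are actually available.
    """
    filenames = disease_map_config["filenames"]
    if filenames == "all":
        return list(available)
    return [name for name in filenames if name in available]


disease_map_config = json.loads(config_text)["disease_map"]
available = ["a.xml", "b.xml"]  # made-up filenames
selected = resolve_filenames(disease_map_config, available)
```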

update_from_files(files_config)
Add custom statements from files.
The relevant part of the reading config should look similar to:
    {"other_files": [
        {
            "bucket": "indra-covid19",
            "filename": "ctd_stmts.pkl",
            "metadata": {"internal": true, "curated": true}
        }
    ]}
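Each entry under "other_files" names a bucket, a filename, and optional metadata. A minimal sketch of reading those entries with the standard json module follows. It assumes the list and the outer object close right after the single entry shown above; the variable names are illustrative only and nothing here calls EMMAA itself.

```python
import json

# The fragment above written out as complete JSON; the closing
# brackets are an assumption about how the fragment ends.
config_text = """
{"other_files": [
    {
        "bucket": "indra-covid19",
        "filename": "ctd_stmts.pkl",
        "metadata": {"internal": true, "curated": true}
    }
]}
"""

reading_config = json.loads(config_text)

# Collect (bucket, filename, metadata) for each listed file,
# defaulting to empty metadata when the key is absent.
entries = [
    (entry["bucket"], entry["filename"], entry.get("metadata", {}))
    for entry in reading_config["other_files"]
]
```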

emmaa.model.get_assembled_statements(model, date=None, bucket='emmaa')
Load and return a list of assembled statements.
Parameters:
Returns:
- stmts (list[indra.statements.Statement]) – A list of assembled statements.
- latest_file_key (str) – Key of a file with statements on S3.

emmaa.model.get_model_stats(model, mode, tests=None, date=None, extension='.json', n=0, bucket='emmaa')
Get the latest statistics for the given model.
Parameters:
- model (str) – Model name to look for
- mode (str) – Type of stats to generate (model or test)
- tests (str) – A name of a test corpus. Default is large_corpus_tests.
- date (str or None) – Date for which the stats will be returned in “YYYY-MM-DD” format.
- extension (str) – Extension of the file.
- n (int) – Index of the file in list of S3 files sorted by date (0-indexed).
- bucket (str) – Name of bucket on S3.
Returns: model_data – The JSON-formatted data containing the statistics for the model.
Return type: json

emmaa.model.last_updated_date(model, file_type='model', date_format='date', tests='large_corpus_tests', extension='.pkl', n=0, bucket='emmaa')
Find the most recent or the nth file of the given type on S3 and return its creation date.
Example file name: models/aml/model_2018-12-13-18-11-54.pkl
Parameters:
- model (str) – Model name to look for
- file_type (str) – Type of a file to find the latest file for. Accepted values: ‘model’, ‘test_results’, ‘model_stats’, ‘test_stats’.
- date_format (str) – Format of the returned date. Accepted values are ‘datetime’ (returns a date in the format “YYYY-MM-DD-HH-mm-ss”) and ‘date’ (returns a date in the format “YYYY-MM-DD”). Default is ‘date’.
- extension (str) – The extension the model file needs to have. Default is ‘.pkl’
- n (int) – Index of the file in list of S3 files sorted by date (0-indexed).
- bucket (str) – Name of bucket on S3.
Returns: last_updated – A string of the selected format.
Return type:
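The example file name carries its timestamp in the key itself, and date_format chooses between the full timestamp and the date alone. A minimal sketch of that extraction follows, working on a plain list of keys rather than on S3. The helper names and the second key are hypothetical, and reading n=0 as "most recent" is an assumption based on the default.

```python
import re

# Matches "_YYYY-MM-DD-HH-mm-ss." as in model_2018-12-13-18-11-54.pkl
KEY_PATTERN = re.compile(r"_(\d{4}-\d{2}-\d{2})-(\d{2}-\d{2}-\d{2})\.")


def date_from_key(key, date_format="date"):
    """Extract the timestamp from a key, or None if the key has none."""
    match = KEY_PATTERN.search(key)
    if match is None:
        return None
    day, time = match.groups()
    return day if date_format == "date" else f"{day}-{time}"


def nth_latest_date(keys, n=0, date_format="date"):
    """Return the date of the nth most recent key (0 = most recent)."""
    stamped = [k for k in keys if KEY_PATTERN.search(k)]
    # Zero-padded timestamps sort chronologically as plain strings.
    stamped.sort(key=lambda k: date_from_key(k, "datetime"), reverse=True)
    if n >= len(stamped):
        return None
    return date_from_key(stamped[n], date_format)


keys = [
    "models/aml/model_2018-12-12-18-00-00.pkl",  # made-up older key
    "models/aml/model_2018-12-13-18-11-54.pkl",  # example from the docs
]
latest = nth_latest_date(keys)
```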

emmaa.model.load_config_from_s3(model_name, bucket='emmaa')
Return a JSON dict of config settings for a model from S3.
Parameters: model_name (str) – The name of the model whose config should be loaded.
Returns: config – A JSON dictionary of the model configuration loaded from S3.
Return type: dict

emmaa.model.load_stmts_from_s3(model_name, bucket='emmaa')
Return the list of EMMAA Statements constituting the latest model.
Parameters: model_name (str) – The name of the model whose statements should be loaded.
Returns: stmts – The list of EMMAA Statements in the latest model version.
Return type: list of emmaa.statements.EmmaaStatement