EMMAA Model (emmaa.model)
- class emmaa.model.EmmaaModel(name, config, paper_ids=None)
Bases: object
Represents an EMMAA model.
- Parameters
name (str) – The name of the model.
config (dict) – A configuration dict that is typically loaded from a YAML file.
paper_ids (list(str) or None) – A list of paper IDs used to get statements for the current state of the model. New paper IDs are added as new reading results come in. If not provided, the initial set is derived from the existing statements.
- search_terms
A list of SearchTerm objects containing the search terms used in the model.
- Type
list[emmaa.priors.SearchTerm]
- add_statements(stmts)
Add a set of EMMAA Statements to the model.
- Parameters
stmts (list[emmaa.statements.EmmaaStatement]) – A list of EMMAA Statements to add to the model.
- assemble_dynamic_pysb(mode='local', bucket='emmaa')
Assemble a version of a PySB model for dynamic simulation.
- assemble_pybel(mode='local', bucket='emmaa')
Assemble the model into PyBEL and return the assembled model.
- assemble_pysb(mode='local', bucket='emmaa')
Assemble the model into PySB and return the assembled model.
- assemble_signed_graph(mode='local', bucket='emmaa')
Assemble the model into a signed graph and return the assembled graph.
- assemble_unsigned_graph(**kwargs)
Assemble the model into an unsigned graph and return the assembled graph.
- get_indra_stmts()
Return the INDRA Statements contained in the model.
- Returns
The list of INDRA Statements that are extracted from the EMMAA Statements.
- Return type
list[indra.statements.Statement]
- get_paper_ids_from_stmts(stmts)
Get the initial set of paper IDs from a list of statements.
- Parameters
stmts (list[emmaa.statements.EmmaaStatement]) – A list of EMMAA statements to create the mappings from.
- classmethod load_from_s3(model_name, bucket='emmaa')
Load the latest model state from S3.
- Parameters
model_name (str) – Name of the model to load. This function expects the latest model to be found on S3 in the given bucket (‘emmaa’ by default) with key ‘models/{model_name}/model_{date_string}’, and the model config file at ‘models/{model_name}/config.json’.
- Returns
Latest instance of EmmaaModel with the given name, loaded from S3.
- Return type
emmaa.model.EmmaaModel
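Because the date string in these keys is zero-padded from year down to second (as in the example key model_2018-12-13-18-11-54.pkl given for last_updated_date), plain string sorting already orders model files chronologically. A minimal sketch under that assumption; model_keys is a hypothetical helper that works on a pre-fetched list of keys and is not part of emmaa.model:

```python
from __future__ import annotations


def model_keys(model_name: str, keys: list[str]) -> list[str]:
    """Return the model-state keys for model_name, oldest first.

    Hypothetical helper: assumes keys look like
    models/{model_name}/model_YYYY-MM-DD-HH-mm-ss[.ext].
    """
    prefix = f"models/{model_name}/model_"
    # Zero-padded timestamps sort lexicographically in chronological order.
    return sorted(k for k in keys if k.startswith(prefix))


keys = [
    "models/aml/config.json",
    "models/aml/model_2019-01-02-09-00-00.pkl",
    "models/aml/model_2018-12-13-18-11-54.pkl",
]
latest = model_keys("aml", keys)[-1]
# latest == "models/aml/model_2019-01-02-09-00-00.pkl"
```

The config key is excluded simply because it does not carry the model_ prefix.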
- static search_elsevier(search_terms, date_limit)
Search Elsevier for the given search terms.
- Parameters
search_terms (list[emmaa.priors.SearchTerm]) – A list of SearchTerm objects to search Elsevier for.
date_limit (int) – The number of days to search back from today.
- Returns
terms_to_piis – A dict with the given search terms as keys and the PIIs returned by the searches as values.
- Return type
dict
- search_literature(lit_source, date_limit=None)
Search for the model’s search terms in the literature.
- static search_pubmed(search_terms, date_limit)
Search PubMed for the given search terms.
- Parameters
search_terms (list[emmaa.priors.SearchTerm]) – A list of SearchTerm objects to search PubMed for.
date_limit (int) – The number of days to search back from today.
- Returns
terms_to_pmids – A dict with the given search terms as keys and the PMIDs returned by the searches as values.
- Return type
dict
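Both search methods return the same shape: a dict keyed by search term whose values are the IDs (PMIDs or PIIs) found within the last date_limit days. The sketch below reproduces only that shape. Term is a hypothetical stand-in for emmaa.priors.SearchTerm, search_fn is a hypothetical callable for whichever literature backend is queried, and whether EMMAA computes its date window exactly this way is an assumption:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for emmaa.priors.SearchTerm."""
    search_term: str


def search_terms_to_ids(search_terms: list[Term], date_limit: int,
                        search_fn: Callable[[str, date, date], list[str]],
                        today: date | None = None) -> dict[Term, list[str]]:
    """Map each term to the IDs search_fn returns for the last date_limit days."""
    end = today or date.today()
    start = end - timedelta(days=date_limit)
    return {term: search_fn(term.search_term, start, end)
            for term in search_terms}


def fake_search(query: str, start: date, end: date) -> list[str]:
    # Hypothetical backend that returns one made-up ID per query.
    return [f"{query}-id"]


result = search_terms_to_ids([Term("MEK"), Term("ERK")], 30, fake_search,
                             today=date(2020, 6, 1))
```

Injecting search_fn keeps the sketch independent of any real PubMed or Elsevier client.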
- update_from_disease_map(disease_map_config)
Update the model by processing a MINERVA Disease Map.
The relevant part of the reading config should look similar to:

    {"disease_map": {
        "map_name": "covid19map",
        "filenames": "all",  # or a list of filenames
        "metadata": {
            "internal": true
        }
    }}
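The example mixes JSON (true) with a Python-style comment, so it is illustrative rather than literal JSON. A minimal sketch of reading this section once it is a Python dict, assuming "filenames" is either the string "all" or a list; disease_map_settings and its defaults are hypothetical, not the EMMAA implementation:

```python
from __future__ import annotations


def disease_map_settings(reading_config: dict) -> tuple[str, str | list[str], dict]:
    """Return (map_name, filenames, metadata) from a reading config.

    Hypothetical helper; the defaults used here are assumptions.
    """
    dm = reading_config["disease_map"]
    filenames = dm.get("filenames", "all")
    if filenames != "all" and not isinstance(filenames, list):
        raise TypeError('"filenames" must be "all" or a list of filenames')
    return dm["map_name"], filenames, dm.get("metadata", {})


config = {"disease_map": {"map_name": "covid19map",
                          "filenames": "all",
                          "metadata": {"internal": True}}}
map_name, filenames, metadata = disease_map_settings(config)
```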
- update_from_files(files_config)
Add custom statements from files.
The relevant part of the reading config should look similar to:

    {"other_files": [
        {
            "bucket": "indra-covid19",
            "filename": "ctd_stmts.pkl",
            "metadata": {"internal": true, "curated": true}
        }
    ]}
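Each entry under "other_files" names an S3 bucket, a file in it, and metadata for the statements it holds. A small sketch that parses the example as strict JSON and walks the entries; iter_other_files is hypothetical, and how EMMAA then loads each file is not shown:

```python
import json

# The documented example, written as strict JSON.
OTHER_FILES_EXAMPLE = """
{"other_files": [
    {"bucket": "indra-covid19",
     "filename": "ctd_stmts.pkl",
     "metadata": {"internal": true, "curated": true}}
]}
"""


def iter_other_files(reading_config: dict):
    """Yield (bucket, filename, metadata) for each "other_files" entry."""
    for entry in reading_config.get("other_files", []):
        yield entry["bucket"], entry["filename"], entry.get("metadata", {})


config = json.loads(OTHER_FILES_EXAMPLE)
for bucket, filename, metadata in iter_other_files(config):
    # prints: s3://indra-covid19/ctd_stmts.pkl {'internal': True, 'curated': True}
    print(f"s3://{bucket}/{filename}", metadata)
```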
- emmaa.model.get_assembled_statements(model, date=None, bucket='emmaa')
Load and return a list of assembled statements.
- Parameters
- Returns
stmts (list[indra.statements.Statement]) – A list of assembled statements.
latest_file_key (str) – Key of the statements file on S3.
- emmaa.model.get_model_stats(model, mode, tests=None, date=None, extension='.json', n=0, bucket='emmaa')
Gets the latest statistics for the given model.
- Parameters
model (str) – Model name to look for.
mode (str) – Type of stats to generate (model or test).
tests (str) – Name of a test corpus. Default is large_corpus_tests.
date (str or None) – Date for which the stats will be returned in “YYYY-MM-DD” format.
extension (str) – Extension of the file.
n (int) – Index of the file in list of S3 files sorted by date (0-indexed).
bucket (str) – Name of bucket on S3.
- Returns
model_data – The JSON-formatted data containing the statistics for the model.
- Return type
json
- emmaa.model.get_models(include_config=False, include_dev=False, config_load_func=<function load_config_from_s3>, bucket='emmaa')
Get a list of all models in the EMMAA bucket.
- Parameters
include_config (bool) – Whether to include the config file for each model.
include_dev (bool) – Whether to include the models in dev mode.
config_load_func (function) – A function to load the config file (e.g. from S3 or from cache).
bucket (str) – Name of S3 bucket to look for a file. Defaults to ‘emmaa’.
- Returns
model_data – A list of model names. If include_config is True, the list is a list of tuples of model names and configs.
- Return type
list
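The return value changes shape with include_config. A sketch of that behaviour, assuming model names can be derived from keys of the form models/{model_name}/config.json (the layout load_from_s3 describes). list_models is hypothetical, takes a pre-fetched key list instead of querying S3, and ignores include_dev because how dev-mode models are flagged is not documented here:

```python
from __future__ import annotations

from typing import Callable


def list_models(keys: list[str], include_config: bool = False,
                config_load_func: Callable[[str], dict] | None = None):
    """Return model names, or (name, config) tuples if include_config is True."""
    names = sorted({
        parts[1]
        for parts in (key.split("/") for key in keys)
        if len(parts) == 3 and parts[0] == "models" and parts[2] == "config.json"
    })
    if not include_config:
        return names
    if config_load_func is None:
        raise ValueError("config_load_func is required when include_config=True")
    return [(name, config_load_func(name)) for name in names]


keys = ["models/aml/config.json",
        "models/covid19/config.json",
        "models/aml/model_2018-12-13-18-11-54.pkl"]
```

Passing the loader in mirrors the documented config_load_func parameter, so the same code could read configs from S3 or from a cache.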
- emmaa.model.last_updated_date(model, file_type='model', date_format='date', tests='large_corpus_tests', extension='.pkl', n=0, bucket='emmaa')
Find the most recent or the nth file of the given type on S3 and return its creation date.
Example file name: models/aml/model_2018-12-13-18-11-54.pkl
- Parameters
model (str) – Model name to look for.
file_type (str) – Type of a file to find the latest file for. Accepted values: ‘model’, ‘test_results’, ‘model_stats’, ‘test_stats’.
date_format (str) – Format of the returned date. Accepted values are ‘datetime’ (returns a date in the format “YYYY-MM-DD-HH-mm-ss”) and ‘date’ (returns a date in the format “YYYY-MM-DD”). Default is ‘date’.
extension (str) – The extension the model file needs to have. Default is ‘.pkl’.
n (int) – Index of the file in list of S3 files sorted by date (0-indexed).
bucket (str) – Name of bucket on S3.
- Returns
last_updated – A string of the selected format.
- Return type
str
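The two date_format options correspond to different slices of the timestamp embedded in the key. A sketch of selecting the nth most recent file and formatting its date, assuming keys follow the example file name above; nth_file_date is hypothetical and works on a pre-fetched key list rather than S3:

```python
from __future__ import annotations

from datetime import datetime

KEY_DATE_FMT = "%Y-%m-%d-%H-%M-%S"  # matches model_2018-12-13-18-11-54.pkl


def nth_file_date(keys: list[str], prefix: str, extension: str = ".pkl",
                  n: int = 0, date_format: str = "date") -> str:
    """Return the date of the nth most recent file (n=0 is the latest)."""
    stamps = [
        datetime.strptime(key[len(prefix):len(key) - len(extension)], KEY_DATE_FMT)
        for key in keys
        if key.startswith(prefix) and key.endswith(extension)
    ]
    stamps.sort(reverse=True)  # most recent first
    chosen = stamps[n]
    if date_format == "datetime":
        return chosen.strftime(KEY_DATE_FMT)
    if date_format == "date":
        return chosen.strftime("%Y-%m-%d")
    raise ValueError(f"Unsupported date_format: {date_format!r}")


keys = ["models/aml/model_2018-12-13-18-11-54.pkl",
        "models/aml/model_2019-01-02-09-00-00.pkl",
        "models/aml/config.json"]
```

For example, nth_file_date(keys, "models/aml/model_") gives "2019-01-02", and n=1 with date_format="datetime" gives "2018-12-13-18-11-54".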
- emmaa.model.load_config_from_s3(model_name, bucket='emmaa')
Return a JSON dict of config settings for a model from S3.
- emmaa.model.load_extra_evidence(stmts, method='db_query', ev_limit=1000, batch_size=3000)
Load additional evidence for statements from the database.
- Parameters
stmts (list[indra.statements.Statement]) – A list of statements to load evidence for.
method (str) – What method to use to load evidence (accepted values: db_query and rest_api). Default: db_query.
ev_limit (Optional[int]) – Maximum number of evidence items to load from the database for each statement. Default: 1000.
batch_size (Optional[int]) – Batch size used for querying. Default: 3000.
- Returns
stmts – A list of statements with additional evidence.
- Return type
list[indra.statements.Statement]
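ev_limit and batch_size play different roles: the first caps the evidence kept per statement, the second caps how many statements go into one query. A sketch of that pattern, assuming statements are identified by their hashes and that fetch is a hypothetical callable wrapping the database or REST API query; neither name comes from emmaa.model:

```python
from __future__ import annotations

from typing import Callable, Iterator, Sequence


def batches(items: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def load_evidence_batched(stmt_hashes: Sequence[int],
                          fetch: Callable[[Sequence[int], int], dict[int, list]],
                          ev_limit: int = 1000,
                          batch_size: int = 3000) -> dict[int, list]:
    """Fetch evidence one batch of statement hashes at a time."""
    evidence: dict[int, list] = {}
    for batch in batches(stmt_hashes, batch_size):
        for stmt_hash, evs in fetch(batch, ev_limit).items():
            evidence.setdefault(stmt_hash, []).extend(evs)
    # Re-apply the per-statement cap locally in case the backend ignores it.
    return {h: evs[:ev_limit] for h, evs in evidence.items()}


def fake_fetch(batch: Sequence[int], limit: int) -> dict[int, list]:
    # Hypothetical backend: five made-up evidence strings per hash, ignoring limit.
    return {h: [f"ev{h}_{i}" for i in range(5)] for h in batch}


evidence = load_evidence_batched([1, 2, 3, 4, 5], fake_fetch,
                                 ev_limit=2, batch_size=2)
```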
- emmaa.model.load_stmts_from_s3(model_name, bucket='emmaa')
Return the list of EMMAA Statements constituting the latest model.
- Parameters
model_name (str) – The name of the model whose statements should be loaded.
- Returns
stmts – The list of EMMAA Statements in the latest model version.
- Return type
list[emmaa.statements.EmmaaStatement]
- emmaa.model.pysb_to_gromet(pysb_model, model_name, statements=None, fname=None)
Convert a PySB model to a GroMEt object and optionally save it to a JSON file.
- Parameters
pysb_model (pysb.Model) – PySB model object.
model_name (str) – The name of the EMMAA model.
statements (Optional[list[indra.statements.Statement]]) – A list of INDRA Statements the PySB model was assembled from. If provided, the statement hashes are propagated into the GroMEt metadata.
fname (Optional[str]) – If given, the GroMEt is dumped into a JSON file with this name.
- Returns
g – A GroMEt object built from the PySB model.
- Return type
automates.script.gromet.gromet.Gromet