EMMAA Model (emmaa.model)

class emmaa.model.EmmaaModel(name, config, paper_ids=None)[source]

Bases: object

Represents an EMMAA model.

Parameters:
  • name (str) – The name of the model.
  • config (dict) – A configuration dict that is typically loaded from a YAML file.
  • paper_ids (list(str) or None) – A list of paper IDs used to get statements for the current state of the model. As new reading results come in, their paper IDs are added. If not provided, the initial set is derived from existing statements.
stmts

A list of EmmaaStatement objects representing the model.

Type: list[emmaa.statements.EmmaaStatement]
assembly_config

Configurations for assembling the model.

Type: dict
test_config

Configurations for running tests on the model.

Type: dict
reading_config

Configurations for reading the content.

Type: dict
query_config

Configurations for running queries on the model.

Type: dict
search_terms

A list of SearchTerm objects containing the search terms used in the model.

Type: list[emmaa.priors.SearchTerm]
ndex_network

The identifier of the NDEx network corresponding to the model.

Type: str
assembled_stmts

A list of assembled INDRA Statements.

Type: list[indra.statements.Statement]
add_paper_ids(initial_ids, id_type='pmid')[source]

Convert paper IDs to TextRef IDs if needed, then save them.

Parameters:
  • initial_ids (set(str)) – A set of paper IDs.
  • id_type (str) – The type of the given IDs (e.g., pmid, doi, pii). All IDs except PIIs are converted into TextRef IDs before saving.
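A minimal sketch of the ID handling described for add_paper_ids, assuming a caller-supplied converter: PIIs pass through unchanged and every other ID type is mapped to a TextRef ID. The helper name normalize_paper_ids and the to_trid callable are hypothetical stand-ins; the real method presumably looks the IDs up in a literature database, which is not shown here.

```python
from typing import Callable, Iterable, Set


def normalize_paper_ids(
    initial_ids: Iterable[str],
    id_type: str,
    to_trid: Callable[[str, str], str],
) -> Set[str]:
    """Hypothetical helper: keep PIIs as-is, convert everything else.

    to_trid(paper_id, id_type) is an assumed converter returning a
    TextRef ID; it is not part of the documented EMMAA API.
    """
    if id_type == 'pii':
        # Per the docstring, PIIs are saved without conversion.
        return set(initial_ids)
    # Any other ID type (e.g. 'pmid' or 'doi') is converted first.
    return {to_trid(paper_id, id_type) for paper_id in initial_ids}


# Usage with a toy lookup table in place of a real converter.
toy_table = {('pmid', '123'): '1', ('doi', '10.1/x'): '2'}
print(normalize_paper_ids({'123'}, 'pmid', lambda pid, t: toy_table[(t, pid)]))  # {'1'}
```

Injecting the converter keeps the sketch testable without database access.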
add_statements(stmts)[source]

Add a list of EMMAA Statements to the model.

Parameters: stmts (list[emmaa.statements.EmmaaStatement]) – A list of EMMAA Statements to add to the model.
assemble_dynamic_pysb(**kwargs)[source]

Assemble a version of a PySB model for dynamic simulation.

assemble_pybel(mode='local', bucket='emmaa')[source]

Assemble the model into PyBEL and return the assembled model.

assemble_pysb(mode='local', bucket='emmaa')[source]

Assemble the model into PySB and return the assembled model.

assemble_signed_graph(mode='local', bucket='emmaa')[source]

Assemble the model into a signed graph and return the assembled graph.

assemble_unsigned_graph(**kwargs)[source]

Assemble the model into an unsigned graph and return the assembled graph.

eliminate_copies()[source]

Filter out exact copies of the same Statement.

extend_unique(estmts)[source]

Extend the model statements with the given statements, skipping any that are already in the model.
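The two deduplication methods above can be sketched generically, assuming each statement reduces to a hashable key that identifies exact copies. Here plain strings stand in for EMMAA Statements and key is a hypothetical parameter; with real INDRA Statements a statement hash (for example from Statement.get_hash) would be one plausible key, though which hash EMMAA itself uses is not documented here.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar('T')


def eliminate_copies(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Return items with exact copies removed, keeping first occurrences."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique


def extend_unique(existing: List[T], new_items: Iterable[T],
                  key: Callable[[T], Hashable]) -> None:
    """Append (in place) only the new items whose key is not yet present."""
    seen = {key(item) for item in existing}
    for item in new_items:
        k = key(item)
        if k not in seen:
            # Track the key so repeats within new_items are skipped too.
            seen.add(k)
            existing.append(item)


model_stmts = eliminate_copies(['a', 'b', 'a'], key=str)
extend_unique(model_stmts, ['b', 'c', 'c'], key=str)
print(model_stmts)  # ['a', 'b', 'c']
```

This version of extend_unique also skips duplicates within the incoming batch; whether the EMMAA method does the same is not stated.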

get_assembled_entities()[source]

Return a list of Agent objects that the assembled model contains.

get_entities()[source]

Return a list of Agent objects that the model contains.

get_indra_stmts()[source]

Return the INDRA Statements contained in the model.

Returns: The list of INDRA Statements that are extracted from the EMMAA Statements.
Return type: list[indra.statements.Statement]
get_new_readings(date_limit=10)[source]

Search for new literature, read it, and add the resulting statements to the model.

get_paper_ids_from_stmts(stmts)[source]

Get the initial set of paper IDs from a list of statements.

Parameters: stmts (list[emmaa.statements.EmmaaStatement]) – A list of EMMAA statements to create the mappings from.
classmethod load_from_s3(model_name, bucket='emmaa')[source]

Load the latest model state from S3.

Parameters: model_name (str) – Name of the model to load. This function expects the latest model to be found on S3 in the emmaa bucket (or the bucket given by bucket) with key ‘models/{model_name}/model_{date_string}’, and the model config file at ‘models/{model_name}/config.json’.
Returns: The latest instance of EmmaaModel with the given name, loaded from S3.
Return type: emmaa.model.EmmaaModel
run_assembly()[source]

Run INDRA’s assembly pipeline on the Statements.

save_to_s3(bucket='emmaa')[source]

Dump the model state to S3.

static search_biorxiv(collection_id, date_limit)[source]

Search a bioRxiv collection for papers from the last date_limit days.

Parameters:
  • date_limit (int) – The number of days to search back from today.
  • collection_id (str) – ID of the bioRxiv collection to search.
Returns: terms_to_dois – A dict with the bioRxiv collection ID as key and the DOIs returned by the search as values.
Return type: dict
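date_limit is documented as the number of days to search back from today. A minimal sketch of that window for a bioRxiv collection, assuming the collection's papers are already available as hypothetical (doi, posted_date) records and that both ends of the window are inclusive; the real method queries bioRxiv, which is not reproduced here. The result has the documented terms_to_dois shape, with the collection ID as key.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional, Tuple


def search_window(date_limit: int, today: Optional[date] = None) -> Tuple[date, date]:
    """Return the (start, end) dates covering the last date_limit days."""
    end = today or date.today()
    return end - timedelta(days=date_limit), end


def recent_dois_by_collection(
    collection_id: str,
    records: Iterable[Tuple[str, date]],
    date_limit: int,
    today: Optional[date] = None,
) -> Dict[str, List[str]]:
    """Hypothetical: keep DOIs whose posting date falls in the window."""
    start, end = search_window(date_limit, today)
    # Both ends of the window are treated as inclusive (an assumption).
    dois = [doi for doi, posted in records if start <= posted <= end]
    return {collection_id: dois}


records = [('10.1101/aaa', date(2020, 5, 1)), ('10.1101/bbb', date(2020, 5, 10))]
print(recent_dois_by_collection('example-collection', records, 10,
                                today=date(2020, 5, 15)))
```

Passing today explicitly keeps the window deterministic in tests; omitting it falls back to the current date.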

static search_elsevier(search_terms, date_limit)[source]

Search Elsevier for the given search terms.

Parameters:
  • search_terms (list[emmaa.priors.SearchTerm]) – A list of SearchTerm objects to search Elsevier for.
  • date_limit (int) – The number of days to search back from today.
Returns: terms_to_piis – A dict mapping each given search term to the PIIs returned by its search.
Return type: dict

search_literature(lit_source, date_limit=None)[source]

Search for the model’s search terms in the literature.

Parameters: date_limit (Optional[int]) – The number of days to search back from today.
Returns: ids_to_terms – A dict with all the literature source IDs (e.g., PMIDs or PIIs) returned by the searches as keys, and the search terms for which each ID was produced as values.
Return type: dict
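search_literature returns the inverse of the per-term dicts produced by search_pubmed and search_elsevier: paper IDs as keys, the terms that found them as values. A sketch of that inversion, assuming search terms are hashable (plain strings stand in for SearchTerm objects) and that each value is a list of terms; invert_search_results is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def invert_search_results(
    terms_to_ids: Dict[Hashable, Iterable[str]]
) -> Dict[str, List[Hashable]]:
    """Map each paper ID to every search term whose search returned it."""
    ids_to_terms: Dict[str, List[Hashable]] = defaultdict(list)
    for term, paper_ids in terms_to_ids.items():
        for paper_id in paper_ids:
            ids_to_terms[paper_id].append(term)
    # Return a plain dict so lookups of unknown IDs raise KeyError
    # instead of silently creating empty entries.
    return dict(ids_to_terms)


terms_to_pmids = {'BRAF': ['111', '222'], 'KRAS': ['222', '333']}
print(invert_search_results(terms_to_pmids))
# {'111': ['BRAF'], '222': ['BRAF', 'KRAS'], '333': ['KRAS']}
```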
static search_pubmed(search_terms, date_limit)[source]

Search PubMed for the given search terms.

Parameters:
  • search_terms (list[emmaa.priors.SearchTerm]) – A list of SearchTerm objects to search PubMed for.
  • date_limit (int) – The number of days to search back from today.
Returns: terms_to_pmids – A dict mapping each given search term to the PMIDs returned by its search.
Return type: dict

to_json()[source]

Convert the model into a JSON-serializable dictionary.

update_from_disease_map(disease_map_config)[source]

Update the model by processing a MINERVA disease map.

The relevant part of the reading config should look similar to:

    {"disease_map": {
        "map_name": "covid19map",
        "filenames": "all",  # or a list of filenames
        "metadata": {
            "internal": true
        }
    }}

update_from_files(files_config)[source]

Add custom statements from files.

The relevant part of the reading config should look similar to:

    {"other_files": [
        {
            "bucket": "indra-covid19",
            "filename": "ctd_stmts.pkl",
            "metadata": {"internal": true, "curated": true}
        }
    ]}

update_to_ndex()[source]

Upload the assembled model as CX to NDEx, updating the existing network.

update_with_cord19(cord19_config)[source]

Update the model with new statements from the CORD-19 dataset.

The relevant part of the reading config should look similar to:

    {"cord19_update": {
        "metadata": {
            "internal": true,
            "curated": false
        },
        "date_limit": 5
    }}
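The reading-config fragments for update_from_disease_map, update_from_files, and update_with_cord19 are written in JSON syntax (true/false), and the disease-map one carries a # comment, which strict JSON does not allow. Assuming the three parts sit side by side in one reading config, the sketch below shows the equivalent Python dict (True/False) and checks that it survives a JSON round trip. Where this dict lives within the full model config is not specified here.

```python
import json

# Hypothetical reading config combining the three documented fragments.
# Python uses True/False where the JSON fragments use true/false.
reading_config = {
    'disease_map': {
        'map_name': 'covid19map',
        'filenames': 'all',  # or a list of filenames
        'metadata': {'internal': True},
    },
    'other_files': [
        {
            'bucket': 'indra-covid19',
            'filename': 'ctd_stmts.pkl',
            'metadata': {'internal': True, 'curated': True},
        }
    ],
    'cord19_update': {
        'metadata': {'internal': True, 'curated': False},
        'date_limit': 5,
    },
}

# json.dumps emits lowercase true/false and drops the Python comment.
serialized = json.dumps(reading_config, indent=2)
assert json.loads(serialized) == reading_config
print(serialized)
```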

upload_to_ndex()[source]

Upload the assembled model as CX to NDEx, creating a new network.

emmaa.model.get_assembled_statements(model, date=None, bucket='emmaa')[source]

Load and return a list of assembled statements.

Parameters:
  • model (str) – The name of the model.
  • date (str or None) – Date in “YYYY-MM-DD” format for which to load the statements. If None, loads the latest available statements.
  • bucket (str) – Name of the S3 bucket to look for a file in. Defaults to ‘emmaa’.
Returns:

  • stmts (list[indra.statements.Statement]) – A list of assembled statements.
  • latest_file_key (str) – The S3 key of the file the statements were loaded from.

emmaa.model.get_model_stats(model, mode, tests=None, date=None, extension='.json', n=0, bucket='emmaa')[source]

Get the latest statistics for the given model.

Parameters:
  • model (str) – Name of the model to look for.
  • mode (str) – Type of stats to return (model or test).
  • tests (str) – The name of a test corpus. Default is large_corpus_tests.
  • date (str or None) – Date, in “YYYY-MM-DD” format, for which to return the stats.
  • extension (str) – Extension of the file.
  • n (int) – Index of the file in the list of S3 files sorted by date (0-indexed).
  • bucket (str) – Name of the bucket on S3.
Returns: model_data – The JSON-formatted data containing the statistics for the model.
Return type: json

emmaa.model.last_updated_date(model, file_type='model', date_format='date', tests='large_corpus_tests', extension='.pkl', n=0, bucket='emmaa')[source]

Find the most recent or the nth file of given type on S3 and return its creation date.

Example file name: models/aml/model_2018-12-13-18-11-54.pkl

Parameters:
  • model (str) – Name of the model to look for.
  • file_type (str) – Type of file to find the latest file for. Accepted values: ‘model’, ‘test_results’, ‘model_stats’, ‘test_stats’.
  • date_format (str) – Format of the returned date. Accepted values are ‘datetime’ (returns a date in the format “YYYY-MM-DD-HH-mm-ss”) and ‘date’ (returns a date in the format “YYYY-MM-DD”). Default is ‘date’.
  • tests (str) – The name of a test corpus. Default is ‘large_corpus_tests’.
  • extension (str) – The extension the file needs to have. Default is ‘.pkl’.
  • n (int) – Index of the file in the list of S3 files sorted by date (0-indexed).
  • bucket (str) – Name of the bucket on S3.
Returns: last_updated – A string in the selected format.
Return type: str
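A sketch of how a date might be recovered from file keys like the example models/aml/model_2018-12-13-18-11-54.pkl, assuming n counts back from the newest file (n=0 is the most recent) and that the timestamp directly follows a file-type prefix. Only the model naming pattern is shown in the docstring, so the prefix parameter and the helper name nth_updated_date are hypothetical; other file types may be named differently. Listing keys from S3 is left out.

```python
from datetime import datetime
from typing import Iterable


def nth_updated_date(keys: Iterable[str], prefix: str = 'model_',
                     extension: str = '.pkl', date_format: str = 'date',
                     n: int = 0) -> str:
    """Return the date stamp of the nth newest key matching prefix/extension."""
    stamps = []
    for key in keys:
        name = key.rsplit('/', 1)[-1]
        if not (name.startswith(prefix) and name.endswith(extension)):
            continue
        stamp = name[len(prefix):len(name) - len(extension)]
        try:
            # Skip names whose suffix is not a timestamp (e.g. model_stats_...).
            datetime.strptime(stamp, '%Y-%m-%d-%H-%M-%S')
        except ValueError:
            continue
        stamps.append(stamp)
    # Zero-padded stamps sort chronologically as plain strings; newest first.
    stamps.sort(reverse=True)
    stamp = stamps[n]  # IndexError if fewer than n + 1 files match
    if date_format == 'datetime':
        return stamp        # YYYY-MM-DD-HH-mm-ss
    if date_format == 'date':
        return stamp[:10]   # YYYY-MM-DD
    raise ValueError(f'Unsupported date_format: {date_format!r}')


keys = [
    'models/aml/model_2018-12-13-18-11-54.pkl',
    'models/aml/model_2019-01-02-03-04-05.pkl',
    'models/aml/config.json',
]
print(nth_updated_date(keys))       # 2019-01-02
print(nth_updated_date(keys, n=1))  # 2018-12-13
```

Because the timestamps run from year down to second with zero padding, string order matches chronological order, so no datetime comparison is needed for sorting.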

emmaa.model.load_config_from_s3(model_name, bucket='emmaa')[source]

Return a JSON dict of config settings for a model from S3.

Parameters: model_name (str) – The name of the model whose config should be loaded.
Returns: config – A JSON dictionary of the model configuration loaded from S3.
Return type: dict
emmaa.model.load_stmts_from_s3(model_name, bucket='emmaa')[source]

Return the list of EMMAA Statements constituting the latest model.

Parameters: model_name (str) – The name of the model whose statements should be loaded.
Returns: stmts – The list of EMMAA Statements in the latest model version.
Return type: list of emmaa.statements.EmmaaStatement
emmaa.model.save_config_to_s3(model_name, config, bucket='emmaa')[source]

Upload config settings for a model to S3.

Parameters:
  • model_name (str) – The name of the model whose config should be saved to S3.
  • config (dict) – A JSON dict of configurations for the model.
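A sketch of the config round trip performed by save_config_to_s3 and load_config_from_s3, using the key layout given in the load_from_s3 docstring (models/{model_name}/config.json). An in-memory FakeBucket stands in for S3 so the example runs without credentials; it and the helper names are hypothetical. A real implementation would go through an S3 client (for instance boto3's put_object and get_object), and the exact JSON formatting EMMAA uses is not documented here.

```python
import json
from typing import Dict


class FakeBucket:
    """In-memory stand-in for an S3 bucket, mapping object keys to bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def get(self, key: str) -> bytes:
        return self.objects[key]


def config_key(model_name: str) -> str:
    # Key layout stated in the load_from_s3 docstring.
    return f'models/{model_name}/config.json'


def save_config(bucket: FakeBucket, model_name: str, config: dict) -> None:
    """Serialize the config dict as UTF-8 JSON under the model's config key."""
    bucket.put(config_key(model_name), json.dumps(config).encode('utf-8'))


def load_config(bucket: FakeBucket, model_name: str) -> dict:
    """Read the config back and parse it into a dict."""
    return json.loads(bucket.get(config_key(model_name)).decode('utf-8'))


bucket = FakeBucket()
save_config(bucket, 'aml', {'example_setting': True})
print(load_config(bucket, 'aml'))  # {'example_setting': True}
```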