EMMAA Model (emmaa.model)

class emmaa.model.EmmaaModel(name, config, paper_ids=None)[source]

Bases: object

Represents an EMMAA model.

Parameters
  • name (str) – The name of the model.

  • config (dict) – A configuration dict that is typically loaded from a YAML file.

  • paper_ids (list(str) or None) – A list of paper IDs used to get statements for the current state of the model. New paper IDs are added as new reading results arrive. If not provided, the initial set is derived from existing statements.

stmts

A list of EmmaaStatement objects representing the model.

Type

list[emmaa.EmmaaStatement]

assembly_config

Configurations for assembling the model.

Type

dict

test_config

Configurations for running tests on the model.

Type

dict

reading_config

Configurations for reading the content.

Type

dict

query_config

Configurations for running queries on the model.

Type

dict

search_terms

A list of SearchTerm objects containing the search terms used in the model.

Type

list[emmaa.priors.SearchTerm]

ndex_network

The identifier of the NDEx network corresponding to the model.

Type

str

assembled_stmts

A list of assembled INDRA Statements.

Type

list[indra.statements.Statement]

add_paper_ids(initial_ids, id_type='pmid')[source]

Convert paper IDs if needed and save them.

Parameters
  • initial_ids (set(str)) – A set of paper IDs.

  • id_type (str) – What type the given IDs are (e.g. pmid, doi, pii). All IDs except for PIIs will be converted into TextRef IDs before saving.

add_statements(stmts)[source]

Add a list of EMMAA Statements to the model.

Parameters

stmts (list[emmaa.EmmaaStatement]) – A list of EMMAA Statements to add to the model

assemble_dynamic_pysb(mode='local', bucket='emmaa')[source]

Assemble a version of a PySB model for dynamic simulation.

assemble_pybel(mode='local', bucket='emmaa')[source]

Assemble the model into PyBEL and return the assembled model.

assemble_pysb(mode='local', bucket='emmaa')[source]

Assemble the model into PySB and return the assembled model.

assemble_signed_graph(mode='local', bucket='emmaa')[source]

Assemble the model into a signed graph and return the assembled graph.

assemble_unsigned_graph(**kwargs)[source]

Assemble the model into an unsigned graph and return the assembled graph.

eliminate_copies()[source]

Filter out exact copies of the same Statement.

extend_unique(estmts)[source]

Extend the model's statements with only those statements that are not already present.
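The idea of extending a list only with unseen items can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper and its `key` argument are hypothetical, and how EMMAA decides that two statements are the same is not specified in this section.

```python
def extend_unique(existing, new_items, key=lambda item: item):
    """Append items from new_items whose key is not already in existing.

    Hypothetical helper for illustration only; `key` stands in for
    whatever identity EMMAA uses to compare statements.
    """
    seen = {key(item) for item in existing}
    for item in new_items:
        k = key(item)
        if k not in seen:
            existing.append(item)
            seen.add(k)  # also guards against duplicates within new_items
    return existing


# Plain strings stand in for EmmaaStatement objects.
stmts = ["A activates B", "B inhibits C"]
extend_unique(stmts, ["B inhibits C", "C binds D", "C binds D"])
print(stmts)
```

Tracking seen keys in a set keeps each membership check constant-time, which matters when the model already holds many statements.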

get_assembled_entities()[source]

Return a list of Agent objects that the assembled model contains.

get_entities()[source]

Return a list of Agent objects that the model contains.

get_indra_stmts()[source]

Return the INDRA Statements contained in the model.

Returns

The list of INDRA Statements that are extracted from the EMMAA Statements.

Return type

list[indra.statements.Statement]

get_new_readings(date_limit=10)[source]

Search for new literature, read it, and add the resulting statements to the model.

get_paper_ids_from_stmts(stmts)[source]

Get initial set of paper IDs from a list of statements.

Parameters

stmts (list[emmaa.statements.EmmaaStatement]) – A list of EMMAA statements to create the mappings from.

classmethod load_from_s3(model_name, bucket='emmaa')[source]

Load the latest model state from S3.

Parameters

model_name (str) – Name of model to load. This function expects the latest model to be found on S3 in the emmaa bucket with key ‘models/{model_name}/model_{date_string}’, and the model config file at ‘models/{model_name}/config.json’.

Returns

Latest instance of EmmaaModel with the given name, loaded from S3.

Return type

emmaa.model.EmmaaModel
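As a rough illustration of the key layout described for load_from_s3, the sketch below builds the two keys from a model name and a date string. The helper name is hypothetical and not part of emmaa; the date string is taken from the example file name shown for last_updated_date.

```python
def model_keys(model_name, date_string):
    """Build the S3 keys named in the load_from_s3 description.

    Illustrative helper only; emmaa does not necessarily expose
    a function like this.
    """
    model_key = f"models/{model_name}/model_{date_string}"
    config_key = f"models/{model_name}/config.json"
    return model_key, config_key


model_key, config_key = model_keys("aml", "2018-12-13-18-11-54")
print(model_key)
print(config_key)
```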

run_assembly()[source]

Run INDRA’s assembly pipeline on the Statements.

save_to_s3(bucket='emmaa')[source]

Dump the model state to S3.

static search_biorxiv(collection_id, date_limit)[source]

Search bioRxiv within date_limit.

Parameters
  • collection_id (str) – ID of a collection to search bioRxiv for.

  • date_limit (int) – The number of days to search back from today.

Returns

terms_to_dois – A dict with the bioRxiv collection ID as key and the DOIs returned by the search as values.

Return type

dict

static search_elsevier(search_terms, date_limit)[source]

Search Elsevier for given search terms.

Parameters
  • search_terms (list[emmaa.priors.SearchTerm]) – A list of SearchTerm objects to search Elsevier for.

  • date_limit (int) – The number of days to search back from today.

Returns

terms_to_piis – A dict representing given search terms as keys and PIIs returned by searches as values.

Return type

dict

search_literature(lit_source, date_limit=None)[source]

Search for the model’s search terms in the literature.

Parameters

date_limit (Optional[int]) – The number of days to search back from today.

Returns

ids_to_terms – A dict with all the literature source IDs (e.g., PMIDs or PIIs) returned by the searches as keys, and the search terms for which the given ID was produced as values.

Return type

dict
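The per-source search methods return dicts keyed by search term, while search_literature returns a dict keyed by paper ID. One way to get from the first shape to the second is a simple inversion, sketched below. This is an assumption about the relationship between the two shapes, not the library's code; plain strings stand in for SearchTerm objects and the IDs are placeholders.

```python
from collections import defaultdict


def invert_terms_to_ids(terms_to_ids):
    """Turn {term: [ids]} into {id: [terms]} (illustrative helper)."""
    ids_to_terms = defaultdict(list)
    for term, paper_ids in terms_to_ids.items():
        for paper_id in paper_ids:
            ids_to_terms[paper_id].append(term)
    return dict(ids_to_terms)


# Placeholder terms and IDs, not real search results.
terms_to_pmids = {"BRAF": ["111", "222"], "KRAS": ["222", "333"]}
ids_to_terms = invert_terms_to_ids(terms_to_pmids)
print(ids_to_terms)
```

A paper returned for several terms ends up with all of those terms in its list, which is why the values are lists rather than single terms.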

static search_pubmed(search_terms, date_limit)[source]

Search PubMed for given search terms.

Parameters
  • search_terms (list[emmaa.priors.SearchTerm]) – A list of SearchTerm objects to search PubMed for.

  • date_limit (int) – The number of days to search back from today.

Returns

terms_to_pmids – A dict representing given search terms as keys and PMIDs returned by searches as values.

Return type

dict
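The date_limit parameter is described as a number of days to search back from today. A minimal sketch of that arithmetic is shown below; the helper is hypothetical, and how each search backend actually consumes the limit is not covered in this section.

```python
from datetime import date, timedelta


def cutoff_date(date_limit, today=None):
    """Return the earliest date covered by a date_limit of N days.

    Illustrative helper; `today` can be fixed for reproducible results.
    """
    today = today or date.today()
    return today - timedelta(days=date_limit)


# With a fixed "today" the result is deterministic.
print(cutoff_date(10, today=date(2020, 5, 15)))
```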

to_json()[source]

Convert the model into a JSON-dumpable dictionary.

update_from_disease_map(disease_map_config)[source]

Update model by processing MINERVA Disease Map.

Relevant part of reading config should look similar to:

    {"disease_map": {
        "map_name": "covid19map",
        "filenames": "all",  # or a list of filenames
        "metadata": {
            "internal": true
        }
    }}

update_from_files(files_config)[source]

Add custom statements from files.

Relevant part of reading config should look similar to:

    {"other_files": [
        {
            "bucket": "indra-covid19",
            "filename": "ctd_stmts.pkl",
            "metadata": {"internal": true, "curated": true}
        }
    ]}

update_to_ndex()[source]

Update the assembled model as CX on NDEx. This updates the existing network.

update_with_cord19(cord19_config)[source]

Update model with new CORD19 dataset statements.

Relevant part of reading config should look similar to:

    {"cord19_update": {
        "metadata": {
            "internal": true,
            "curated": false
        },
        "date_limit": 5
    }}
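The reading config fragments shown for update_from_disease_map, update_from_files, and update_with_cord19 can be parsed and inspected as in the sketch below. Combining all three sections in one config is an assumption made for illustration, and the lookup table is hypothetical: it only pairs each documented key with the method whose description shows it, and does not reflect how EMMAA dispatches internally.

```python
import json

# The three fragments from this page, merged into one config for illustration.
reading_config_text = """
{
  "disease_map": {
    "map_name": "covid19map",
    "filenames": "all",
    "metadata": {"internal": true}
  },
  "other_files": [
    {
      "bucket": "indra-covid19",
      "filename": "ctd_stmts.pkl",
      "metadata": {"internal": true, "curated": true}
    }
  ],
  "cord19_update": {
    "metadata": {"internal": true, "curated": false},
    "date_limit": 5
  }
}
"""

reading_config = json.loads(reading_config_text)

# Hypothetical lookup: documented config key -> method that documents it.
SECTION_TO_METHOD = {
    "disease_map": "update_from_disease_map",
    "other_files": "update_from_files",
    "cord19_update": "update_with_cord19",
}

configured = [
    method for key, method in SECTION_TO_METHOD.items() if key in reading_config
]
print(configured)
# JSON true/false become Python True/False after parsing.
print(reading_config["cord19_update"]["metadata"])
```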

upload_to_ndex()[source]

Upload the assembled model as CX to NDEx. This creates a new network.

emmaa.model.get_assembled_statements(model, date=None, bucket='emmaa')[source]

Load and return a list of assembled statements.

Parameters
  • model (str) – A name of a model.

  • date (str or None) – Date in “YYYY-MM-DD” format for which to load the statements. If None, loads the latest available statements.

  • bucket (str) – Name of S3 bucket to look for a file. Defaults to ‘emmaa’.

Returns

  • stmts (list[indra.statements.Statement]) – A list of assembled statements.

  • latest_file_key (str) – Key of a file with statements on s3.

emmaa.model.get_model_stats(model, mode, tests=None, date=None, extension='.json', n=0, bucket='emmaa')[source]

Get the latest statistics for the given model.

Parameters
  • model (str) – Model name to look for

  • mode (str) – Type of stats to generate (model or test)

  • tests (str) – A name of a test corpus. Default is large_corpus_tests.

  • date (str or None) – Date for which the stats will be returned in “YYYY-MM-DD” format.

  • extension (str) – Extension of the file.

  • n (int) – Index of the file in list of S3 files sorted by date (0-indexed).

  • bucket (str) – Name of bucket on S3.

Returns

model_data – The json formatted data containing the statistics for the model

Return type

json

emmaa.model.get_models(include_config=False, include_dev=False, config_load_func=<function load_config_from_s3>, bucket='emmaa')[source]

Get a list of all models in the EMMAA bucket.

Parameters
  • include_config (bool) – Whether to include the config file for each model.

  • include_dev (bool) – Whether to include the models in dev mode.

  • config_load_func (function) – A function to load the config file (e.g. from s3 or from cache).

  • bucket (str) – Name of S3 bucket to look for a file. Defaults to ‘emmaa’.

Returns

model_data – A list of model names. If include_config is True, the list is a list of tuples of model names and configs.

Return type

list[str] or list(tuple(str, dict))
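Because get_models returns either a list of names or a list of (name, config) tuples, calling code may want to handle both shapes uniformly. The sketch below shows one way to do that; the helper, the second model name, and the config contents are placeholders, not values from EMMAA.

```python
def split_model_data(model_data):
    """Return (names, configs) from either documented return shape.

    Illustrative helper; configs is empty when no configs were included.
    """
    names, configs = [], {}
    for entry in model_data:
        if isinstance(entry, tuple):
            name, config = entry
            names.append(name)
            configs[name] = config
        else:
            names.append(entry)
    return names, configs


# Placeholder data mimicking include_config=False and include_config=True.
names_only = ["aml", "example_model"]
with_configs = [("aml", {"example_key": 1}), ("example_model", {"example_key": 2})]

print(split_model_data(names_only))
print(split_model_data(with_configs))
```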

emmaa.model.last_updated_date(model, file_type='model', date_format='date', tests='large_corpus_tests', extension='.pkl', n=0, bucket='emmaa')[source]

Find the most recent or the nth file of given type on S3 and return its creation date.

Example file name: models/aml/model_2018-12-13-18-11-54.pkl

Parameters
  • model (str) – Model name to look for

  • file_type (str) – Type of a file to find the latest file for. Accepted values: ‘model’, ‘test_results’, ‘model_stats’, ‘test_stats’.

  • date_format (str) – Format of the returned date. Accepted values are ‘datetime’ (returns a date in the format “YYYY-MM-DD-HH-mm-ss”) and ‘date’ (returns a date in the format “YYYY-MM-DD”). Default is ‘date’.

  • extension (str) – The extension the model file needs to have. Default is ‘.pkl’

  • n (int) – Index of the file in list of S3 files sorted by date (0-indexed).

  • bucket (str) – Name of bucket on S3.

Returns

last_updated – A string of the selected format.

Return type

str
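The two date formats and the n index can be illustrated without S3 access by working on key strings directly. The sketch below is not the library's implementation: the helpers are hypothetical, the second key is made up, and it assumes files are ordered newest first so that n=0 selects the most recent one.

```python
import re
from datetime import datetime

# Matches the timestamp in keys such as models/aml/model_2018-12-13-18-11-54.pkl
KEY_PATTERN = re.compile(r"_(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})")


def date_from_key(key, date_format="date"):
    """Extract the timestamp from a key in 'date' or 'datetime' format."""
    match = KEY_PATTERN.search(key)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")
    if date_format == "datetime":
        return stamp.strftime("%Y-%m-%d-%H-%M-%S")
    return stamp.strftime("%Y-%m-%d")


def nth_latest(keys, n=0):
    """Return the nth key when sorted newest first (assumed ordering)."""
    dated = sorted(
        (k for k in keys if KEY_PATTERN.search(k)),
        key=lambda k: KEY_PATTERN.search(k).group(1),
        reverse=True,
    )
    return dated[n] if n < len(dated) else None


keys = [
    "models/aml/model_2018-12-13-18-11-54.pkl",
    "models/aml/model_2018-12-14-09-00-00.pkl",  # made-up key for illustration
]
print(date_from_key(nth_latest(keys)))
print(date_from_key(nth_latest(keys, n=1), "datetime"))
```

Sorting on the timestamp string works because the zero-padded YYYY-MM-DD-HH-mm-ss layout orders lexicographically in the same way as chronologically.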

emmaa.model.load_config_from_s3(model_name, bucket='emmaa')[source]

Return a JSON dict of config settings for a model from S3.

Parameters

model_name (str) – The name of the model whose config should be loaded.

Returns

config – A JSON dictionary of the model configuration loaded from S3.

Return type

dict

emmaa.model.load_extra_evidence(stmts, method='db_query', ev_limit=1000, batch_size=3000)[source]

Load additional evidence for statements from database.

Parameters
  • stmts (list[indra.statements.Statement]) – A list of statements to load evidence for.

  • method (str) – What method to use to load evidence (accepted values: db_query and rest_api). Default: db_query.

  • ev_limit (Optional[int]) – How many evidences to load from the database for each statement. Default: 1000.

  • batch_size (Optional[int]) – Batch size used for querying. Default: 3000.

Returns

stmts – A list of statements with additional evidence.

Return type

list[indra.statements.Statement]
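The batch_size parameter suggests that queries are issued in chunks rather than all at once. A generic batching helper is sketched below to show the idea; it is hypothetical, and what exactly load_extra_evidence batches (statements, hashes, or something else) is not specified in this section.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Integers stand in for whatever is sent to the database per query.
batches = list(batched(list(range(7)), 3))
print(batches)
```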

emmaa.model.load_stmts_from_s3(model_name, bucket='emmaa')[source]

Return the list of EMMAA Statements constituting the latest model.

Parameters

model_name (str) – The name of the model whose statements should be loaded.

Returns

stmts – The list of EMMAA Statements in the latest model version.

Return type

list of emmaa.statements.EmmaaStatement

emmaa.model.pysb_to_gromet(pysb_model, model_name, statements=None, fname=None)[source]

Convert a PySB model to a GroMEt object and optionally save it to a JSON file.

Parameters
  • pysb_model (pysb.Model) – PySB model object.

  • model_name (str) – A name of EMMAA model.

  • statements (Optional[list[indra.statements.Statement]]) – A list of INDRA Statements the PySB model was assembled from. If provided, the statement hashes will be propagated into the GroMEt metadata.

  • fname (Optional[str]) – If given, the GroMEt will be dumped into a JSON file.

Returns

g – A GroMEt object built from the PySB model.

Return type

automates.script.gromet.gromet.Gromet

emmaa.model.save_config_to_s3(model_name, config, bucket='emmaa')[source]

Upload config settings for a model to S3.

Parameters
  • model_name (str) – The name of the model whose config should be saved to S3.

  • config (dict) – A JSON dict of configurations for the model.