Priors (emmaa.priors)

This module contains classes to generate prior networks.

class emmaa.priors.SearchTerm(type, name, db_refs, search_term)[source]

Bases: object

Represents a search term to be used in a model configuration.

  • type (str) – The type of search term, e.g. gene, bioprocess, other

  • name (str) – The name of the search term; equivalent to an Agent name

  • db_refs (dict) – A dict of database references for the given term; similar to an Agent db_refs dict

  • search_term (str) – The actual search term to use for searching PubMed

classmethod from_json(jd)[source]

Return a SearchTerm object from JSON.


Return search term as JSON.
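
The JSON round trip described above can be illustrated with a small stand-in class. This is a minimal sketch, not the EMMAA implementation: the class name SearchTermSketch, the method name to_json, and the JSON key names (assumed here to mirror the constructor arguments) are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class SearchTermSketch:
    """Hypothetical stand-in for emmaa.priors.SearchTerm (not the real class)."""
    type: str
    name: str
    db_refs: Dict[str, str]
    search_term: str

    def to_json(self):
        # Assumed layout: one JSON key per constructor argument.
        return asdict(self)

    @classmethod
    def from_json(cls, jd):
        # Rebuild the object from the dict produced by to_json.
        return cls(jd['type'], jd['name'], jd['db_refs'], jd['search_term'])


# A gene search term: the name and db_refs follow INDRA Agent conventions.
term = SearchTermSketch('gene', 'BRAF', {'HGNC': '1097'}, '"BRAF"')
restored = SearchTermSketch.from_json(term.to_json())
```

In this sketch, restored compares equal to term, which is the property a from_json/JSON-export pair is expected to provide.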

emmaa.priors.get_drugs_for_gene(stmts, hgnc_id)[source]

Get list of drugs that target a gene

  • stmts (list of indra.statements.Statement) – List of INDRA statements with a drug as subject

  • hgnc_id (str) – HGNC id for a gene


drugs_for_gene – List of search terms for drugs targeting the input gene

Return type

list of emmaa.priors.SearchTerm

Literature Prior (emmaa.priors.literature_prior)

This module implements the LiteraturePrior class which automates some of the steps involved in starting a model around a set of literature searches. Example:

from emmaa.priors.literature_prior import LiteraturePrior

lp = LiteraturePrior('some_disease', 'Some Disease',
                     'This is a self-updating model of Some Disease',
                     search_strings=['some disease'])
estmts = lp.get_statements()
model = lp.make_model(estmts, upload_to_s3=True)

emmaa.priors.literature_prior.get_raw_statements_for_pmids(pmids, mode='all', batch_size=100)[source]

Return EmmaaStatements based on extractions from given PMIDs.

  • pmids (set or list of str) – A set of PMIDs to find raw INDRA Statements for in the INDRA DB.

  • mode ('all' or 'distilled') – The ‘distilled’ mode makes sure that the “best”, non-redundant set of raw statements is found across potentially redundant text contents and reader versions. The ‘all’ mode doesn’t do such distillation but is significantly faster.

  • batch_size (Optional[int]) – Determines how many PMIDs to fetch statements for in each iteration. Default: 100.


A dict keyed by PMID with values INDRA Statements obtained from the given PMID.
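
The effect of batch_size can be pictured as splitting the PMIDs into fixed-size chunks, each of which is queried in one iteration. The helper below is a hypothetical sketch of that chunking only; batch_iter is not part of EMMAA, the PMIDs are placeholders, and the actual database query is left out.

```python
from typing import Iterable, Iterator, List


def batch_iter(items: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size unique items."""
    # De-duplicate and sort so that the batches are deterministic.
    unique_items = sorted(set(items))
    for start in range(0, len(unique_items), batch_size):
        yield unique_items[start:start + batch_size]


# Placeholder PMIDs, used only to show the batch sizes.
pmids = {str(n) for n in range(1, 251)}
batches = list(batch_iter(pmids, batch_size=100))
batch_sizes = [len(batch) for batch in batches]
```

With 250 placeholder PMIDs and the default batch size, this yields two full batches and one partial batch.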

Return type


emmaa.priors.literature_prior.make_search_terms(search_strings, mesh_ids)[source]

Return EMMAA SearchTerms based on search strings and MeSH IDs.

  • search_strings (List[str]) – A list of search strings (e.g., “diabetes”) used to find papers in the literature.

  • mesh_ids (List[str]) – A list of MeSH IDs that are used to search the literature as headings associated with papers.

Return type



A list of EMMAA SearchTerm objects constructed from the search strings and the MeSH IDs.

TCGA Cancer Prior (emmaa.priors.cancer_prior)

class emmaa.priors.cancer_prior.TcgaCancerPrior(tcga_study_prefix, sif_prior, diffusion_service=None, mutation_cache=None)[source]

Bases: object

Prior network generation using TCGA mutations for a given cancer type.

This class implements building a prior network using a generic underlying prior, and TCGA data for a specific cancer type. Mutations for the given cancer type are extracted from TCGA studies and heat diffusion from the corresponding nodes in the prior is used to identify a set of relevant nodes.

static find_drugs_for_genes(node_list)[source]

Return list of drugs targeting gene nodes.


Return dict of gene mutation frequencies based on TCGA studies.


Return a list of the relevant nodes in the prior.

Heat diffusion is applied to the prior network based on initial heat on nodes that are mutated according to patient statistics.
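
As a generic illustration of heat diffusion on a network, the sketch below applies explicit Euler steps of h <- h - step * L h, where L is the weighted graph Laplacian. It is an assumption-laden toy: the class takes a diffusion_service argument, so the actual computation may be done differently, and the function name, step size, and example graph here are all hypothetical.

```python
def diffuse_heat(edges, initial_heat, step=0.1, n_steps=50):
    """Diffuse heat over an undirected weighted graph with explicit Euler steps.

    edges maps (node_a, node_b) pairs to non-negative weights, and
    initial_heat maps nodes to their starting heat (missing nodes start at 0).
    """
    # Build a symmetric adjacency structure from the edge list.
    neighbors = {}
    for (node_a, node_b), weight in edges.items():
        neighbors.setdefault(node_a, {})[node_b] = weight
        neighbors.setdefault(node_b, {})[node_a] = weight
    heat = {node: float(initial_heat.get(node, 0.0)) for node in neighbors}
    for _ in range(n_steps):
        new_heat = {}
        for node, nbrs in neighbors.items():
            # Net inflow is proportional to the heat difference along each edge.
            flow = sum(w * (heat[nbr] - heat[node]) for nbr, w in nbrs.items())
            new_heat[node] = heat[node] + step * flow
        heat = new_heat
    return heat


# Toy path graph A - B - C with all initial heat on the "mutated" node A.
toy_edges = {('A', 'B'): 1.0, ('B', 'C'): 1.0}
final_heat = diffuse_heat(toy_edges, {'A': 1.0})
```

Total heat is conserved in this scheme, and nodes closer to the initially heated node retain more heat, which is what allows a cutoff on final heat to select relevant nodes. The step size must be small relative to the largest weighted node degree for the iteration to remain stable.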

load_sif_prior(fname, e50=20)[source]

Return a Graph based on a SIF file describing a prior.

  • fname (str) – Path to the SIF file.

  • e50 (int) – Parameter for converting evidence counts into weights over the interval [0, 1) according to the hyperbolic function weight = count / (count + e50).
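
The weighting formula can be checked with a few values. The function below is a minimal sketch of the stated formula only; the name evidence_weight is hypothetical and is not part of the EMMAA API.

```python
def evidence_weight(count: int, e50: int = 20) -> float:
    """Map an evidence count to a weight in [0, 1) via count / (count + e50)."""
    return count / (count + e50)


# e50 is the evidence count at which the weight reaches one half.
weights = {count: evidence_weight(count) for count in (0, 20, 180)}
```

With the default e50 of 20, a count of 0 gives a weight of 0.0, a count of 20 gives 0.5, and a count of 180 gives 0.9. The weight approaches but never reaches 1 as the count grows.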


Run the prior node list generation and return relevant nodes.

static search_terms_from_nodes(node_list)[source]

Build a list of PubMed search terms from the nodes returned by make_prior.

Gene List Prior (emmaa.priors.gene_list_prior)

class emmaa.priors.gene_list_prior.GeneListPrior(gene_list, name, human_readable_name, description=None)[source]

Bases: object

Class to manage the construction of a model from a list of genes.

  • gene_list (list[str]) – A list of HGNC gene symbols

  • name (str) – The name of the model (all lower case, no spaces or special characters)

  • human_readable_name (str) – The human readable name (display name) of the model


Generate a configuration based on attributes.


Generate Statements from the gene list.

Return type



Make an EmmaaModel and upload it along with the config to S3.


Generate search terms from the gene list.


Return an Agent based on a gene name.

Reactome Prior (emmaa.priors.reactome_prior)

emmaa.priors.reactome_prior.find_drugs_for_genes(search_terms, drug_gene_stmts=None)[source]

Return list of drugs targeting at least one gene from a list of genes


search_terms (list of emmaa.priors.SearchTerm) – List of search terms for genes


drug_terms – List of search terms of drugs targeting at least one of the input genes

Return type

list of emmaa.priors.SearchTerm


Get all genes contained in a given pathway


reactome_id (str) – Reactome id for a pathway


genes – List of UniProt IDs for all unique genes contained in the input pathway

Return type

list of str


Get all IDs for Reactome pathways containing some form of an entity


reactome_id (str) – Reactome id for a gene


pathway_ids – List of reactome ids for pathways containing the input gene

Return type

list of str


Return a Reactome prior based on a list of genes


gene_list (list of str) – List of HGNC symbols for genes


res – List of search terms corresponding to all genes found in any reactome pathway containing one of the genes in the input gene list

Return type

list of emmaa.priors.SearchTerm
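
The expansion described above, from input genes to every gene sharing a pathway with them, can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions: the function name and both mapping arguments are hypothetical, the gene and pathway names are placeholders rather than real identifiers, and the actual function returns SearchTerm objects and resolves identifiers through Reactome rather than local dicts.

```python
def reactome_prior_sketch(gene_list, gene_to_pathways, pathway_to_genes):
    """Return all genes found in any pathway containing an input gene."""
    genes = set()
    for gene in gene_list:
        # Look up every pathway that contains this input gene.
        for pathway in gene_to_pathways.get(gene, []):
            # Collect every gene contained in that pathway.
            genes.update(pathway_to_genes.get(pathway, []))
    return sorted(genes)


# Placeholder mappings standing in for Reactome lookups.
gene_to_pathways = {'GENE_A': ['PATHWAY_1'], 'GENE_B': ['PATHWAY_1', 'PATHWAY_2']}
pathway_to_genes = {
    'PATHWAY_1': ['GENE_A', 'GENE_B', 'GENE_C'],
    'PATHWAY_2': ['GENE_B', 'GENE_D'],
}
prior_genes = reactome_prior_sketch(['GENE_A'], gene_to_pathways, pathway_to_genes)
```

Starting from GENE_A alone, the sketch returns the three genes of PATHWAY_1. GENE_D is excluded because its only pathway does not contain an input gene.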


Return the Reactome Stable IDs for a given Uniprot ID.


Get the Uniprot ID (referenceEntity) for a given Reactome Stable ID.

Querying Prior Statements (emmaa.priors.prior_stmts)

emmaa.priors.prior_stmts.get_stmts_for_gene(gene, max_stmts=100000)[source]

Return all existing Statements for a given gene from the DB.

  • gene (str) – The HGNC symbol of a gene to query.

  • max_stmts (int) – The maximum number of statements to return

Return type



A list of INDRA Statements in which the given gene is involved.

emmaa.priors.prior_stmts.get_stmts_for_gene_list(gene_list, other_entities)[source]

Return all Statements between genes in a given list.

  • gene_list (list[str]) – A list of HGNC symbols for genes to query.

  • other_entities (list[str]) – A list of other entities to keep as part of the set of Statements.


A list of INDRA Statements between the given list of genes and other entities specified.

Return type