Priors (emmaa.priors)

This module contains classes to generate prior networks.

class emmaa.priors.SearchTerm(type, name, db_refs, search_term)

Bases: object

Represents a search term to be used in a model configuration.

  • type (str) – The type of search term, e.g. gene, bioprocess, other
  • name (str) – The name of the search term; equivalent to an Agent name
  • db_refs (dict) – A dict of database references for the given term; similar to an Agent's db_refs dict
  • search_term (str) – The actual search term to use for searching PubMed

classmethod from_json(jd)

Return a SearchTerm object from JSON.


Return search term as JSON.
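The JSON round trip that from_json and its JSON-returning counterpart provide can be pictured with the minimal sketch below. SearchTermSketch is a hypothetical stand-in that only mirrors the four documented attributes; the assumption that each attribute becomes one JSON key of the same name is ours, not confirmed by this page.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SearchTermSketch:
    """Hypothetical stand-in mirroring the documented SearchTerm attributes."""
    type: str
    name: str
    db_refs: dict = field(default_factory=dict)
    search_term: str = ""

    def to_json(self):
        # Assumed layout: one JSON key per attribute, same names as above.
        return asdict(self)

    @classmethod
    def from_json(cls, jd):
        return cls(type=jd["type"], name=jd["name"],
                   db_refs=jd.get("db_refs", {}),
                   search_term=jd["search_term"])


term = SearchTermSketch("gene", "KRAS", {"HGNC": "6407"}, "KRAS")
serialized = json.dumps(term.to_json())
restored = SearchTermSketch.from_json(json.loads(serialized))
print(restored == term)  # True
```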

emmaa.priors.get_drugs_for_gene(stmts, hgnc_id)

Get a list of drugs that target a gene.

  • stmts (list of indra.statements.Statement) – List of INDRA statements with a drug as subject
  • hgnc_id (str) – HGNC id for a gene

drugs_for_gene – List of search terms for drugs targeting the input gene

Return type:

list of emmaa.priors.SearchTerm
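The selection step behind get_drugs_for_gene can be sketched as a filter over drug-to-target statements. In this hypothetical sketch each statement is reduced to a (drug name, target HGNC ID) pair instead of a full INDRA Statement, and the sketch returns plain drug names rather than SearchTerm objects; drug names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugTarget:
    """Hypothetical simplification of a drug -> gene INDRA Statement."""
    drug_name: str
    target_hgnc_id: str


def drugs_for_gene_sketch(stmts, hgnc_id):
    """Return unique drug names whose statements target hgnc_id, in first-seen order."""
    drugs = []
    for stmt in stmts:
        if stmt.target_hgnc_id == hgnc_id and stmt.drug_name not in drugs:
            drugs.append(stmt.drug_name)
    return drugs


stmts = [
    DrugTarget("drug_a", "6407"),  # 6407 is the HGNC ID of KRAS
    DrugTarget("drug_b", "1097"),  # 1097 is the HGNC ID of BRAF
    DrugTarget("drug_c", "6407"),
    DrugTarget("drug_a", "6407"),  # repeated evidence for the same pair
]
print(drugs_for_gene_sketch(stmts, "6407"))  # ['drug_a', 'drug_c']
```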

Literature Prior (emmaa.priors.literature_prior)

This module implements the LiteraturePrior class which automates some of the steps involved in starting a model around a set of literature searches. Example:

from emmaa.priors.literature_prior import LiteraturePrior

lp = LiteraturePrior('some_disease', 'Some Disease',
                     'This is a self-updating model of Some Disease',
                     search_strings=['some disease'])
estmts = lp.get_statements()
model = lp.make_model(estmts, upload_to_s3=True)

emmaa.priors.literature_prior.get_raw_statements_for_pmids(pmids, mode='all', batch_size=100)

Return EmmaaStatements based on extractions from given PMIDs.

  • pmids (set or list of str) – A set of PMIDs to find raw INDRA Statements for in the INDRA DB.
  • mode ('all' or 'distilled') – The ‘distilled’ mode makes sure that the “best”, non-redundant set of raw statements is found across potentially redundant text contents and reader versions. The ‘all’ mode skips this distillation but is significantly faster.
  • batch_size (Optional[int]) – Determines how many PMIDs to fetch statements for in each iteration. Default: 100.

A dict keyed by PMID whose values are the INDRA Statements obtained from that PMID.

Return type:


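The batch_size parameter splits the PMID query into chunks. The sketch below illustrates that batching and the merge into one dict keyed by PMID; fetch_batch and fake_fetch are hypothetical stand-ins for the INDRA DB query, not part of the EMMAA API.

```python
def iter_batches(pmids, batch_size=100):
    """Yield successive lists of at most batch_size PMIDs."""
    ordered = sorted(pmids)  # sets are unordered; sort for reproducible batches
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


def collect_by_pmid(pmids, fetch_batch, batch_size=100):
    """Merge per-batch results into one dict keyed by PMID.

    fetch_batch is a hypothetical callable standing in for the INDRA DB
    query; it takes a list of PMIDs and returns {pmid: [statements]}.
    """
    results = {}
    for batch in iter_batches(pmids, batch_size):
        results.update(fetch_batch(batch))
    return results


calls = []


def fake_fetch(batch):
    # Record the batch size so we can see how the PMIDs were split.
    calls.append(len(batch))
    return {pmid: [f"statement from {pmid}"] for pmid in batch}


pmids = {str(30000000 + i) for i in range(250)}
stmts_by_pmid = collect_by_pmid(pmids, fake_fetch, batch_size=100)
print(calls)  # [100, 100, 50]
```

Smaller batches mean more round trips but smaller individual queries; the default of 100 is the value documented above.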
emmaa.priors.literature_prior.make_search_terms(search_strings, mesh_ids)

Return EMMAA SearchTerms based on search strings and MeSH IDs.

  • search_strings (list of str) – A list of search strings e.g., “diabetes” to find papers in the literature.
  • mesh_ids (list of str) – A list of MeSH IDs that are used to search the literature as headings associated with papers.

A list of EMMAA SearchTerm objects constructed from the search strings and the MeSH IDs.

Return type:

list of emmaa.priors.SearchTerm
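A minimal sketch of how search strings and MeSH IDs could each be turned into a search term follows. Several details are assumptions rather than documented behavior: the "other" and "mesh" type labels, quoting free-text phrases, and the caller-supplied mesh_names lookup table. The "[MeSH Terms]" suffix is PubMed's field tag for restricting a query to MeSH headings.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """Hypothetical stand-in with the same fields as SearchTerm."""
    type: str
    name: str
    db_refs: dict
    search_term: str


def make_search_terms_sketch(search_strings, mesh_ids, mesh_names):
    terms = []
    for s in search_strings:
        # Free-text search: quote the phrase (an assumed convention).
        terms.append(Term("other", s, {}, f'"{s}"'))
    for mesh_id in mesh_ids:
        name = mesh_names[mesh_id]  # hypothetical MeSH ID -> heading lookup
        # PubMed's [MeSH Terms] tag matches papers indexed with this heading.
        terms.append(Term("mesh", name, {"MESH": mesh_id}, f"{name} [MeSH Terms]"))
    return terms


terms = make_search_terms_sketch(["diabetes"], ["D003920"],
                                 {"D003920": "Diabetes Mellitus"})
for term in terms:
    print(term.search_term)
```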

TCGA Cancer Prior (emmaa.priors.cancer_prior)

class emmaa.priors.cancer_prior.TcgaCancerPrior(tcga_study_prefix, sif_prior, diffusion_service=None, mutation_cache=None)

Bases: object

Prior network generation using TCGA mutations for a given cancer type.

This class implements building a prior network using a generic underlying prior, and TCGA data for a specific cancer type. Mutations for the given cancer type are extracted from TCGA studies and heat diffusion from the corresponding nodes in the prior is used to identify a set of relevant nodes.

static find_drugs_for_genes(node_list)

Return list of drugs targeting gene nodes.


Return dict of gene mutation frequencies based on TCGA studies.


Return a list of the relevant nodes in the prior.

Heat diffusion is applied to the prior network based on initial heat on nodes that are mutated according to patient statistics.
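To make the heat-diffusion step concrete, here is a small pure-Python sketch. It approximates h(t) = exp(-tL) h(0), with graph Laplacian L = D - A, by explicit Euler steps on an undirected graph, then keeps the hottest nodes. This is an illustration only: TcgaCancerPrior accepts a diffusion_service, and the exact diffusion method and parameters it uses are not stated here. The diffusion time t, the step count, and the helper names are assumptions.

```python
def diffuse_heat(adjacency, initial_heat, t=1.0, steps=100):
    """Approximate heat diffusion h(t) = exp(-t L) h(0) on an undirected graph.

    adjacency: dict node -> set of neighbor nodes (must be symmetric).
    initial_heat: dict node -> starting heat (e.g. a mutation frequency).
    Uses explicit Euler steps h <- h - dt * L h, where L = D - A.
    """
    dt = t / steps
    heat = {n: initial_heat.get(n, 0.0) for n in adjacency}
    for _ in range(steps):
        new_heat = {}
        for node, nbrs in adjacency.items():
            # (L h)[node] = deg(node) * h[node] - sum of neighbor heats
            lap = len(nbrs) * heat[node] - sum(heat[m] for m in nbrs)
            new_heat[node] = heat[node] - dt * lap
        heat = new_heat
    return heat


def top_nodes(heat, k):
    """Return the k hottest nodes, highest heat first."""
    return sorted(heat, key=heat.get, reverse=True)[:k]


# Toy path graph A - B - C - D with all initial heat on A.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
heat = diffuse_heat(path, {"A": 1.0})
print(top_nodes(heat, 2))  # ['A', 'B']
```

Because L = D - A is symmetric with zero row sums, each step conserves total heat; nodes close to heavily mutated genes end up warmer than distant ones.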

load_sif_prior(fname, e50=20)

Return a Graph based on a SIF file describing a prior.

  • fname (str) – Path to the SIF file.
  • e50 (int) – Parameter for converting evidence counts into weights over the interval [0, 1) according to the hyperbolic function weight = count / (count + e50).
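The documented conversion weight = count / (count + e50) can be checked numerically. The helper name evidence_weight below is hypothetical; the formula and the default e50=20 come from the parameter description above.

```python
def evidence_weight(count, e50=20):
    """Map an evidence count to [0, 1) via count / (count + e50)."""
    return count / (count + e50)


for count in (0, 20, 60, 1000):
    print(count, round(evidence_weight(count), 3))
# weights: 0.0, 0.5, 0.75, 0.98
```

The name e50 reflects its meaning: it is the evidence count at which the weight reaches 0.5. Additional evidence gives diminishing returns, so the weight approaches 1 but never reaches it.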

Run the prior node list generation and return relevant nodes.

static search_terms_from_nodes(node_list)

Build a list of PubMed search terms from the nodes returned by make_prior.

Gene List Prior (emmaa.priors.gene_list_prior)

class emmaa.priors.gene_list_prior.GeneListPrior(gene_list, name, human_readable_name)

Bases: object

Class to manage the construction of a model from a list of genes.

  • gene_list (list[str]) – A list of HGNC gene symbols
  • name (str) – The name of the model (all lower case, no spaces or special characters)
  • human_readable_name (str) – The human readable name (display name) of the model
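The constraint on name (all lower case, no spaces or special characters) can be expressed as a regular expression. In this sketch we read "no special characters" as allowing only lowercase letters, digits and underscores, matching names such as 'some_disease' used elsewhere on this page. That reading, and the helper is_valid_model_name, are assumptions; GeneListPrior may not validate names at all.

```python
import re

# Assumption: valid names use only lowercase letters, digits and underscores.
_MODEL_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_model_name(name):
    """Return True if name fits the assumed model-name convention."""
    return _MODEL_NAME.fullmatch(name) is not None


print(is_valid_model_name("some_disease"))  # True
print(is_valid_model_name("Some Disease"))  # False
```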

Generate a configuration based on attributes.


Generate Statements from the gene list.


Make an EmmaaModel and upload it along with the config to S3.


Generate search terms from the gene list.


Return an Agent based on a gene name.

Reactome Prior (emmaa.priors.reactome_prior)

emmaa.priors.reactome_prior.find_drugs_for_genes(search_terms, drug_gene_stmts=None)

Return a list of drugs targeting at least one gene from a list of genes.

Parameters: search_terms (list of emmaa.priors.SearchTerm) – List of search terms for genes
Returns: drug_terms – List of search terms of drugs targeting at least one of the input genes
Return type: list of emmaa.priors.SearchTerm
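The "at least one gene" condition is a non-empty set intersection. The sketch below replaces the drug-gene statements with a hypothetical dict mapping each drug to its target gene symbols. Drug names are placeholders, and the output is sorted names rather than SearchTerm objects.

```python
def drugs_targeting_any(gene_names, drug_targets):
    """Return drugs that target at least one gene in gene_names.

    drug_targets: hypothetical dict drug name -> set of target gene symbols,
    standing in for the drug-gene Statements the real function consults.
    """
    genes = set(gene_names)
    return sorted(drug for drug, targets in drug_targets.items() if targets & genes)


drug_targets = {
    "drug_a": {"KRAS"},
    "drug_b": {"BRAF", "RAF1"},
    "drug_c": {"EGFR"},
}
print(drugs_targeting_any(["KRAS", "RAF1"], drug_targets))  # ['drug_a', 'drug_b']
```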

Get all genes contained in a given pathway.

Parameters: reactome_id (str) – Reactome ID for a pathway
Returns: genes – List of UniProt IDs for all unique genes contained in the input pathway
Return type: list of str

Get the IDs of all Reactome pathways containing some form of an entity.

Parameters: reactome_id (str) – Reactome ID for a gene
Returns: pathway_ids – List of Reactome IDs for pathways containing the input gene
Return type: list of str

Return a Reactome prior based on a list of genes.

Parameters: gene_list (list of str) – List of HGNC symbols for genes
Returns: res – List of search terms corresponding to all genes found in any Reactome pathway that contains at least one of the genes in the input gene list
Return type: list of emmaa.priors.SearchTerm
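The expansion this function describes is two lookups: from input genes to the pathways that contain them, then from those pathways to all of their member genes. The sketch below works directly on gene symbols with hypothetical toy lookup tables and pathway IDs ("P1", "P2"). The real function instead goes through UniProt IDs and Reactome stable IDs, as the helper functions on this page suggest.

```python
def expand_via_pathways(gene_list, gene_to_pathways, pathway_to_genes):
    """Return every gene in any pathway containing at least one input gene."""
    pathways = set()
    for gene in gene_list:
        pathways |= gene_to_pathways.get(gene, set())
    genes = set()
    for pathway in pathways:
        genes |= pathway_to_genes.get(pathway, set())
    return sorted(genes)


# Toy data: pathway IDs and memberships are illustrative only.
gene_to_pathways = {"KRAS": {"P1"}, "EGFR": {"P1", "P2"}}
pathway_to_genes = {"P1": {"KRAS", "EGFR", "GRB2"}, "P2": {"EGFR", "SHC1"}}
print(expand_via_pathways(["KRAS"], gene_to_pathways, pathway_to_genes))
# ['EGFR', 'GRB2', 'KRAS']
```

The expansion is a single hop, so SHC1 is not included for KRAS: it shares pathway P2 only with EGFR, which was itself added by the expansion.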

Return the Reactome Stable IDs for a given UniProt ID.


Get the UniProt ID (referenceEntity) for a given Reactome Stable ID.

Querying Prior Statements (emmaa.priors.prior_stmts)


Return all existing Statements for a given gene from the DB.

Parameters: gene (str) – The HGNC symbol of a gene to query.
Returns: A list of INDRA Statements in which the given gene is involved.
Return type: list[indra.statements.Statement]

emmaa.priors.prior_stmts.get_stmts_for_gene_list(gene_list, other_entities)

Return all Statements between genes in a given list.

  • gene_list (list[str]) – A list of HGNC symbols for genes to query.
  • other_entities (list[str]) – A list of other entities to keep as part of the set of Statements.

A list of INDRA Statements between the given list of genes and other entities specified.

Return type:
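One plausible reading of get_stmts_for_gene_list is that a statement is kept when every agent it mentions is either one of the listed genes or one of the other entities. The sketch below implements that reading, with each statement reduced to a hypothetical tuple of agent names. It is not a description of how the actual DB query works.

```python
def filter_stmts(stmts, gene_list, other_entities):
    """Keep statements whose agents all fall in gene_list or other_entities.

    Each statement is represented here by a tuple of agent names (hypothetical
    simplification of an INDRA Statement's agent list).
    """
    allowed = set(gene_list) | set(other_entities)
    return [stmt for stmt in stmts if all(agent in allowed for agent in stmt)]


stmts = [("KRAS", "BRAF"), ("KRAS", "TP53"), ("BRAF", "apoptosis")]
print(filter_stmts(stmts, ["KRAS", "BRAF"], ["apoptosis"]))
# [('KRAS', 'BRAF'), ('BRAF', 'apoptosis')]
```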