Priors (emmaa.priors)

This module contains classes to generate prior networks.

class emmaa.priors.SearchTerm(type, name, db_refs, search_term)[source]

Bases: object

Represents a search term to be used in a model configuration.

Parameters:
  • type (str) – The type of search term, e.g. gene, bioprocess, other
  • name (str) – The name of the search term; equivalent to an Agent name
  • db_refs (dict) – A dict of database references for the given term; similar to an Agent db_refs dict
  • search_term (str) – The actual search term to use for searching PubMed
classmethod from_json(jd)[source]

Return a SearchTerm object from JSON.

to_json()[source]

Return search term as JSON.
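
The round trip between to_json() and from_json() can be illustrated with a minimal sketch. SearchTermSketch below is a hypothetical stand-in written for this example, not the emmaa class, and the JSON layout (one key per constructor argument) is an assumption rather than the documented format.

```python
class SearchTermSketch:
    """Hypothetical stand-in for emmaa.priors.SearchTerm (not the real class)."""

    def __init__(self, type, name, db_refs, search_term):
        self.type = type
        self.name = name
        self.db_refs = db_refs
        self.search_term = search_term

    def to_json(self):
        # Assumed layout: one key per constructor argument.
        return {'type': self.type, 'name': self.name,
                'db_refs': self.db_refs, 'search_term': self.search_term}

    @classmethod
    def from_json(cls, jd):
        # Rebuild the object from the dict produced by to_json().
        return cls(jd['type'], jd['name'], jd['db_refs'], jd['search_term'])


term = SearchTermSketch('gene', 'BRAF', {'HGNC': '1097'}, '"BRAF"')
restored = SearchTermSketch.from_json(term.to_json())
```

If the real class behaves similarly, a model configuration could store the dicts returned by to_json() and rebuild the terms with from_json() when the configuration is loaded.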

emmaa.priors.get_drugs_for_gene(stmts, hgnc_id)[source]

Get list of drugs that target a gene

Parameters:
  • stmts (list of indra.statements.Statement) – List of INDRA statements with a drug as subject
  • hgnc_id (str) – HGNC id for a gene
Returns:

drugs_for_gene – List of search terms for drugs targeting the input gene

Return type:

list of emmaa.priors.SearchTerm

Literature Prior (emmaa.priors.literature_prior)

This module implements the LiteraturePrior class, which automates some of the steps involved in starting a model around a set of literature searches. Example:

lp = LiteraturePrior('some_disease', 'Some Disease',
                     'This is a self-updating model of Some Disease',
                     search_strings=['some disease'],
                     assembly_config_template='nf')
estmts = lp.get_statements()
model = lp.make_model(estmts, upload_to_s3=True)
emmaa.priors.literature_prior.get_raw_statements_for_pmids(pmids, mode='all', batch_size=100)[source]

Return EmmaaStatements based on extractions from given PMIDs.

Parameters:
  • pmids (set or list of str) – A set of PMIDs to find raw INDRA Statements for in the INDRA DB.
  • mode ('all' or 'distilled') – The ‘distilled’ mode ensures that the “best”, non-redundant set of raw statements is found across potentially redundant text contents and reader versions. The ‘all’ mode skips this distillation but is significantly faster.
  • batch_size (Optional[int]) – Determines how many PMIDs to fetch statements for in each iteration. Default: 100.
Returns:

A dict keyed by PMID whose values are the INDRA Statements obtained from that PMID.

Return type:

dict
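
The role of batch_size can be sketched as follows. This is only an illustration of batched fetching into a dict keyed by PMID; iter_batches, collect_by_pmid and fake_fetch are hypothetical names, and the real function queries the INDRA DB rather than calling a local stub.

```python
def iter_batches(pmids, batch_size=100):
    """Yield consecutive lists of at most batch_size PMIDs."""
    pmids = list(pmids)
    for start in range(0, len(pmids), batch_size):
        yield pmids[start:start + batch_size]


def collect_by_pmid(pmids, fetch_batch, batch_size=100):
    """Merge per-batch results into a single dict keyed by PMID.

    fetch_batch is a hypothetical callable that takes a list of PMIDs
    and returns a dict mapping each PMID to a list of statements.
    """
    results = {}
    for batch in iter_batches(pmids, batch_size):
        results.update(fetch_batch(batch))
    return results


def fake_fetch(batch):
    # Stub standing in for a database query; returns placeholder strings.
    return {pmid: ['stmt-from-' + pmid] for pmid in batch}


pmids = [str(i) for i in range(250)]
batch_sizes = [len(batch) for batch in iter_batches(pmids)]
by_pmid = collect_by_pmid(pmids, fake_fetch)
```

With 250 PMIDs and the default batch size, the stub is called three times, and the merged result has one entry per PMID.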

emmaa.priors.literature_prior.make_search_terms(search_strings, mesh_ids)[source]

Return EMMAA SearchTerms based on search strings and MeSH IDs.

Parameters:
  • search_strings (list of str) – A list of search strings (e.g., “diabetes”) used to find papers in the literature.
  • mesh_ids (list of str) – A list of MeSH IDs that are used to search the literature as headings associated with papers.
Returns:

A list of EMMAA SearchTerm objects constructed from the search strings and the MeSH IDs.

Return type:

list of emmaa.priors.SearchTerm

TCGA Cancer Prior (emmaa.priors.cancer_prior)

class emmaa.priors.cancer_prior.TcgaCancerPrior(tcga_study_prefix, sif_prior, diffusion_service=None, mutation_cache=None)[source]

Bases: object

Prior network generation using TCGA mutations for a given cancer type.

This class implements building a prior network using a generic underlying prior, and TCGA data for a specific cancer type. Mutations for the given cancer type are extracted from TCGA studies and heat diffusion from the corresponding nodes in the prior is used to identify a set of relevant nodes.

static find_drugs_for_genes(node_list)[source]

Return list of drugs targeting gene nodes.

get_mutated_genes()[source]

Return dict of gene mutation frequencies based on TCGA studies.

get_relevant_nodes(pct_heat_threshold)[source]

Return a list of the relevant nodes in the prior.

Heat diffusion is applied to the prior network based on initial heat on nodes that are mutated according to patient statistics.

load_sif_prior(fname, e50=20)[source]

Return a Graph based on a SIF file describing a prior.

Parameters:
  • fname (str) – Path to the SIF file.
  • e50 (int) – Parameter for converting evidence counts into weights over the interval [0, 1) according to the hyperbolic function weight = count / (count + e50).
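
The weight formula given for e50 can be checked numerically. The helper name evidence_weight is made up for this sketch; only the formula itself comes from the parameter description.

```python
def evidence_weight(count, e50=20):
    """Map a non-negative evidence count to a weight in [0, 1)."""
    return count / (count + e50)


# Weights for a few evidence counts with the default e50 of 20.
weights = {count: evidence_weight(count) for count in (0, 20, 60, 1000)}
```

By construction, a count equal to e50 gives a weight of exactly 0.5, and the weight approaches but never reaches 1 as the count grows.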
make_prior(pct_heat_threshold=99)[source]

Run the prior node list generation and return relevant nodes.

static search_terms_from_nodes(node_list)[source]

Build a list of PubMed search terms from the nodes returned by make_prior.

Gene List Prior (emmaa.priors.gene_list_prior)

class emmaa.priors.gene_list_prior.GeneListPrior(gene_list, name, human_readable_name)[source]

Bases: object

Class to manage the construction of a model from a list of genes.

Parameters:
  • gene_list (list[str]) – A list of HGNC gene symbols
  • name (str) – The name of the model (all lower case, no spaces or special characters)
  • human_readable_name (str) – The human readable name (display name) of the model
make_config()[source]

Generate a configuration based on attributes.

make_gene_statements()[source]

Generate Statements from the gene list.

make_model()[source]

Make an EmmaaModel and upload it along with the config to S3.

make_search_terms(drug_gene_stmts=None)[source]

Generate search terms from the gene list.

emmaa.priors.gene_list_prior.agent_from_gene_name(gene_name)[source]

Return an Agent based on a gene name.

Reactome Prior (emmaa.priors.reactome_prior)

emmaa.priors.reactome_prior.find_drugs_for_genes(search_terms, drug_gene_stmts=None)[source]

Return list of drugs targeting at least one gene from a list of genes

Parameters: search_terms (list of emmaa.priors.SearchTerm) – List of search terms for genes
Returns: drug_terms – List of search terms of drugs targeting at least one of the input genes
Return type: list of emmaa.priors.SearchTerm
emmaa.priors.reactome_prior.get_genes_contained_in_pathway[source]

Get all genes contained in a given pathway

Parameters: reactome_id (str) – Reactome id for a pathway
Returns: genes – List of uniprot ids for all unique genes contained in input pathway
Return type: list of str
emmaa.priors.reactome_prior.get_pathways_containing_gene[source]

Get all IDs for Reactome pathways containing some form of an entity

Parameters: reactome_id (str) – Reactome id for a gene
Returns: pathway_ids – List of reactome ids for pathways containing the input gene
Return type: list of str
emmaa.priors.reactome_prior.make_prior_from_genes(gene_list)[source]

Return reactome prior based on a list of genes

Parameters: gene_list (list of str) – List of HGNC symbols for genes
Returns: res – List of search terms corresponding to all genes found in any reactome pathway containing one of the genes in the input gene list
Return type: list of emmaa.priors.SearchTerm
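
The expansion described for make_prior_from_genes (genes to pathways, then pathways to all of their member genes) can be sketched with toy data. The function and both mappings below are hypothetical; the real module queries Reactome, converts between HGNC, UniProt and Reactome identifiers, and returns SearchTerm objects rather than plain strings.

```python
def expand_genes_via_pathways(gene_list, pathways_for_gene, genes_in_pathway):
    """Return every gene found in any pathway containing an input gene."""
    # Step 1: collect the pathways that contain at least one input gene.
    pathway_ids = set()
    for gene in gene_list:
        pathway_ids.update(pathways_for_gene.get(gene, []))
    # Step 2: take the union of the member genes of those pathways.
    expanded = set()
    for pathway_id in pathway_ids:
        expanded.update(genes_in_pathway.get(pathway_id, []))
    return sorted(expanded)


# Toy lookup tables with placeholder identifiers (not real Reactome data).
pathways_for_gene = {
    'GENE_A': ['PATHWAY_1'],
    'GENE_B': ['PATHWAY_1', 'PATHWAY_2'],
}
genes_in_pathway = {
    'PATHWAY_1': ['GENE_A', 'GENE_B', 'GENE_C'],
    'PATHWAY_2': ['GENE_B', 'GENE_D'],
}

expanded = expand_genes_via_pathways(['GENE_A'], pathways_for_gene,
                                     genes_in_pathway)
```

Here a single input gene pulls in the other members of its one pathway, which is why the resulting prior can be considerably larger than the input gene list.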
emmaa.priors.reactome_prior.rx_id_from_up_id[source]

Return the Reactome Stable IDs for a given Uniprot ID.

emmaa.priors.reactome_prior.up_id_from_rx_id[source]

Get the Uniprot ID (referenceEntity) for a given Reactome Stable ID.

Querying Prior Statements (emmaa.priors.prior_stmts)

emmaa.priors.prior_stmts.get_stmts_for_gene(gene)[source]

Return all existing Statements for a given gene from the DB.

Parameters: gene (str) – The HGNC symbol of a gene to query.
Returns: A list of INDRA Statements in which the given gene is involved.
Return type: list[indra.statements.Statement]
emmaa.priors.prior_stmts.get_stmts_for_gene_list(gene_list, other_entities)[source]

Return all Statements between genes in a given list.

Parameters:
  • gene_list (list[str]) – A list of HGNC symbols for genes to query.
  • other_entities (list[str]) – A list of other entities to keep as part of the set of Statements.
Returns:

A list of INDRA Statements between the given list of genes and other entities specified.

Return type:

list[indra.statements.Statement]
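
One way to read "Statements between genes in a given list" is as a filter that keeps a Statement only when all of its participants are in the gene list or among the other entities. The sketch below illustrates that reading with plain tuples of names in place of INDRA Statements; the function name and the filtering criterion are assumptions, not the module's confirmed behavior.

```python
def filter_to_entities(stmts, gene_list, other_entities=()):
    """Keep statements whose participants are all in the allowed set.

    Each statement is modeled as a tuple of participant names, a
    simplification of INDRA Statements made for this sketch.
    """
    allowed = set(gene_list) | set(other_entities)
    return [stmt for stmt in stmts
            if all(name in allowed for name in stmt)]


stmts = [
    ('BRAF', 'MAP2K1'),       # both participants are listed genes
    ('BRAF', 'TP53'),         # TP53 is not listed, so this is dropped
    ('vemurafenib', 'BRAF'),  # kept because the drug is an "other entity"
]
kept = filter_to_entities(stmts, ['BRAF', 'MAP2K1'], ['vemurafenib'])
```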