Analyze model test results (emmaa.analyze_tests_results)

class emmaa.analyze_tests_results.ModelRound(statements, date_str, paper_ids=None, paper_id_type='TRID', emmaa_statements=None)[source]

Bases: emmaa.analyze_tests_results.Round

Analyzes the results of one model update round.

Parameters:
  • statements (list[indra.statements.Statement]) – A list of INDRA Statements used to assemble a model.
  • date_str (str) – Time when ModelManager responsible for this round was created.
  • paper_ids (list(str)) – A list of paper IDs used to get raw statements for this round.
  • paper_id_type (str) – Type of paper ID used.
stmts_by_papers

A dictionary mapping the paper IDs to sets of hashes of assembled statements with evidences retrieved from these papers.

Type:dict
get_agent_distribution()[source]

Return a sorted list of tuples, each containing an agent name and the number of times this agent occurred in the statements of a model.
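
The counting described above can be illustrated with a minimal, standalone sketch. It is not EMMAA's implementation: the function name agent_distribution, the input shape (one list of agent names per statement), and the descending sort order are assumptions made for illustration only.

```python
from collections import Counter


def agent_distribution(agent_names_per_statement):
    """Count agent names across statements, most frequent first.

    Hypothetical helper: the input is assumed to be one list of agent
    names per statement, and descending order is an assumption.
    """
    counts = Counter(
        name
        for names in agent_names_per_statement
        for name in names
        if name is not None  # skip missing agents
    )
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Made-up example data: three statements with two agents each.
example = [["BRAF", "MAP2K1"], ["MAP2K1", "MAPK1"], ["BRAF", "MAP2K1"]]
distribution = agent_distribution(example)
print(distribution)
```

Each tuple pairs an agent name with its count, which matches the shape of the return value described for get_agent_distribution().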

get_all_raw_paper_ids()[source]

Return all paper IDs used in this round.

get_assembled_stmts_by_paper(id_type='TRID')[source]

Get a mapping of paper IDs (TRID or PII) to assembled statements.

get_english_statements_by_hash()[source]

Return a dictionary mapping a statement hash to its English description.

get_number_raw_papers()[source]

Return the total number of papers in this round.

Return a dictionary mapping paper IDs to their titles.

get_papers_distribution()[source]

Return a sorted list of tuples containing a paper ID and a number of unique statements extracted from that paper.
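
As a rough sketch of how such a distribution could be derived from a mapping like stmts_by_papers (paper IDs to sets of statement hashes), consider the following. The function name papers_distribution, the sample IDs and hashes, and the descending sort order are all hypothetical and not taken from EMMAA.

```python
def papers_distribution(stmts_by_papers):
    """Return (paper ID, number of unique statement hashes) tuples.

    Hypothetical helper: sorting by descending count is an assumption.
    """
    return sorted(
        ((paper_id, len(hashes)) for paper_id, hashes in stmts_by_papers.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Made-up mapping of paper IDs to sets of assembled statement hashes.
stmts_by_papers = {"101": {11, 12, 13}, "102": {12}, "103": {11, 14}}
paper_counts = papers_distribution(stmts_by_papers)
print(paper_counts)
```

Because the values are sets, each statement hash is counted at most once per paper, which is what "unique statements" implies.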

get_statement_types()[source]

Return a sorted list of tuples, each containing a statement type and the number of times a statement of this type occurred in a model.

get_statements_by_evidence()[source]

Return a sorted list of tuples, each containing a statement hash and the number of times this statement occurred in a model.

get_stmt_hashes()[source]

Return a list of hashes for all statements in a model.

get_total_statements()[source]

Return a total number of statements in a model.

class emmaa.analyze_tests_results.ModelStatsGenerator(model_name, latest_round=None, previous_round=None, previous_json_stats=None, bucket='emmaa')[source]

Bases: emmaa.analyze_tests_results.StatsGenerator

Generates statistics for a given model update round.

Parameters:
  • model_name (str) – A name of a model the tests were run against.
  • latest_round (emmaa.analyze_tests_results.ModelRound) – An instance of a ModelRound to generate statistics for. If not given, will be generated by loading model data from s3.
  • previous_round (emmaa.analyze_tests_results.ModelRound) – A different instance of a ModelRound to find delta between two rounds. If not given, will be generated by loading model data from s3.
  • previous_json_stats (list[dict]) – A JSON-formatted dictionary containing model statistics for previous update round.
json_stats

A JSON-formatted dictionary containing model statistics.

Type:dict
make_changes_over_time()[source]

Add changes to model over time to json_stats.

make_curation_summary()[source]

Add latest curation summary to json_stats.

make_model_delta()[source]

Add model delta between two latest model states to json_stats.

make_model_summary()[source]

Add latest model state summary to json_stats.

make_paper_delta()[source]

Add paper delta between two latest model states to json_stats.

make_paper_summary()[source]

Add latest paper summary to json_stats.

make_stats()[source]

Check if two latest model rounds were found and add statistics to json_stats dictionary. If both latest round and previous round were passed or found on s3, a dictionary will have three key-value pairs: model_summary, model_delta, and changes_over_time.

class emmaa.analyze_tests_results.Round(date_str)[source]

Bases: object

Parent class for classes analyzing one round of something (model or tests).

Parameters:date_str (str) – Time when ModelManager responsible for this round was created.
function_mapping

A dictionary of strings mapping a type of content to a tuple of functions necessary to find delta for this type of content. First function in a tuple gets a list of all hashes for a given content type, while the second returns an English description of a given content type for a single hash.

Type:dict
find_delta_hashes(other_round, content_type, **kwargs)[source]

Return a dictionary of changed hashes of a given content type. This method makes use of self.function_mapping dictionary.

Parameters:
  • other_round (emmaa.analyze_tests_results.TestRound) – A different instance of a TestRound
  • content_type (str) – A type of the content to find delta for. Accepted values: statements, applied_tests, passed_tests, paths.
  • **kwargs (dict) – For some content types, additional arguments must be provided, such as mc_type.
Returns:

hashes – A dictionary containing lists of added and removed hashes of a given content type between two test rounds.

Return type:

dict
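
The idea of a delta between two rounds can be sketched with plain set differences. This is an illustrative assumption about the general technique, not the method's actual code: the function name find_delta, the "added" and "removed" key names, and the sorted output are hypothetical.

```python
def find_delta(latest_hashes, previous_hashes):
    """Return hashes added to and removed from the latest round.

    Hypothetical helper: key names and sorted output are assumptions.
    """
    latest, previous = set(latest_hashes), set(previous_hashes)
    return {
        "added": sorted(latest - previous),    # only in the latest round
        "removed": sorted(previous - latest),  # only in the previous round
    }


# Made-up hash lists for two consecutive rounds.
delta = find_delta([1, 2, 3, 5], [1, 2, 4])
print(delta)
```

In the documented class, the list of hashes for each round would come from the first function in the function_mapping tuple for the requested content type.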

class emmaa.analyze_tests_results.StatsGenerator(model_name, latest_round=None, previous_round=None, previous_json_stats=None, bucket='emmaa')[source]

Bases: object

Parent class for classes generating statistics for a given round of tests or model update.

Parameters:
  • model_name (str) – A name of a model the tests were run against.
  • latest_round (ModelRound or TestRound or None) – An instance of a ModelRound or TestRound to generate statistics for. If not given, will be generated by loading json from s3.
  • previous_round (ModelRound or TestRound or None) – A different instance of a ModelRound or TestRound to find delta between two rounds. If not given, will be generated by loading json from s3.
  • previous_json_stats (dict) – A JSON-formatted dictionary containing model or test statistics for the previous round.
json_stats

A JSON-formatted dictionary containing model or test statistics.

Type:dict
make_changes_over_time()[source]

Add changes to model and tests over time to json_stats.

class emmaa.analyze_tests_results.TestRound(json_results, date_str)[source]

Bases: emmaa.analyze_tests_results.Round

Analyzes the results of one test round.

Parameters:
  • json_results (list[dict]) – A list of JSON formatted dictionaries to store information about the test results. The first dictionary contains information about the model. Each consecutive dictionary contains information about a single test applied to the model and test results.
  • date_str (str) – Time when ModelManager responsible for this round was created.
mc_types_results

A dictionary mapping a type of a ModelChecker to a list of test results generated by this ModelChecker.

Type:dict
tests

A list of INDRA Statements used to make EMMAA tests.

Type:list[indra.statements.Statement]
english_test_results

A dictionary mapping a test hash to a list containing its English description, its result in Pass/Fail/n_a form, and either a path if one was found or a result code if it was not.

Type:dict
get_applied_test_hashes()[source]

Return a list of hashes for all applied tests.

get_number_passed_tests(mc_type='pysb')[source]

Return a number of all passed tests.

get_passed_test_hashes(mc_type='pysb')[source]

Return a list of hashes for passed tests.

get_total_applied_tests()[source]

Return a number of all applied tests.

passed_over_total(mc_type='pysb')[source]

Return a ratio of passed over total tests.
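
A minimal sketch of such a ratio, using the Pass/Fail/n_a result labels mentioned for english_test_results, might look like this. It is not the library's code: the function name pass_ratio and the input format are hypothetical, and it assumes that every applied test (including n_a results) counts toward the total and that an empty round yields 0.0.

```python
def pass_ratio(results):
    """Return the fraction of results labeled "Pass".

    Hypothetical helper: counting n_a results in the total and
    returning 0.0 for an empty list are assumptions.
    """
    if not results:
        return 0.0  # avoid division by zero when no tests were applied
    passed = sum(1 for result in results if result == "Pass")
    return passed / len(results)


# Made-up results for one model checker type.
ratio = pass_ratio(["Pass", "Fail", "Pass", "n_a"])
print(ratio)
```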

class emmaa.analyze_tests_results.TestStatsGenerator(model_name, test_corpus_str='large_corpus_tests', latest_round=None, previous_round=None, previous_json_stats=None, bucket='emmaa')[source]

Bases: emmaa.analyze_tests_results.StatsGenerator

Generates statistics for a given test round.

Parameters:
  • model_name (str) – A name of a model the tests were run against.
  • test_corpus_str (str) – A name of a test corpus the model was tested against.
  • latest_round (emmaa.analyze_tests_results.TestRound) – An instance of a TestRound to generate statistics for. If not given, will be generated by loading test results from s3.
  • previous_round (emmaa.analyze_tests_results.TestRound) – A different instance of a TestRound to find delta between two rounds. If not given, will be generated by loading test results from s3.
  • previous_json_stats (list[dict]) – A JSON-formatted dictionary containing test statistics for previous test round.
json_stats

A JSON-formatted dictionary containing test statistics.

Type:dict
make_changes_over_time()[source]

Add changes to tests over time to json_stats.

make_stats()[source]

Check if two latest test rounds were found and add statistics to json_stats dictionary. If both latest round and previous round were passed or found on s3, a dictionary will have three key-value pairs: test_round_summary, tests_delta, and changes_over_time.

make_test_summary()[source]

Add latest test round summary to json_stats.

make_tests_delta()[source]

Add tests delta between two latest test rounds to json_stats.

emmaa.analyze_tests_results.generate_stats_on_s3(model_name, mode, test_corpus_str='large_corpus_tests', upload_stats=True, bucket='emmaa')[source]

Generate statistics for latest round of model update or tests.

Parameters:
  • model_name (str) – A name of EmmaaModel.
  • mode (str) – Type of stats to generate (model or tests).
  • test_corpus_str (str) – A name of a test corpus.
  • upload_stats (Optional[bool]) – Whether to upload latest statistics about model and a test. Default: True