Analyze model test results (`emmaa.analyze_tests_results`)¶

class emmaa.analyze_tests_results.ModelRound(statements, date_str, paper_ids=None, paper_id_type='TRID', emmaa_statements=None)[source]¶

Bases: Round

Analyzes the results of one model update round.

Parameters

statements (list[indra.statements.Statement]) – A list of INDRA Statements used to assemble a model.
date_str (str) – Time when ModelManager responsible for this round was created.
paper_ids (list[str]) – A list of paper IDs used to get raw statements for this round.
paper_id_type (str) – Type of paper ID used.

stmts_by_papers¶

A dictionary mapping the paper IDs to sets of hashes of assembled statements with evidences retrieved from these papers.

Type: dict

get_agent_distribution()[source]¶: Return a sorted list of tuples, each containing an agent name and the number of times this agent occurred in the statements of a model.

get_all_raw_paper_ids()[source]¶: Return all paper IDs used in this round.

get_assembled_stmts_by_paper(id_type='TRID')[source]¶: Get a mapping of paper IDs (TRID or PII) to assembled statements.

get_english_statements_by_hash()[source]¶: Return a dictionary mapping each statement hash to its English description.

get_number_raw_papers()[source]¶: Return the total number of papers in this round.

get_paper_titles_and_links(trids)[source]¶: Return a dictionary mapping paper IDs to their titles.

get_papers_distribution()[source]¶: Return a sorted list of tuples, each containing a paper ID and the number of unique statements extracted from that paper.

get_statement_types()[source]¶: Return a sorted list of tuples, each containing a statement type and the number of times a statement of this type occurred in a model.

get_statements_by_evidence()[source]¶: Return a sorted list of tuples, each containing a statement hash and the number of times this statement occurred in a model.

get_stmt_hashes()[source]¶: Return a list of hashes for all statements in a model.

get_total_statements()[source]¶: Return the total number of statements in a model.
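The distribution methods above all return a sorted list of (item, count) tuples. The following is a minimal sketch of that shape, not the EMMAA implementation. It assumes statements are represented by plain tuples of agent names and that sorting is by descending count; `agent_distribution` and `papers_distribution` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter


def agent_distribution(stmt_agents):
    """Count how often each agent name appears across statements.

    `stmt_agents` is a hypothetical stand-in for a model's statements:
    one tuple of agent names per statement.
    """
    counts = Counter(name for agents in stmt_agents for name in agents)
    # Assumption: most frequent first; ties keep first-seen order.
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


def papers_distribution(stmts_by_papers):
    """Count unique statement hashes per paper ID.

    `stmts_by_papers` maps paper IDs to sets of statement hashes,
    mirroring the attribute of the same name described above.
    """
    counts = {pid: len(hashes) for pid, hashes in stmts_by_papers.items()}
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


# Made-up example data.
stmt_agents = [("BRAF", "MAP2K1"), ("MAP2K1", "MAPK1"), ("BRAF", "MAPK1"), ("BRAF",)]
stmts_by_papers = {"101": {11, 12, 13}, "102": {12}, "103": {11, 14}}

agents = agent_distribution(stmt_agents)
papers = papers_distribution(stmts_by_papers)
```

Here `agents` starts with the most frequent agent and `papers` starts with the paper that contributed the most unique statements.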

class emmaa.analyze_tests_results.ModelStatsGenerator(model_name, latest_round=None, previous_round=None, previous_json_stats=None, bucket='emmaa')[source]¶

Bases: StatsGenerator

Generates statistics for a given model update round.

Parameters

model_name (str) – A name of a model the tests were run against.
latest_round (emmaa.analyze_tests_results.ModelRound) – An instance of a ModelRound to generate statistics for. If not given, will be generated by loading model data from s3.
previous_round (emmaa.analyze_tests_results.ModelRound) – A different instance of a ModelRound to find delta between two rounds. If not given, will be generated by loading model data from s3.
previous_json_stats (list[dict]) – A JSON-formatted dictionary containing model statistics for the previous update round.

json_stats¶

A JSON-formatted dictionary containing model statistics.

Type: dict

make_changes_over_time()[source]¶: Add changes to model over time to json_stats.

make_curation_summary()[source]¶: Add latest curation summary to json_stats.

make_model_delta()[source]¶: Add model delta between the two latest model states to json_stats.

make_model_summary()[source]¶: Add latest model state summary to json_stats.

make_paper_delta()[source]¶: Add paper delta between the two latest model states to json_stats.

make_paper_summary()[source]¶: Add latest paper summary to json_stats.

make_stats()[source]¶: Check if the two latest model rounds were found and add statistics to the json_stats dictionary. If both the latest round and the previous round were passed or found on s3, the dictionary will have three key-value pairs: model_summary, model_delta, and changes_over_time.
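The control flow described for make_stats can be sketched as follows. This is an illustration under stated assumptions, not the library's code: rounds are reduced to plain lists of statement hashes, the nested field names (`number_of_statements`, `added`, `removed`) are hypothetical, and leaving the dictionary empty when a round is missing is an assumption. Only the three top-level keys come from the description above.

```python
def make_stats_sketch(latest_round, previous_round):
    """Build a stats dictionary only when both rounds are available.

    Rounds are plain lists of statement hashes here (a hypothetical
    simplification); the values stored under each key are placeholders.
    """
    json_stats = {}
    if latest_round is None or previous_round is None:
        # Assumption: without two rounds there is nothing to compare,
        # so the dictionary is left empty.
        return json_stats
    latest, previous = set(latest_round), set(previous_round)
    json_stats["model_summary"] = {"number_of_statements": len(latest)}
    json_stats["model_delta"] = {
        "added": sorted(latest - previous),
        "removed": sorted(previous - latest),
    }
    json_stats["changes_over_time"] = {
        "number_of_statements": [len(previous), len(latest)],
    }
    return json_stats


stats = make_stats_sketch([1, 2, 3], [2, 4])
empty = make_stats_sketch([1, 2, 3], None)
```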

class emmaa.analyze_tests_results.Round(date_str)[source]¶

Bases: object

Parent class for classes analyzing one round of something (model or tests).

Parameters: date_str (str) – Time when ModelManager responsible for this round was created.

function_mapping¶

A dictionary mapping a content type (a string) to a tuple of functions needed to find the delta for that content type. The first function in the tuple gets a list of all hashes for the given content type, while the second returns an English description of the given content type for a single hash.

Type: dict

find_delta_hashes(other_round, content_type, **kwargs)[source]¶

Return a dictionary of changed hashes of a given content type. This method makes use of self.function_mapping dictionary.

Parameters

other_round (emmaa.analyze_tests_results.TestRound) – A different instance of a TestRound.
content_type (str) – The type of content to find the delta for. Accepted values: statements, applied_tests, passed_tests, paths.
**kwargs (dict) – For some content types, additional arguments must be provided, such as mc_type.

Returns

hashes – A dictionary containing lists of added and removed hashes of a given content type between two test rounds.

Return type

dict
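A minimal sketch of how a function_mapping lookup and a hash delta could fit together is shown here. `SketchRound` is a hypothetical stand-in for a Round subclass, the `added` and `removed` key names are assumptions, and treating "added" as "present in this round but not in the other" is also an assumption about direction.

```python
class SketchRound:
    """Hypothetical stand-in for a Round subclass."""

    def __init__(self, stmts_by_hash):
        # Maps a statement hash to its English description.
        self.stmts_by_hash = stmts_by_hash
        # content type -> (function listing hashes, function describing one hash)
        self.function_mapping = {
            "statements": (self.get_hashes, self.describe),
        }

    def get_hashes(self):
        return list(self.stmts_by_hash)

    def describe(self, stmt_hash):
        return self.stmts_by_hash[stmt_hash]

    def find_delta_hashes(self, other_round, content_type):
        """Return hashes added to and removed from this round vs. the other."""
        latest = set(self.function_mapping[content_type][0]())
        previous = set(other_round.function_mapping[content_type][0]())
        return {
            "added": sorted(latest - previous),
            "removed": sorted(previous - latest),
        }


new_round = SketchRound({1: "A activates B", 2: "B inhibits C", 3: "C binds D"})
old_round = SketchRound({2: "B inhibits C", 4: "D activates E"})

delta = new_round.find_delta_hashes(old_round, "statements")
# The second function in the tuple turns a hash into readable text.
added_english = [new_round.function_mapping["statements"][1](h) for h in delta["added"]]
```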

class emmaa.analyze_tests_results.StatsGenerator(model_name, latest_round=None, previous_round=None, previous_json_stats=None, bucket='emmaa')[source]¶

Bases: object

Parent class for classes generating statistics for a given round of tests or model update.

Parameters

model_name (str) – A name of a model the tests were run against.
latest_round (ModelRound or TestRound or None) – An instance of a ModelRound or TestRound to generate statistics for. If not given, will be generated by loading json from s3.
previous_round (ModelRound or TestRound or None) – A different instance of a ModelRound or TestRound to find delta between two rounds. If not given, will be generated by loading json from s3.
previous_json_stats (dict) – A JSON-formatted dictionary containing model or test statistics for the previous round.

json_stats¶

A JSON-formatted dictionary containing model or test statistics.

Type: dict

make_changes_over_time()[source]¶: Add changes to model and tests over time to json_stats.

class emmaa.analyze_tests_results.TestRound(json_results, date_str)[source]¶

Bases: Round

Analyzes the results of one test round.

Parameters

json_results (list[dict]) – A list of JSON formatted dictionaries to store information about the test results. The first dictionary contains information about the model. Each consecutive dictionary contains information about a single test applied to the model and test results.
date_str (str) – Time when ModelManager responsible for this round was created.
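The layout of json_results described above (one model entry followed by one entry per test) can be unpacked as in this short sketch. The dictionary keys used here are hypothetical placeholders; the actual fields stored by EMMAA are not specified in this section.

```python
# Hypothetical example data: only the overall layout (model entry first,
# then one entry per test) follows the description above.
json_results = [
    {"model_name": "example_model"},
    {"test": "A activates B", "passed": True},
    {"test": "C inhibits D", "passed": False},
]

# The first dictionary describes the model; the rest describe tests.
model_info, *test_entries = json_results
number_of_tests = len(test_entries)
```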

mc_types_results¶

A dictionary mapping a type of a ModelChecker to a list of test results generated by this ModelChecker.

Type: dict

tests¶

A list of INDRA Statements used to make EMMAA tests.

Type: list[indra.statements.Statement]

english_test_results¶

A dictionary mapping a test hash to a list containing its English description, its result in Pass/Fail/n_a form, and either a path if one was found or a result code if not.

Type: dict

get_applied_test_hashes()[source]¶: Return a list of hashes for all applied tests.

get_number_passed_tests(mc_type='pysb')[source]¶: Return the number of passed tests.

get_passed_test_hashes(mc_type='pysb')[source]¶: Return a list of hashes for passed tests.

get_total_applied_tests()[source]¶: Return the number of applied tests.

passed_over_total(mc_type='pysb')[source]¶: Return a ratio of passed over total tests.
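The per-ModelChecker counts and ratio can be sketched as below. This is illustrative only: results are reduced to booleans, `passed_over_total_sketch` is a hypothetical helper, and returning 0.0 when no tests were applied is an assumption rather than documented behaviour.

```python
def passed_over_total_sketch(mc_types_results, mc_type="pysb"):
    """Return the fraction of passed tests for one ModelChecker type.

    `mc_types_results` maps a ModelChecker type to a list of booleans
    (True for a passed test), a simplification of the real test results.
    """
    results = mc_types_results[mc_type]
    total = len(results)
    if total == 0:
        # Assumption: report 0.0 instead of dividing by zero.
        return 0.0
    return sum(results) / total


mc_types_results = {
    "pysb": [True, False, True, True],
    "pybel": [],
}

pysb_ratio = passed_over_total_sketch(mc_types_results)
pybel_ratio = passed_over_total_sketch(mc_types_results, mc_type="pybel")
```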

class emmaa.analyze_tests_results.TestStatsGenerator(model_name, test_corpus_str='large_corpus_tests', latest_round=None, previous_round=None, previous_json_stats=None, bucket='emmaa')[source]¶

Bases: StatsGenerator

Generates statistics for a given test round.

Parameters

model_name (str) – A name of a model the tests were run against.
test_corpus_str (str) – A name of a test corpus the model was tested against.
latest_round (emmaa.analyze_tests_results.TestRound) – An instance of a TestRound to generate statistics for. If not given, will be generated by loading test results from s3.
previous_round (emmaa.analyze_tests_results.TestRound) – A different instance of a TestRound to find delta between two rounds. If not given, will be generated by loading test results from s3.
previous_json_stats (list[dict]) – A JSON-formatted dictionary containing test statistics for the previous test round.

json_stats¶

A JSON-formatted dictionary containing test statistics.

Type: dict

make_changes_over_time()[source]¶: Add changes to tests over time to json_stats.

make_stats()[source]¶: Check if the two latest test rounds were found and add statistics to the json_stats dictionary. If both the latest round and the previous round were passed or found on s3, the dictionary will have three key-value pairs: test_round_summary, tests_delta, and changes_over_time.

make_test_summary()[source]¶: Add latest test round summary to json_stats.

make_tests_delta()[source]¶: Add tests delta between the two latest test rounds to json_stats.

emmaa.analyze_tests_results.generate_stats_on_s3(model_name, mode, test_corpus_str='large_corpus_tests', upload_stats=True, bucket='emmaa')[source]¶

Generate statistics for the latest round of model update or tests.

Parameters

model_name (str) – A name of EmmaaModel.
mode (str) – Type of stats to generate (model or tests).
test_corpus_str (str) – A name of a test corpus.
upload_stats (Optional[bool]) – Whether to upload latest statistics about model and a test. Default: True
