enriching the reported results #10

@azzashoaibi

The currently reported results can be improved by addressing the following issues:

  1. Statistical Test Consistency in Part 2 Results
    • Currently, either a paired t-test or a Wilcoxon test is used, depending on the outcome of a normality check.
    • Proposal: Use the Wilcoxon test consistently across all arms for methodological consistency.

  2. Non-Inferiority Threshold Clarification
    • The non-inferiority p-value and statistical test rely on a threshold that is currently unspecified.
    • Action Needed: Define and document this threshold. The suggested value is 10%.

  3. Missing Weighted F-Score Table in Part 2 Results
    • The report lacks weighted F-score results per phenotype and per AI arm; the current results are aggregated across all phenotypes.
    • Action Needed: Add a table showing weighted F-scores for each phenotype and AI arm.

  4. Include Unweighted F-Score Results
    • Unweighted F-score results are not currently included.
    • Action Needed: Add these to the report for completeness.

  5. Agreement Metrics in Part 1 Results
    • Agreement is currently pooled across all AI arms.
    • Action Needed:
      - Report agreement (overlap bars) separately for each AI arm and each disease.
      - Weight these bars by concept prevalence.

  6. Precision and Recall Reporting
    • Only the F-score is reported.
    • Action Needed: Report precision and recall separately for each AI arm and each disease.
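
The per-arm, per-phenotype breakdown of precision, recall, and weighted and unweighted F-score requested above could be computed roughly as follows. This is only a sketch under stated assumptions, not the project's actual pipeline: the `ConceptCounts` record layout, the `prf` and `summarize` helpers, and the choice to weight each concept by its support (`tp + fn`) as a stand-in for concept prevalence are all hypothetical, and the counts are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class ConceptCounts(NamedTuple):
    """Hypothetical record: confusion counts for one concept in one arm and phenotype."""
    arm: str
    phenotype: str
    concept: str
    tp: int
    fp: int
    fn: int


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from counts; a zero denominator yields 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def summarize(records):
    """Return {(arm, phenotype): metrics} with unweighted and weighted averages."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.arm, r.phenotype)].append(r)

    table = {}
    for key, rows in groups.items():
        scores = [prf(r.tp, r.fp, r.fn) for r in rows]
        # Assumption: a concept's weight is its support (tp + fn), used here
        # as a stand-in for concept prevalence.
        weights = [r.tp + r.fn for r in rows]
        total = sum(weights)
        n = len(rows)
        metrics = {}
        for i, name in enumerate(("precision", "recall", "f1")):
            values = [s[i] for s in scores]
            metrics[f"{name}_unweighted"] = sum(values) / n
            metrics[f"{name}_weighted"] = (
                sum(w * v for w, v in zip(weights, values)) / total if total else 0.0
            )
        table[key] = metrics
    return table


# Made-up counts, for illustration only.
records = [
    ConceptCounts("arm_A", "phenotype_1", "concept_x", tp=8, fp=2, fn=2),
    ConceptCounts("arm_A", "phenotype_1", "concept_y", tp=1, fp=1, fn=3),
]
summary = summarize(records)
for (arm, phenotype), m in sorted(summary.items()):
    print(arm, phenotype, {k: round(v, 3) for k, v in m.items()})
```

Each `(arm, phenotype)` key maps to one row of the requested table, so reporting the weighted and unweighted variants side by side shows how much rare concepts move the score. If "weighted" is meant differently in the report (for example, weighting by patient counts), only the `weights` line would need to change.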
