enriching the reported results

The reported results can be improved by addressing the following points:


1.  Statistical Test Consistency in Part 2 Results
    - Currently, either a paired t-test or a Wilcoxon test is used, depending on the outcome of a normality check.
    - Proposal: Use the Wilcoxon test across all arms for methodological consistency.
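
A minimal sketch of this proposal, assuming the Wilcoxon test in question is the paired (signed-rank) variant and that each arm has paired per-case scores without and with AI assistance. The arm names, the score values, and the helper `wilcoxon_signed_rank_exact` are hypothetical. The exact enumeration below only suits small samples; a library routine such as `scipy.stats.wilcoxon` would normally be used instead.

```python
from itertools import product


def wilcoxon_signed_rank_exact(baseline, treated):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied |differences| get average ranks.
    All 2**n sign patterns are enumerated, so keep n small (roughly n <= 20).
    Returns (w_plus, p_value).
    """
    diffs = [t - b for b, t in zip(baseline, treated) if t != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_dev = abs(w_plus - expected)

    # Under H0 each difference is equally likely to be positive or negative.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical paired scores (percent F-score per case):
# arm -> (without AI, with AI).
scores_by_arm = {
    "arm_A": ([70, 65, 80, 72, 68, 75], [74, 70, 86, 73, 70, 78]),
    "arm_B": ([70, 65, 80, 72, 68, 75], [71, 63, 83, 68, 73, 69]),
}

# The same test is applied to every arm, with no normality-based branching.
results = {
    arm: wilcoxon_signed_rank_exact(without_ai, with_ai)
    for arm, (without_ai, with_ai) in scores_by_arm.items()
}
for arm, (w_plus, p_value) in results.items():
    print(f"{arm}: W+ = {w_plus}, p = {p_value:.4f}")
```

Applying one test everywhere trades some power on normally distributed arms for results that are directly comparable across arms.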

2.  Non-Inferiority Threshold Clarification
    - The non-inferiority test and its p-value depend on a threshold (margin) that is currently unspecified.
    - Action Needed: Define and document this threshold. The suggested value is 10%.
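
One way to make the threshold explicit is to keep it as a single documented constant that the test reads. The sketch below is illustrative only: it assumes the 10% is an absolute margin on a 0 to 1 score scale and uses an exact paired sign-flip test, neither of which is confirmed by the report. The names `NON_INFERIORITY_MARGIN` and `non_inferiority_p_value` are hypothetical.

```python
from itertools import product

# Assumed: an absolute margin of 0.10 on a 0-1 score scale.
NON_INFERIORITY_MARGIN = 0.10


def non_inferiority_p_value(control, treatment, margin=NON_INFERIORITY_MARGIN):
    """One-sided exact sign-flip test for paired scores.

    H0: the mean of (treatment - control) is at most -margin (inferior).
    H1: the mean of (treatment - control) is greater than -margin.
    Enumerates all 2**n sign patterns, so keep n small.
    """
    shifted = [t - c + margin for c, t in zip(control, treatment)]
    observed = sum(shifted)
    n = len(shifted)
    at_least = 0
    for signs in product((1, -1), repeat=n):
        if sum(s * d for s, d in zip(signs, shifted)) >= observed - 1e-12:
            at_least += 1
    return at_least / 2 ** n


# Hypothetical paired per-case scores.
control = [0.80, 0.75, 0.90, 0.70, 0.85, 0.78]
with_ai = [0.78, 0.76, 0.88, 0.71, 0.84, 0.77]
clearly_worse = [0.60, 0.55, 0.70, 0.50, 0.65, 0.58]

p_close = non_inferiority_p_value(control, with_ai)
p_worse = non_inferiority_p_value(control, clearly_worse)
print(f"margin = {NON_INFERIORITY_MARGIN}: p = {p_close:.4f} (close), p = {p_worse:.4f} (worse)")
```

Whichever test the report actually uses, stating the margin next to the p-value lets readers see what "non-inferior" means for each comparison.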

3.  Missing Weighted F-Score Table in Part 2 Results
    - The report lacks weighted F-score results broken down by phenotype and AI arm; the current results are aggregated across all phenotypes.
    - Action Needed: Add a table showing weighted F-scores for each phenotype and AI arm.

4.  Include Unweighted F-Score Results
    - Unweighted F-score results are not currently included.
    - Action Needed: Add these to the report for completeness.
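
The difference between the weighted and unweighted F-scores requested in items 3 and 4 can be sketched as follows. This assumes "weighted" means a support-weighted average of per-concept F-scores and "unweighted" means their plain mean, similar to the `weighted` and `macro` averaging options in scikit-learn; the report may define them differently. All phenotype, arm, and concept names and all counts are hypothetical.

```python
def f_score(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def averaged_f_scores(concept_counts):
    """Return (unweighted, weighted) F-score averages over concepts.

    concept_counts maps a concept name to its (tp, fp, fn) counts.
    The weighted average uses each concept's support (tp + fn) as its weight.
    """
    if not concept_counts:
        return 0.0, 0.0
    scores = {c: f_score(*counts) for c, counts in concept_counts.items()}
    support = {c: tp + fn for c, (tp, fp, fn) in concept_counts.items()}
    unweighted = sum(scores.values()) / len(scores)
    total = sum(support.values())
    weighted = (
        sum(scores[c] * support[c] for c in scores) / total if total else 0.0
    )
    return unweighted, weighted


# Hypothetical counts keyed by (phenotype, AI arm), then by concept.
counts = {
    ("phenotype_1", "arm_A"): {"concept_x": (8, 2, 2), "concept_y": (2, 2, 2)},
    ("phenotype_1", "arm_B"): {"concept_x": (9, 1, 1), "concept_y": (3, 1, 1)},
}

# One table row per phenotype and arm, with both averages side by side.
table = {key: averaged_f_scores(value) for key, value in counts.items()}
print("phenotype\tarm\tunweighted_F\tweighted_F")
for (phenotype, arm), (unweighted, weighted) in sorted(table.items()):
    print(f"{phenotype}\t{arm}\t{unweighted:.3f}\t{weighted:.3f}")
```

Reporting both columns shows whether a score is driven by a few frequent concepts (weighted) or holds across rare ones as well (unweighted).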

5.  Agreement Metrics in Part 1 Results
    - Agreement is currently pooled across all AI arms.
    - Action Needed:
        - Report agreement (overlap bars) separately for each AI arm and each disease.
        - Weight these bars by concept prevalence.
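
A rough sketch of the per-arm, per-disease breakdown, assuming agreement is available per concept as a fraction between 0 and 1 and that prevalence is the number of cases in which the concept occurs. How the report actually computes agreement is not stated here, so the record layout, the names, and the function `prevalence_weighted_agreement` are all hypothetical.

```python
from collections import defaultdict


def prevalence_weighted_agreement(records):
    """Average agreement per (arm, disease), weighted by concept prevalence.

    Each record is (arm, disease, concept, agreement, prevalence).
    Groups whose total prevalence is zero are omitted from the result.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # key -> [weighted sum, total weight]
    for arm, disease, _concept, agreement, prevalence in records:
        acc = sums[(arm, disease)]
        acc[0] += agreement * prevalence
        acc[1] += prevalence
    return {key: num / den for key, (num, den) in sums.items() if den > 0}


# Hypothetical per-concept agreement values and prevalence counts.
records = [
    ("arm_A", "disease_X", "concept_1", 1.0, 30),
    ("arm_A", "disease_X", "concept_2", 0.5, 10),
    ("arm_B", "disease_X", "concept_1", 0.8, 20),
    ("arm_B", "disease_Y", "concept_3", 0.6, 0),
]

bars = prevalence_weighted_agreement(records)
for (arm, disease), value in sorted(bars.items()):
    print(f"{arm}\t{disease}\t{value:.3f}")
```

Each resulting value corresponds to one overlap bar, so rare concepts no longer count as much as common ones.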

6.  Precision and Recall Reporting
    - Only the F-score is reported.
    - Action Needed: Report precision and recall separately for each AI arm and each disease.
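
For item 6, precision and recall can be reported next to the F-score with very little extra work. The sketch below assumes each (arm, disease) pair has a set of predicted concepts and a set of reference concepts; that layout, the names, and the sample data are hypothetical.

```python
def precision_recall(predicted, reference):
    """Precision and recall of a predicted concept set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical concept sets keyed by (AI arm, disease): (predicted, reference).
annotations = {
    ("arm_A", "disease_X"): ({"a", "b", "c", "d"}, {"a", "b", "e"}),
    ("arm_B", "disease_X"): ({"a", "b"}, {"a", "b", "e"}),
}

report = {
    key: precision_recall(predicted, reference)
    for key, (predicted, reference) in annotations.items()
}
print("arm\tdisease\tprecision\trecall")
for (arm, disease), (precision, recall) in sorted(report.items()):
    print(f"{arm}\t{disease}\t{precision:.3f}\t{recall:.3f}")
```

In this made-up example both arms have the same recall but different precision, a distinction that a single F-score would hide.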




enriching the reported results #10
