Pretrained vs Own-Trained F1 Score Discrepancy #24

@abdullahasif07

Description

While running the ATLAS system, we observed a significant discrepancy in F1 scores between the pretrained model and our own trained version. The pretrained model closely matches the paper's results (within 1%), but our own-trained model deviates much further.

We would consider reproducibility differences within ±2% acceptable, but these deviations exceed that margin.


Results Summary

| Model | Level | TP | FP | TN | FN | Precision | Recall | F1 Score | F1 % Diff |
|---|---|---|---|---|---|---|---|---|---|
| Paper | M1 (Event Level) | 8168 | 3 | 24304 | 0 | 0.9996 | 1.0000 | 0.9998 | |
| Own-Trained | M1 (Event Level) | 5299 | 379 | 243137 | 2881 | 0.9333 | 0.6478 | 0.7648 | −23.5103 |
| Pre-Trained | M1 (Event Level) | 8180 | 1 | 243494 | 0 | 0.9999 | 1.0000 | 0.9999 | 0.0123 |

| Model | Level | TP | FP | TN | FN | Precision | Recall | F1 Score | F1 % Diff |
|---|---|---|---|---|---|---|---|---|---|
| Paper | M3 (Entity Level) | 35 | 1 | 24423 | 1 | 0.9722 | 0.9722 | 0.9722 | |
| Own-Trained | M3 (Entity Level) | 18 | 15 | 1263 | 6 | 0.5455 | 0.7500 | 0.6316 | −35.0376 |
| Pre-Trained | M3 (Entity Level) | 24 | 1 | 1308 | 0 | 0.9600 | 1.0000 | 0.9796 | 0.7580 |
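For reference, here is a minimal sketch of how the metrics above follow from the raw confusion counts (standard precision/recall/F1 definitions; `prf1` is just a helper name, not part of the ATLAS codebase). It reproduces the Own-Trained M1 row and its F1 % difference against the paper baseline:

```python
def prf1(tp, fp, fn):
    # Standard definitions: TN does not enter precision, recall, or F1.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Own-Trained M1 (Event Level) counts from the table above
p, r, f1 = prf1(tp=5299, fp=379, fn=2881)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.9333 0.6478 0.7648

# Paper M1 baseline and the resulting F1 % difference
_, _, f1_paper = prf1(tp=8168, fp=3, fn=0)
print(round((f1 - f1_paper) / f1_paper * 100, 4))  # -23.5103
```

So the reported numbers are internally consistent; the gap comes from the confusion counts themselves (many more FN/FP in the own-trained run), not from how the metrics were computed.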

Request:
Any ideas or guidance on why this discrepancy in F1 scores between the pretrained and own-trained models might be occurring?
