Description:
While running the ATLAS system, we observed a significant discrepancy in F1 scores between the pretrained model and a model we trained ourselves. The pretrained model matches the paper's results to within 1%, but our own-trained model falls well short: its F1 is about 23.5% lower at the event level (M1) and about 35.0% lower at the entity level (M3).
We find it acceptable if reproducibility differences are within ±2%, but these deviations exceed that margin.
Results Summary
| Model | Level | TP | FP | TN | FN | Precision | Recall | F1 Score | F1 % Diff vs. Paper |
|---|---|---|---|---|---|---|---|---|---|
| Paper | M1 (Event Level) | 8168 | 3 | 24304 | 0 | 0.9996 | 1.0000 | 0.9998 | n/a |
| Own-Trained | M1 (Event Level) | 5299 | 379 | 243137 | 2881 | 0.9333 | 0.6478 | 0.7648 | −23.5103 |
| Pre-Trained | M1 (Event Level) | 8180 | 1 | 243494 | 0 | 0.9999 | 1.0000 | 0.9999 | 0.0123 |

| Model | Level | TP | FP | TN | FN | Precision | Recall | F1 Score | F1 % Diff vs. Paper |
|---|---|---|---|---|---|---|---|---|---|
| Paper | M3 (Entity Level) | 35 | 1 | 24423 | 1 | 0.9722 | 0.9722 | 0.9722 | n/a |
| Own-Trained | M3 (Entity Level) | 18 | 15 | 1263 | 6 | 0.5455 | 0.7500 | 0.6316 | −35.0376 |
| Pre-Trained | M3 (Entity Level) | 24 | 1 | 1308 | 0 | 0.9600 | 1.0000 | 0.9796 | 0.7580 |
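
For anyone who wants to double-check the numbers, here is a minimal sketch of how the Precision, Recall, F1, and F1 % Diff columns can be recomputed from the TP/FP/FN counts in the tables above. It assumes the percentage difference is taken relative to the paper's F1, which is consistent with the reported values. The names `Counts`, `f1_score`, `pct_diff`, `compare`, and `TOLERANCE_PCT` are made up for this illustration and are not part of the ATLAS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts. TN is left out because P/R/F1 do not use it."""
    tp: int
    fp: int
    fn: int


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c: Counts) -> float:
    # Equivalent to the harmonic mean of precision and recall.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def pct_diff(value: float, baseline: float) -> float:
    """Relative difference in percent (assumed definition of 'F1 % Diff')."""
    return 100.0 * (value - baseline) / baseline


# Acceptable reproducibility margin stated in this issue (plus or minus 2%).
TOLERANCE_PCT = 2.0

# TP / FP / FN copied from the tables above.
RESULTS = {
    "M1": {
        "Paper": Counts(8168, 3, 0),
        "Own-Trained": Counts(5299, 379, 2881),
        "Pre-Trained": Counts(8180, 1, 0),
    },
    "M3": {
        "Paper": Counts(35, 1, 1),
        "Own-Trained": Counts(18, 15, 6),
        "Pre-Trained": Counts(24, 1, 0),
    },
}


def compare(results, tolerance_pct=TOLERANCE_PCT):
    """Return (level, model, F1 % diff vs. paper, within_tolerance) rows."""
    rows = []
    for level, models in results.items():
        baseline = f1_score(models["Paper"])
        for name, counts in models.items():
            if name == "Paper":
                continue
            diff = pct_diff(f1_score(counts), baseline)
            rows.append((level, name, diff, abs(diff) <= tolerance_pct))
    return rows


if __name__ == "__main__":
    for level, name, diff, ok in compare(RESULTS):
        status = "within" if ok else "outside"
        print(f"{level} {name}: {diff:+.4f}% ({status} the {TOLERANCE_PCT}% margin)")
```

Working from the raw counts rather than the rounded F1 values avoids rounding drift in the percentage column, and it makes clear that the own-trained gap at M1 comes mainly from recall (2881 false negatives) rather than precision.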
Request:
Any ideas or guidance on what could be causing this discrepancy in F1 scores between the pretrained and own-trained models?