[P0] Can't reproduce commonsense reasoning numbers #141

Open
dyahadila opened this issue Oct 29, 2024 · 5 comments

@dyahadila

Hello!

I am trying to reproduce the commonsense reasoning numbers on LLaMA with this run script: https://github.com/stanfordnlp/pyreft/tree/main/examples/loreft#commonsense-reasoning-tasks

But after training, my eval_results.json shows all zeros:

```json
{
    "eval/boolq": 0.0,
    "eval/piqa": 0.0,
    "eval/social_i_qa": 0.0,
    "eval/hellaswag": 0.0,
    "eval/winogrande": 0.0,
    "eval/ARC-Easy": 0.0,
    "eval/ARC-Challenge": 0.0,
    "eval/openbookqa": 0.0,
    "n_params": 2097408
}
```

I only reduced the batch size from 16 to 8; otherwise, everything is identical to the run script. I checked my training log, and at some point the loss becomes NaN.
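
A minimal sketch for pinpointing the step at which the loss first becomes NaN, assuming the per-step losses have already been extracted from the training log into a Python list. The log format is not shown in this thread, so parsing is left out, the loss values below are made up, and `first_non_finite` is a hypothetical helper, not part of pyreft.

```python
import math
from typing import Optional, Sequence


def first_non_finite(losses: Sequence[float]) -> Optional[int]:
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical per-step losses; real values would come from the training log.
losses = [2.31, 1.87, 1.42, float("nan"), float("nan")]
print(first_non_finite(losses))  # -> 3
```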

Any advice is appreciated!

frankaging changed the title from "Can't reproduce commonsense reasoning numbers" to "[P0] Can't reproduce commonsense reasoning numbers" on Oct 29, 2024
frankaging self-assigned this on Oct 29, 2024

@frankaging
Collaborator

Hey @dyahadila, thanks for reporting this issue.

Could you share your run command and your loss curve? I have observed that DiReFT sometimes fails to converge due to numerical instability when overfitting to short-generation tasks like commonsense reasoning. Could you also try another random seed, or lower your learning rate slightly and reduce the number of epochs to 3, to see if that solves the issue?

@frankaging
Collaborator

frankaging commented Oct 29, 2024

BTW, we shared our commonsense reasoning logs (loss curve, final eval, stats) through wandb (link); you might find them helpful. However, since there have been changes both in this repo and in the Hugging Face transformers repo, this could be a new issue. If you can, also try downgrading transformers to something like 4.44.0, which is closer to the version we used before.
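
A small sketch for checking the installed transformers version against the 4.44.0 suggestion, using only the standard library (`importlib.metadata`). The helper names, and the choice to compare only the major.minor release so that any 4.44.x counts as close enough, are illustrative assumptions rather than anything pyreft provides.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

SUGGESTED = "4.44.0"  # version suggested by the maintainer in this thread


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Leading numeric release parts, e.g. '4.45.0.dev0' -> (4, 45, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


current = installed_version("transformers")
# Compare major.minor only, so any 4.44.x is treated as matching 4.44.0.
if current is None:
    print("transformers is not installed")
elif release_tuple(current)[:2] > release_tuple(SUGGESTED)[:2]:
    print(f"transformers {current} is newer than {SUGGESTED}; "
          f"try: pip install 'transformers=={SUGGESTED}'")
else:
    print(f"transformers {current} is at or below the {SUGGESTED} series")
```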

@dyahadila
Author

I was using LoReFT, and this is my command:

```bash
python train.py -task commonsense \
-data_dir dataset \
-model yahma/llama-7b-hf \
-seed 42 \
-l all -r 8 -p f7+l7 -e 6 -lr 9e-4 \
-type LoreftIntervention \
-gradient_accumulation_steps 2 \
-batch_size 8 \
-eval_batch_size 4 \
--dropout 0.00 \
--test_split test \
--use_normalized_template \
--share_weights \
--warmup_ratio 0.1 \
--greedy_decoding
```

This is identical to the command at https://github.com/stanfordnlp/pyreft/tree/main/examples/loreft#commonsense-reasoning-tasks, just with a smaller batch size.
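
For reference, a sketch of the effective batch size per optimizer step, assuming a single GPU and Hugging Face Trainer-style semantics where the effective batch is `batch_size × gradient_accumulation_steps` (whether train.py maps its flags exactly this way is an assumption). With `-gradient_accumulation_steps 2` unchanged, lowering `-batch_size` from 16 to 8 halves the effective batch from 32 to 16. `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step (Trainer-style accumulation assumed)."""
    return per_device_batch * grad_accum_steps * num_devices


readme = effective_batch_size(16, 2)   # README: -batch_size 16, -gradient_accumulation_steps 2
this_run = effective_batch_size(8, 2)  # this run: -batch_size 8, -gradient_accumulation_steps 2
print(readme, this_run)  # -> 32 16

# Doubling accumulation would restore the README's effective batch at batch size 8.
print(effective_batch_size(8, 4))  # -> 32
```

Under the same assumption, `-gradient_accumulation_steps 4` with `-batch_size 8` would match the README's effective batch of 32 while keeping the per-forward-pass memory of the smaller batch.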

I am now trying epochs=6 with lr=1e-4, and epochs=3 with lr=5e-4. Will let you know how these two runs turn out!

@frankaging
Collaborator

Also, did you install pyreft with `pip install git+https://github.com/stanfordnlp/pyreft.git`?

@dyahadila
Author

I used `pip install pyreft`.
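
The maintainer's question suggests the PyPI release and the GitHub main branch may differ. One way to tell which kind of install is present is to read the PEP 610 `direct_url.json` metadata file, which pip records for installs from a URL or VCS and omits for installs from an index such as PyPI. A hedged sketch using the standard library; `describe_direct_url` and `install_source` are hypothetical helpers.

```python
import json
from importlib.metadata import PackageNotFoundError, distribution
from typing import Optional


def describe_direct_url(direct_url_text: Optional[str]) -> str:
    """Interpret PEP 610 direct_url.json contents (None means the file is absent)."""
    if direct_url_text is None:
        # No direct_url.json: installed from an index, e.g. `pip install pyreft`.
        return "index (e.g. PyPI release)"
    info = json.loads(direct_url_text)
    vcs = info.get("vcs_info")
    if vcs is not None:
        # e.g. `pip install git+https://...`: report the VCS, URL and commit.
        return f"{vcs.get('vcs', 'vcs')}: {info['url']}@{vcs.get('commit_id', '?')}"
    return f"direct URL: {info['url']}"


def install_source(dist_name: str) -> str:
    """Describe how a distribution was installed, based on its metadata."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return "not installed"
    return describe_direct_url(dist.read_text("direct_url.json"))


print("pyreft:", install_source("pyreft"))
```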
