lianshan01/gsm8k-eval-batch-v1

gsm8k-eval-batch-v1

Reference: https://github.com/tianlwang/eval_gsm8k. This repository implements batched evaluation on GSM8K.
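The sketch below shows one way batched scoring could work. It is a minimal illustration and not this repository's actual code. It assumes a hypothetical generate_batch callable that takes a list of prompts and returns one completion per prompt. Gold answers are read from the standard GSM8K reference-solution format, where the final number follows "####". The model's answer is read with a simple last-number rule. Both parse_gold and parse_prediction are illustrative helpers.

```python
import re
from typing import Callable, List, Optional

# GSM8K reference solutions end with a line of the form "#### <number>".
GOLD_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")
# Any number, optionally negative, with thousands separators or a decimal part.
NUM_RE = re.compile(r"-?[\d,]*\.?\d+")


def parse_gold(answer: str) -> str:
    """Return the final numeric answer from a GSM8K reference solution."""
    match = GOLD_RE.search(answer)
    if match is None:
        raise ValueError("no '####' marker in reference answer")
    return match.group(1).replace(",", "")


def parse_prediction(completion: str) -> Optional[str]:
    """Hypothetical rule: treat the last number in the model output as its answer."""
    numbers = NUM_RE.findall(completion)
    return numbers[-1].replace(",", "") if numbers else None


def evaluate(
    prompts: List[str],
    gold_answers: List[str],
    generate_batch: Callable[[List[str]], List[str]],  # hypothetical model wrapper
    batch_size: int = 8,
) -> float:
    """Run the model over prompts in fixed-size batches and return exact-match accuracy."""
    if not prompts:
        return 0.0
    correct = 0
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        golds = gold_answers[start:start + batch_size]
        completions = generate_batch(batch)  # one completion per prompt in the batch
        for completion, gold in zip(completions, golds):
            pred = parse_prediction(completion)
            # Compare numerically so that "18" and "18.0" count as the same answer.
            if pred is not None and float(pred) == float(parse_gold(gold)):
                correct += 1
    return correct / len(prompts)
```

Batching only changes how many prompts reach the model per call. Accuracy is still computed per example, so the result should not depend on batch_size, provided generation itself is deterministic.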

few-shot

8-shot

The 8-shot prompt is taken from the gsm8k-cot task in lm-evaluation-harness.

python eval_gsm8k.py --model <model_name>

Model Accuracy Harness Accuracy
Mistral-7B-v0.1
Llama-3-8b-hf 0.42
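The sketch below shows how a few-shot prompt in the gsm8k-cot style might be assembled and how its answer might be read back. It assumes a "Q: ... / A: ..." layout with exemplars separated by blank lines, and answers that end in "The answer is N." These are assumptions about the harness format, not a copy of it. The single exemplar is hypothetical and stands in for the real eight. build_few_shot_prompt and extract_answer are illustrative names.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical exemplar; the real 8-shot exemplars come from the gsm8k-cot task.
EXEMPLARS: List[Tuple[str, str]] = [
    (
        "Tom has 3 apples and buys 2 more. How many apples does he have?",
        "Tom starts with 3 apples and buys 2 more, so 3 + 2 = 5. The answer is 5.",
    ),
]

# Assumed answer pattern: the chain of thought ends with "The answer is N."
ANSWER_RE = re.compile(r"The answer is (-?[\d.,]+)")


def build_few_shot_prompt(question: str, exemplars: Sequence[Tuple[str, str]] = EXEMPLARS) -> str:
    """Concatenate worked exemplars, then the test question with an open "A:"."""
    shots = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    shots.append(f"Q: {question}\nA:")
    return "\n\n".join(shots)


def extract_answer(completion: str) -> Optional[str]:
    """Read "The answer is N" from the first answer the model generates."""
    # The model may keep going and invent a new "Q:", so cut the output there.
    completion = completion.split("\n\nQ:")[0]
    match = ANSWER_RE.search(completion)
    if match is None:
        return None
    # Drop the sentence-final period and thousands separators.
    value = match.group(1).rstrip(".").replace(",", "")
    return value or None
```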

8-shot maj1@8

python eval_gsm8k.py --model <model_name> --use_majority_vote --temp 0.2 --n_votes 8

Model Accuracy Harness Accuracy
Mistral-7B-v0.1

python eval_gsm8k.py --model <model_name> --use_majority_vote --temp 0.4 --n_votes 8

Model Accuracy
Mistral-7B-v0.1
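maj1@8 samples several completions per question and keeps the most frequent final answer. The number of samples is set by --n_votes and the sampling temperature by --temp. The sketch below shows only the voting step. It assumes each sampled completion has already been reduced to an answer string, or None when no answer could be parsed. majority_vote is a hypothetical helper, not a function from this repository.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most common parsed answer among the sampled completions."""
    valid = [a for a in answers if a is not None]  # ignore unparseable samples
    if not valid:
        return None
    # Counter.most_common orders equal counts by first occurrence,
    # so ties go to the answer that was sampled first.
    return Counter(valid).most_common(1)[0][0]
```

With --n_votes 8, a question counts as correct when the answer returned here matches the gold answer. The temperature has to be above zero for the eight samples to differ at all.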

zero-shot

CoT zero-shot

Uses the chain-of-thought prompt "Let's think step by step." so the model reasons step by step before answering the question.

python eval_gsm8k.py --model <model_name> --cot

Model Accuracy Harness Accuracy
Mistral-7B-v0.1
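The sketch below shows a zero-shot prompt, with and without the chain-of-thought trigger. It assumes the same hypothetical "Q: ... / A:" layout as a few-shot prompt but uses no exemplars. The cot argument loosely corresponds to the --cot flag. Only the trigger phrase comes from this README. The template and the function name build_zero_shot_prompt are illustrative.

```python
def build_zero_shot_prompt(question: str, cot: bool = False) -> str:
    """Build a prompt with no exemplars; optionally prime the answer with a CoT trigger."""
    prompt = f"Q: {question}\nA:"
    if cot:
        # The trigger phrase starts the answer, so the model continues with its reasoning.
        prompt += " Let's think step by step."
    return prompt
```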

zero-shot

python eval_gsm8k.py --model <model_name> --zero-shot

Model Accuracy
Mistral-7B-v0.1