gsm8k-eval-batch-v1

Reference: https://github.com/tianlwang/eval_gsm8k

This repository implements batched evaluation for GSM8K.
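As a rough illustration of what batch evaluation means here, the sketch below splits a list of prompts into fixed-size chunks so that each chunk can be sent to the model in one generation call. The helper name `iter_batches` and its `batch_size` parameter are hypothetical and are not taken from eval_gsm8k.py.

```python
from typing import Iterator, List, Sequence


def iter_batches(prompts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `prompts`, each with at most `batch_size` items.

    Hypothetical helper for illustration; not part of eval_gsm8k.py.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(prompts), batch_size):
        yield list(prompts[start:start + batch_size])
```

Each yielded batch would then be tokenized and generated together; the final batch may be smaller than `batch_size`.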

few-shot

8-shot

The 8-shot prompt is taken from the gsm8k-cot task in lm-evaluation-harness.

python eval_gsm8k.py --model <model_name>

Model Accuracy Harness Accuracy
Mistral-7B-v0.1
Llama-3-8b-hf 0.42
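A few-shot prompt of this kind can be assembled roughly as follows. This is a minimal sketch: the function name `build_few_shot_prompt` is hypothetical, and the "Q: ... A: ..." layout is an assumption about the prompt format, not a copy of the script's actual template. The exemplars themselves would come from the harness prompt and are not reproduced here.

```python
from typing import List, Tuple


def build_few_shot_prompt(exemplars: List[Tuple[str, str]], question: str) -> str:
    """Join worked (question, answer) exemplars and append the target question.

    Hypothetical helper; the "Q:/A:" layout is an assumed format.
    """
    blocks = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)
```

For 8-shot evaluation, `exemplars` would hold eight worked problems, each ending with its final answer.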

8-shot maj1@8

python eval_gsm8k.py --model <model_name> --use_majority_vote --temp 0.2 --n_votes 8

Model Accuracy Harness Accuracy
Mistral-7B-v0.1

python eval_gsm8k.py --model <model_name> --use_majority_vote --temp 0.4 --n_votes 8

Model Accuracy
Mistral-7B-v0.1
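Majority voting over sampled completions can be sketched as below: each of the `--n_votes` samples is reduced to a final numeric answer, and the most frequent answer is kept. The names `extract_answer` and `majority_vote` are hypothetical, and taking the last number in the completion is an assumed extraction rule; the script may parse answers differently.

```python
import re
from collections import Counter
from typing import List, Optional

# Matches integers and decimals, allowing thousands separators such as "1,200".
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_answer(completion: str) -> Optional[str]:
    """Return the last number in the completion with commas removed (assumed rule)."""
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def majority_vote(completions: List[str]) -> Optional[str]:
    """Return the most common extracted answer, or None if nothing was extracted."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    # On a tie, Counter.most_common keeps the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]
```

Answers are compared as strings in this sketch, so "18" and "18.0" count as different votes; normalizing to a numeric type before counting would merge them.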

zero-shot

cot zero-shot

Uses the chain-of-thought prompt "Let's think step by step." before answering the question.

python eval_gsm8k.py --model <model_name> --cot

Model Accuracy Harness Accuracy
Mistral-7B-v0.1
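The zero-shot chain-of-thought setup can be sketched as a prompt with no exemplars, where the trigger sentence opens the answer so the model continues with its reasoning. The function name `build_cot_prompt` and the "Q:/A:" layout are assumptions for illustration; only the trigger sentence comes from this README.

```python
COT_TRIGGER = "Let's think step by step."


def build_cot_prompt(question: str) -> str:
    """Build a zero-shot prompt that starts the answer with the CoT trigger.

    Hypothetical helper; the "Q:/A:" layout is an assumed format.
    """
    return f"Q: {question.strip()}\nA: {COT_TRIGGER}"
```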

zero-shot

python eval_gsm8k.py --model <model_name> --zero-shot

Model Accuracy
Mistral-7B-v0.1