gsm8k-eval-batch-v1

Reference: https://github.com/tianlwang/eval_gsm8k. This repository implements batch evaluation on GSM8K.

few-shot

8-shot

The 8-shot prompt is taken from the gsm8k-cot task in lm-evaluation-harness.

python eval_gsm8k.py --model <model_name>

Model	Accuracy	Harness Accuracy
Mistral-7B-v0.1
Llama-3-8b-hf	0.42
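
As a rough illustration of how an accuracy figure like the one above might be computed, the sketch below pulls the last number out of each generation and compares it with the reference. The helper names `extract_answer` and `accuracy` are hypothetical and are not taken from eval_gsm8k.py; the sketch assumes that the final number in a generation is the model's answer and that references end in the GSM8K `#### <number>` form.

```python
import re

# Integers or decimals, with optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_answer(text):
    """Return the last number found in `text` as a float, or None.

    Assumption: the final number in the text is the answer.
    """
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def accuracy(generations, references):
    """Fraction of generations whose extracted answer equals the reference's."""
    if not generations:
        return 0.0
    correct = 0
    for generated, reference in zip(generations, references):
        predicted = extract_answer(generated)
        gold = extract_answer(reference)
        # A generation with no number in it never counts as correct.
        if predicted is not None and predicted == gold:
            correct += 1
    return correct / len(generations)
```

Comparing parsed floats rather than raw strings means that `72.50` and `72.5` count as the same answer; the actual script may normalize answers differently.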

8-shot maj1@8

python eval_gsm8k.py --model <model_name> --use_majority_vote --temp 0.2 --n_votes 8

Model	Accuracy	Harness Accuracy
Mistral-7B-v0.1

python eval_gsm8k.py --model <model_name> --use_majority_vote --temp 0.4 --n_votes 8

Model	Accuracy
Mistral-7B-v0.1
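
The maj1@8 setting samples several completions per question (here 8, at a non-zero temperature) and keeps the most frequent final answer. A minimal sketch of that voting step follows; `majority_vote` is a hypothetical helper, not a function from this repository, and it assumes the answers have already been extracted from each sample.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the samples, or None.

    Samples with no extractable answer (None) are ignored. When counts
    tie, the answer that appeared first among the samples is returned.
    """
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Eight sampled answers for a single question (illustrative values only).
samples = ["18", "18", "20", None, "18", "7", "20", "18"]
final_answer = majority_vote(samples)
```

The voted answer is then scored against the reference in the same way as a single greedy answer.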

zero-shot

cot zero-shot

Uses the chain-of-thought prompt "Let's think step by step." before answering the question.

python eval_gsm8k.py --model <model_name> --cot

Model	Accuracy	Harness Accuracy
Mistral-7B-v0.1
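
One way the zero-shot chain-of-thought prompt could be assembled is sketched below. The `Question:` / `Answer:` layout and the name `build_cot_prompt` are assumptions made for illustration; only the trigger sentence comes from this README, and the template used by eval_gsm8k.py may differ.

```python
COT_TRIGGER = "Let's think step by step."


def build_cot_prompt(question):
    """Append the chain-of-thought trigger after the question.

    The Question/Answer layout is an assumed template, not the repository's.
    """
    return f"Question: {question.strip()}\nAnswer: {COT_TRIGGER}"


prompt = build_cot_prompt("  A pen costs 2 dollars. How much do 4 pens cost? ")
```

The model continues from the trigger sentence, so its reasoning comes before the final answer.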

zero-shot

python eval_gsm8k.py --model <model_name> --zero-shot

Model	Accuracy
Mistral-7B-v0.1
