I am trying to recreate the StarCoder2-Instruct-v0.1 model; however, the model produced by the command provided in the README (copied below) does not match the evaluation results of the StarCoder2-Instruct-v0.1 model on HF.
The discrepancy is substantial: the humaneval score of your HF version is 7 points higher than that of my reproduced model (both models were evaluated locally by me in the same environment).
MODEL_KEY=bigcode/starcoder2-15b
LR=1e-5
EPOCH=4
SEQ_LEN=1280
WARMUP_RATIO=0.05
OUTPUT_DIR=/path/to/output_model
DATASET_FILE=/path/to/50k-dataset.jsonl
accelerate launch -m star_align.train \
--model_key $MODEL_KEY \
--model_name_or_path $MODEL_KEY \
--use_flash_attention True \
--datafile_paths $DATASET_FILE \
--output_dir $OUTPUT_DIR \
--bf16 True \
--num_train_epochs $EPOCH \
--max_training_seq_length $SEQ_LEN \
--pad_to_max_length False \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 64 \
--group_by_length False \
--ddp_find_unused_parameters False \
--logging_steps 1 \
--log_level info \
--optim adafactor \
--max_grad_norm -1 \
--warmup_ratio $WARMUP_RATIO \
--learning_rate $LR \
--lr_scheduler_type linear
Are the parameters in the README correct for the released model? Are you adding anything in your accelerate config, e.g. any model wrappers or something else?
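One reason the accelerate config matters here: with per_device_train_batch_size 1 and gradient_accumulation_steps 64, the effective batch size scales with the number of processes that accelerate launches, so the same command can train with different batch sizes on different machines. A minimal sketch of that arithmetic follows. The process counts and the 50,000-example dataset size are assumptions for illustration only, not values taken from the README, and the step count is approximate since the trainer's exact rounding may differ.

```python
import math


def effective_batch_size(per_device_batch_size: int,
                         grad_accum_steps: int,
                         num_processes: int) -> int:
    """Examples consumed per optimizer step under data-parallel training."""
    return per_device_batch_size * grad_accum_steps * num_processes


def approx_optimizer_steps(num_examples: int, global_batch: int, epochs: int) -> int:
    """Rough number of optimizer steps; the trainer's exact rounding may differ."""
    return math.ceil(num_examples / global_batch) * epochs


# Values from the command above.
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 64
EPOCHS = 4

# Assumption: roughly 50,000 training examples (the dataset is named "50k").
ASSUMED_NUM_EXAMPLES = 50_000

# Assumption: hypothetical GPU counts, since the README command does not fix one.
for num_processes in (1, 2, 4, 8):
    global_batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM, num_processes)
    steps = approx_optimizer_steps(ASSUMED_NUM_EXAMPLES, global_batch, EPOCHS)
    print(f"processes={num_processes} global_batch={global_batch} approx_steps={steps}")
```

If the released model was trained on a different number of GPUs than my run, the learning rate of 1e-5 would have been applied to a different effective batch size, which could plausibly account for part of the gap.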
For the data, I just ran:
>>> from datasets import load_dataset
>>> load_dataset("bigcode/self-oss-instruct-sc2-exec-filter-50k", split="train").to_json("/path/to/50k-dataset.jsonl", lines=True)
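To rule out a problem with the export itself, the resulting file can be checked by counting its records and comparing the total against the row count of the loaded dataset. A minimal sketch, where `count_jsonl_records` is a hypothetical helper written for illustration and not part of star_align or datasets, and the demo records are made up:

```python
import json
import os
import tempfile


def count_jsonl_records(path: str) -> int:
    """Count JSON objects in a JSON Lines file, failing loudly on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines such as a trailing newline
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no} is not valid JSON") from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no} is not a JSON object")
            count += 1
    return count


# Demo on a throwaway file with made-up records; point the helper at the
# real export (e.g. the path passed to to_json) to check the actual data.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write('{"a": 1}\n{"a": 2}\n')
    demo_path = tmp.name

print(count_jsonl_records(demo_path))
os.remove(demo_path)
```

The count returned for the real file should equal `len()` of the dataset returned by `load_dataset`; a mismatch would point to a truncated or corrupted export rather than a training issue.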
Do you have any ideas on how I can reproduce your model? Thanks!