Move neuron_parallel_compile outside of bash script #706

jgray-aws · 2024-09-26T19:28:05Z

Currently, the tutorial calls neuron_parallel_compile inside the bash script. Because neuron_parallel_compile is what sets $NEURON_EXTRACT_GRAPHS_ONLY, the variable is still unset when the script evaluates the check below. As a result MAX_STEPS is set to -1, and compilation runs for more than 1 hour.

```bash
if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi
```
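The ordering problem can be reproduced without Neuron hardware. The sketch below is only an illustration: `fake_parallel_compile` and `pick_max_steps` are hypothetical stand-ins, and it assumes the real wrapper sets the variable only in the environment of the command it launches.

```bash
#!/bin/bash
# Start from a clean state so the demo does not depend on the caller's environment.
unset NEURON_EXTRACT_GRAPHS_ONLY

# Hypothetical stand-in for neuron_parallel_compile (assumption): it sets the
# variable only for the command it wraps, not for the shell that calls it.
fake_parallel_compile() {
    NEURON_EXTRACT_GRAPHS_ONLY=1 "$@"
}

LOGGING_STEPS=1

# Same check as in the tutorial script, wrapped in a function for the demo.
pick_max_steps() {
    if [ "${NEURON_EXTRACT_GRAPHS_ONLY:-}" = "1" ]; then
        printf '%s\n' "$((LOGGING_STEPS + 5))"
    else
        printf '%s\n' "-1"
    fi
}

# Wrapper invoked inside the script: the check has already run in the outer
# shell, where the variable is unset.
inside=$(pick_max_steps)

# Wrapper invoked around the script: the check runs with the variable set.
outside=$(fake_parallel_compile pick_max_steps)

echo "wrapper inside script:  MAX_STEPS=$inside"
echo "wrapper outside script: MAX_STEPS=$outside"
```

Under that assumption, the check only sees the variable when the wrapper launches the script itself, for example `neuron_parallel_compile bash train.sh`, where `train.sh` is a placeholder name for the tutorial script.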

optimum-neuron/docs/source/training_tutorials/sft_lora_finetune_llm.mdx

Lines 215 to 262 in 3748a06

    
           ```bash 
        
           #!/bin/bash 
        
           set -ex 
        
           export NEURON_FUSE_SOFTMAX=1 
        
           export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3 
        
           export MALLOC_ARENA_MAX=64 
        
           export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cache_dir=/home/ubuntu/cache_dir_neuron/" 
        
           PROCESSES_PER_NODE=8 
        
           NUM_EPOCHS=1 
        
           TP_DEGREE=2 
        
           PP_DEGREE=1 
        
           BS=1 
        
           GRADIENT_ACCUMULATION_STEPS=8 
        
           LOGGING_STEPS=1 
        
           MODEL_NAME="meta-llama/Meta-Llama-3-8B" 
        
           OUTPUT_DIR=output-$SLURM_JOB_ID 
        
           if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then 
        
               MAX_STEPS=$((LOGGING_STEPS + 5)) 
        
           else 
        
               MAX_STEPS=-1 
        
           fi 
        
           XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE docs/source/training_tutorials/sft_lora_finetune_llm.py \ 
        
             --model_id $MODEL_NAME \ 
        
             --num_train_epochs $NUM_EPOCHS \ 
        
             --do_train \ 
        
             --learning_rate 5e-5 \ 
        
             --warmup_ratio 0.03 \ 
        
             --max_steps $MAX_STEPS \ 
        
             --per_device_train_batch_size $BS \ 
        
             --per_device_eval_batch_size $BS \ 
        
             --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \ 
        
             --gradient_checkpointing true \ 
        
             --bf16 \ 
        
             --zero_1 false \ 
        
             --tensor_parallel_size $TP_DEGREE \ 
        
             --pipeline_parallel_size $PP_DEGREE \ 
        
             --logging_steps $LOGGING_STEPS \ 
        
             --save_total_limit 1 \ 
        
             --output_dir $OUTPUT_DIR \ 
        
             --lr_scheduler_type "constant" \ 
        
             --overwrite_output_dir 
        
           ```

We need to refactor the tutorial so that neuron_parallel_compile is called on the training script itself, rather than from inside it.

An example can be found here:

https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html


github-actions · 2025-02-11T08:05:06Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Feb 11, 2025
