---
# Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
# SPDX-License-Identifier: BSD-3-Clause
---
***

# Training Configuration with LoRA Fine-Tuning

## Overview

This configuration file defines the setup for fine-tuning a Hugging Face causal language model using **LoRA (Low-Rank Adaptation)** and **PEFT (Parameter-Efficient Fine-Tuning)** techniques. It also includes dataset, training, optimizer, and scheduler settings.

***
### 1. Model Configuration

Model-related parameters for loading and fine-tuning.

* **model\_type**: `hf` → Type of model (`hf` for Hugging Face, `custom` for custom models).
* **auto\_class\_name**: `AutoModelForCausalLM` → AutoClass used to load the model.
* **model\_name**: `HuggingFaceTB/SmolLM-135M` → Pretrained model to fine-tune.
* **load\_in\_4bit**: `false` → If `true`, loads model in 4-bit quantization for memory efficiency.
* **use\_peft**: `true` → Enables PEFT for parameter-efficient fine-tuning.
* **peft\_config**: Defines LoRA parameters when `use_peft` is `true`:
    * `lora_r`: Rank for LoRA adapters.
    * `lora_alpha`: Scaling factor for LoRA updates.
    * `lora_dropout`: Dropout applied to LoRA layers.
    * `target_modules`: Modules to which LoRA is applied (e.g., `q_proj`, `v_proj`).
    * `bias`: Bias handling (`none`, `all`, `lora_only`).
    * `task_type`: `CAUSAL_LM` → Task type (e.g., `CAUSAL_LM`, `SEQ_2_SEQ_LM`).
    * `peft_type`: `LORA` → Fine-tuning method (e.g., `LORA`, `IA3`).
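
A `peft_config` block like this typically maps onto a `peft.LoraConfig`. The following is a minimal sketch using the standard Hugging Face `peft` API; how this pipeline wires it up internally is an assumption:

```python
# Minimal sketch: building a LoRA adapter from the peft_config values above.
# Assumes the stock Hugging Face `peft` API; the pipeline's actual glue code
# may differ.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")

lora_config = LoraConfig(
    r=8,                                  # lora_r
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```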

***


### 2. Dataset Configuration

This section defines parameters for dataset handling during fine-tuning with Hugging Face models. It covers dataset type, splits, prompt formatting, and DataLoader settings.

* **tokenizer\_name**: Tokenizer to load; typically matches the model name.
* **dataset\_type**: `seq_completion` → Used for sequence-continuation tasks, where the model predicts the next tokens given an input text (e.g., summarization, text generation).
* **dataset\_name**: Name of the dataset to train on (e.g., a Hugging Face dataset ID).
* **train\_split / test\_split**: Names of the training and evaluation splits.
* **split\_ratio**: Ratio for splitting the dataset into train/test; used only when just `train_split` is provided.
* **prompt\_func**: Python function used to format prompts, given as a `module:function` string.
* **completion\_template**: `{output}` → String template that tells the fine-tuning pipeline which dataset field supplies the target output (completion) the model learns to generate; see the sketch below.
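
Conceptually, `prompt_func` and `completion_template` combine into one training sample. A simplified sketch (the real pipeline additionally handles tokenization, truncation, and label masking):

```python
# Simplified sketch of how a single training sample might be assembled.
# `create_samsum_prompt` is the user-supplied prompt_func; the example
# record and the assembly step shown here are illustrative assumptions.
example = {
    "dialogue": "Amanda: I baked cookies. Do you want some?\nJerry: Sure!",
    "summary": "Amanda baked cookies and offers some to Jerry.",
}

def create_samsum_prompt(example):
    return f"Summarize the following conversation:\n\n{example['dialogue']}\n\nSummary:\n"

prompt = create_samsum_prompt(example)
completion = "{summary}".format(**example)  # completion_template filled from the record
sample_text = prompt + completion           # the model learns to generate `completion`
```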

### Example Dataset Configs

### **1. Alpaca (yahma/alpaca-cleaned)**

```yaml
dataset:
  tokenizer_name: "meta-llama/Llama-3.2-1B"
  dataset_type: "seq_completion"
  dataset_name: "yahma/alpaca-cleaned"
  train_split: "train"
  test_split: "test"
  max_seq_length: 512
  prompt_func: "alpaca_func:create_alpaca_prompt"
  completion_template: "{output}"
```

***

### **2. Samsum (knkarthick/samsum)**

```yaml
dataset:
  tokenizer_name: "meta-llama/Llama-3.2-1B"
  dataset_type: "seq_completion"
  dataset_name: "knkarthick/samsum"
  train_split: "train"
  test_split: "test"
  prompt_func: "samsum_func:create_samsum_prompt"
  completion_template: "{summary}"
```

***
### **3. gsm8k (openai/gsm8k)**

```yaml
dataset:
  tokenizer_name: "meta-llama/Llama-3.2-1B"
  dataset_type: "seq_completion"
  dataset_name: "openai/gsm8k"
  train_split: "train"
  test_split: "test"
  prompt_func: "gsm8k_func:create_gsm8k_prompt"
  completion_template: "{answer}"
```

***

### **4. grammar (grammar_dataset)**

```yaml
dataset:
  tokenizer_name: "meta-llama/Llama-3.2-1B"
  dataset_type: "seq_completion"
  dataset_name: "grammar"
  train_split: "train"
  split_ratio: 0.8
  prompt_func: "grammar_func:create_grammar_prompt"
  completion_template: "{target}"
```

***
### Prompt Function Examples

```python
# Alpaca
def create_alpaca_prompt(example):
    return f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:\n"

# Samsum
def create_samsum_prompt(example):
    return f"Summarize the following conversation:\n\n{example['dialogue']}\n\nSummary:\n"

# gsm8k
def create_gsm8k_prompt(example):
    return f"Solve the following math problem step by step:\n\n{example['question']}\n\nAnswer:\n"

# grammar
def create_grammar_prompt(example):
    return f"Correct the grammar in the following sentence:\n\n{example['input']}\n\nCorrected:\n"

```
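
The `prompt_func` value uses a `module:function` string. A plausible resolution mechanism is sketched below; the loader actually used by this pipeline (search paths, error handling) is an assumption:

```python
# Hypothetical sketch of resolving a "module:function" prompt_func string.
# The real loader may search specific config directories or use another
# convention entirely.
import importlib

def resolve_prompt_func(spec: str):
    module_name, func_name = spec.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

create_prompt = resolve_prompt_func("samsum_func:create_samsum_prompt")
```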


***

### 3. Training Configuration

This section defines core parameters for fine-tuning and evaluation.

* **type**: `sft` → Specifies training type; `sft` means Supervised Fine-Tuning.
* **output\_dir**: Directory where model checkpoints and logs are saved.
* **do\_eval**: Enables evaluation during training.
* **eval\_strategy**: `epoch` → When to run evaluation (e.g., per epoch or steps).
* **gradient\_accumulation\_steps**: Accumulates gradients over multiple steps to simulate a larger batch size (e.g., a per-device batch of 4 with 8 accumulation steps gives an effective batch size of 32).
* **dtype**: `fp16` → Mixed precision for faster training and reduced memory usage.
* **gradient\_checkpointing**: Saves memory by recomputing activations during the backward pass (slower but memory-efficient).
* **torch\_compile**: Wraps the model with `torch.compile()` (PyTorch 2.0+) to fuse ops, reduce Python overhead, and generate optimized kernels, often yielding speed-ups without code changes.
* **Optional distributed configs**: FSDP, DeepSpeed, or DDP for multi-QAIC or large-scale training.
* **resume\_from\_checkpoint**: Path to a checkpoint to resume training from.
* **disable\_tqdm**: `False` by default; set to `True` to disable the progress bar (e.g., when running in a notebook).
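
Many of these fields line up with Hugging Face `transformers.TrainingArguments`. A rough sketch of that mapping, assuming an HF `Trainer`-style backend (whether this pipeline constructs the object exactly this way is an assumption):

```python
# Rough sketch: mapping the training block onto transformers.TrainingArguments.
# Field names follow the public HF API.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./training_results",
    do_eval=True,
    eval_strategy="epoch",             # "evaluation_strategy" in older transformers
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    fp16=True,                         # dtype: "fp16"
    gradient_checkpointing=False,
    torch_compile=True,
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="epoch",
    save_total_limit=5,
    metric_for_best_model="eval_loss",
)
```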

***

### 4. Optimizer & Scheduler

* **optimizer**: `adamw` → AdamW (Adam with decoupled weight decay); options: `adamw`, `adam`, `sgd`.
* **lr**: Initial learning rate (e.g., `5e-5` for fine-tuning).
* **weight\_decay**: Regularization strength (commonly `0.01`).

* **scheduler**: `cosine` → Learning-rate decay strategy; options: `linear`, `cosine`, `cosine_with_restarts`, `polynomial`, `constant`, `constant_with_warmup`, `inverse_sqrt`.
* **warmup\_steps**: Number of warmup steps or a ratio (e.g., `100` steps, or `0.05` for 5% of total steps). Warmup stabilizes early training and improves convergence.
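
These settings plausibly translate to standard PyTorch and `transformers` objects. A minimal sketch (the construction below is an assumption; `model` is the PEFT-wrapped model from the earlier sketch):

```python
# Minimal sketch: optimizer and LR scheduler from the config values above.
import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,     # warmup_steps
    num_training_steps=1000,  # total optimizer steps for the run
)
```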

***

### 5. Callbacks

Callbacks allow custom actions during training, such as logging, early stopping, or hardware profiling.

* **early\_stopping**: Stops training if no improvement in a monitored metric for a defined patience period.
* **tensorboard**: Enables logging of metrics and losses to TensorBoard for visualization.
* **QAICProfilerCallback**: Profiles QAIC devices over a specified training step range to monitor performance and resource usage.
* **QAICOpByOpVerifierCallback**: Verifies QAIC operations step-by-step during a specified training range for correctness and debugging.
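
Early stopping and TensorBoard map onto stock `transformers` functionality; the QAIC callbacks are specific to this repository. A sketch (the TensorBoard wiring shown is the usual HF route, an assumption for this pipeline):

```python
# Sketch: the early-stopping callback from the config values.
# EarlyStoppingCallback is the stock transformers implementation; the
# QAIC profiler/verifier callbacks are repo-specific and omitted here.
from transformers import EarlyStoppingCallback

callbacks = [
    EarlyStoppingCallback(
        early_stopping_patience=3,
        early_stopping_threshold=0.001,
    ),
]
# TensorBoard logging is usually enabled via report_to="tensorboard" in the
# training arguments rather than an explicit callback object.
```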

***
## Full Example: `QEfficient/finetune/experimental/configs/default_config.yaml`

```yaml
# -----------------------------------------------------------------------------
#
# Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
# SPDX-License-Identifier: BSD-3-Clause
#
# -----------------------------------------------------------------------------

# Model configuration
model:
  model_type: "hf" # Hugging Face model
  auto_class_name: "AutoModelForCausalLM"
  model_name: "HuggingFaceTB/SmolLM-135M" # Pretrained model name
  use_peft: true
  peft_config:
    lora_r: 8
    lora_alpha: 16
    lora_dropout: 0.1
    target_modules: ["q_proj", "v_proj"]
    bias: "none" # Options: none, all, lora_only
    task_type: "CAUSAL_LM" # Options: CAUSAL_LM, SEQ_2_SEQ_LM, etc.
    peft_type: "LORA" # Options: LORA, IA3, etc.

# Dataset configuration
dataset:
  tokenizer_name: "HuggingFaceTB/SmolLM-135M"
  dataset_type: "seq_completion"
  # dataset_name: "Arthur-LAGACHERIE/very-smollm-corpus-0.5M"
  dataset_name: "knkarthick/samsum"
  train_split: "train"
  max_seq_length: 512
  split_ratio: 0.8 # Ratio for train/test split, used when only train_split is provided
  test_split: "test"
  group_by_length: True
  num_workers: 4
  pin_memory: True
  persistent_workers: True
  prefetch_factor: 1
  drop_last: False

# Training configuration
training:
  type: "sft"
  output_dir: "./training_results"
  eval_strategy: "epoch"
  # eval_steps: 100 # Used when 'eval_strategy' is 'steps'.
  gradient_accumulation_steps: 1
  num_train_epochs: 1
  max_steps: -1
  log_level: "info"
  log_on_each_node: True
  logging_strategy: "steps"
  logging_steps: 10
  save_strategy: "epoch"
  # save_steps: 100 # Used when 'save_strategy' is 'steps'.
  save_total_limit: 5
  metric_for_best_model: "eval_loss"
  dtype: "fp16"

  # Uncomment if running in a notebook
  # disable_tqdm: True

  # Uncomment the fsdp block below to enable FSDP training
  # fsdp: "full_shard"
  # fsdp_config: "./configs/accelerate/fsdp_config.yaml"
  # fsdp_config: "./configs/accelerate/fsdp_tp_parallelism_config.yaml"

  # Uncomment the deepspeed line below to enable DeepSpeed training
  # deepspeed_config: "./configs/accelerate/deepSpeed_config.yaml"

  # Uncomment the DDP block below to enable DDP training and configure DDP params
  # ddp_config:
  #   ddp_backend: "qccl"
  #   ddp_find_unused_parameters: False
  #   ddp_bucket_cap_mb: 25
  #   ddp_broadcast_buffers: null
  #   ddp_timeout: 1800

  # Uncomment and populate to resume training
  # resume_from_checkpoint: "./abc"
  # restore_callback_states_from_checkpoint: True

  gradient_checkpointing: False
  gradient_checkpointing_kwargs:
    preserve_rng_state: True
    use_reentrant: False

  torch_compile: True

# Optimizer configuration
optimizers:
  optimizer_name: "adamw"
  lr: 5e-5
  weight_decay: 0.01

scheduler:
  scheduler_name: "cosine"
  warmup_steps: 100 # warmup_steps or warmup_ratio
  warmup_ratio: 0.1

callbacks:
  early_stopping:
    early_stopping_patience: 3
    early_stopping_threshold: 0.001
  tensorboard: # no extra options; enables TensorBoard logging
```
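
A config like this can be consumed with any standard YAML loader; a minimal sketch assuming PyYAML (the repo's actual entry point may parse it differently):

```python
# Minimal sketch: loading default_config.yaml with PyYAML.
import yaml

with open("QEfficient/finetune/experimental/configs/default_config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["model_name"])     # "HuggingFaceTB/SmolLM-135M"
print(cfg["training"]["output_dir"])  # "./training_results"
```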