Commit e627edf

Add: documentation for enhanced save_pretrained parameters (#1377)
This PR adds comprehensive documentation for the compression parameters available in the enhanced `save_pretrained` method. These parameters are critical for users working with model compression but were previously undocumented.

## Changes

- Adds a new `docs/save_pretrained.md` file explaining:
  - How the enhanced `save_pretrained` method works
  - Detailed descriptions of all compression parameters
  - Code examples showing common usage patterns
  - Notes on compatibility with loading compressed models

## Benefits

- **Better User Experience:** Users can clearly understand all available options
- **Improved Onboarding:** New users can quickly learn how to save compressed models
- **Comprehensive Examples:** Shows different approaches for saving models with compression

This documentation supports [ticket](https://issues.redhat.com/browse/INFERENG-578) and will help users leverage the full capabilities of the compression functionality in the save process.

---------

Signed-off-by: Rahul Tuli <[email protected]>
1 parent 998be99 commit e627edf

File tree

1 file changed: +116 -0 lines changed

docs/save_pretrained.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
# Enhanced `save_pretrained` Arguments

The `llmcompressor` library extends Hugging Face's `save_pretrained` method with additional arguments to support model compression functionality. This document explains these extra arguments and how to use them effectively.
## How It Works

When you import `llmcompressor`, it automatically wraps the model's original `save_pretrained` method with an enhanced version that supports compression. This happens in two ways:

1. **Direct modification**: when you call `modify_save_pretrained(model)` directly
2. **Automatic wrapping**: when you call `oneshot(...)`, which wraps `save_pretrained` under the hood

This means that after applying compression with `oneshot`, your model's `save_pretrained` method is already enhanced with compression capabilities, and you can use the additional arguments described below.
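Conceptually, the wrapping follows the standard method-patching pattern. The sketch below is illustrative only and is not llmcompressor's actual implementation; `wrap_save_pretrained` is a hypothetical stand-in for the real `modify_save_pretrained`:

```python
# Minimal sketch (assumed behavior, not the real implementation): the
# wrapper accepts the compression-specific keyword arguments and
# delegates everything else to the original Hugging Face method.
def wrap_save_pretrained(model):
    original_save = model.save_pretrained

    def save_pretrained_wrapper(save_directory, save_compressed=True, **kwargs):
        if save_compressed:
            # The real wrapper compresses the weights and records the
            # compression config before writing to disk.
            pass
        return original_save(save_directory, **kwargs)

    model.save_pretrained = save_pretrained_wrapper
```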
## Additional Arguments

When saving compressed models, you can pass the following extra arguments to `save_pretrained`:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `sparsity_config` | `Optional[SparsityCompressionConfig]` | `None` | Optional configuration for sparsity compression. Provide this if the model already contains sparsity. If `None` and `skip_sparsity_compression_stats` is `False`, the configuration is inferred from the model automatically. |
| `quantization_format` | `Optional[str]` | `None` | Optional format string for quantization. If not provided, it is inferred from the model. |
| `save_compressed` | `bool` | `True` | Whether to save the model in a compressed format. Set to `False` to save in the original dense format. |
| `skip_sparsity_compression_stats` | `bool` | `True` | Whether to skip calculating sparsity statistics (e.g., global sparsity and structure) when saving. If you are not providing a `sparsity_config`, set this to `False` so the config is generated for you. |
| `disable_sparse_compression` | `bool` | `False` | When `True`, skips sparse compression during save, even if the model has been previously compressed. |
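For example, `disable_sparse_compression` lets you write a checkpoint without sparse compression even if the model was compressed earlier; the output path below is a placeholder:

```python
# Skip sparse compression on save, e.g. when a downstream consumer
# cannot read sparse-compressed weights.
model.save_pretrained(
    "your-model-dense-save",  # placeholder path
    disable_sparse_compression=True,
)
```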
## Workflow for Models with Existing Sparsity

When working with models that already have sparsity:

1. If you know the sparsity configuration, provide it directly via `sparsity_config`
2. If you don't know the sparsity configuration, set `skip_sparsity_compression_stats` to `False` to automatically infer it from the model

This workflow ensures that the correct sparsity configuration is either provided or generated when saving models with existing sparsity.
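Concretely, the two branches look like this; `my_config` and the output paths are placeholders, and full examples follow in the next section:

```python
# Option 1: the sparsity config is known, so pass it explicitly.
model.save_pretrained("out-known-sparsity", sparsity_config=my_config)

# Option 2: the config is unknown, so let it be inferred from the model.
model.save_pretrained(
    "out-inferred-sparsity",
    skip_sparsity_compression_stats=False,
)
```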
## Examples

### Applying Compression with oneshot

The simplest approach is to use `oneshot`, which handles both compression and wrapping `save_pretrained`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Load model
model = AutoModelForCausalLM.from_pretrained("your-model")
tokenizer = AutoTokenizer.from_pretrained("your-model")

# Apply compression - this also wraps save_pretrained
oneshot(
    model=model,
    recipe=[GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"])],
    # Other oneshot parameters (e.g., a calibration dataset)...
)

# Now you can use the enhanced save_pretrained
SAVE_DIR = "your-model-W8A8-compressed"
model.save_pretrained(
    SAVE_DIR,
    save_compressed=True,  # Use the enhanced functionality
)
tokenizer.save_pretrained(SAVE_DIR)
```
### Manual Approach (Without oneshot)

If you need more control, you can wrap `save_pretrained` manually:

```python
from transformers import AutoModelForCausalLM

from llmcompressor.transformers.sparsification import modify_save_pretrained

# Load model
model = AutoModelForCausalLM.from_pretrained("your-model")

# Manually wrap save_pretrained
modify_save_pretrained(model)

# Now you can use the enhanced save_pretrained
model.save_pretrained(
    "your-model-path",
    save_compressed=True,
    skip_sparsity_compression_stats=False,  # Automatically infer the sparsity config
)
```
### Saving with Custom Sparsity Configuration

If you already know the sparsity characteristics of your model, you can describe them explicitly. The field values below are illustrative; adjust them to match your model:

```python
from compressed_tensors import SparsityCompressionConfig

# Create custom sparsity config (illustrative values)
custom_config = SparsityCompressionConfig(
    format="sparse-bitmask",
    global_sparsity=0.5,
    sparsity_structure="2:4",
)

# Save with custom config
model.save_pretrained(
    "your-model-custom-sparse",
    sparsity_config=custom_config,
)
```
## Notes

- When loading compressed models with `from_pretrained`, the compression format is automatically detected.
- To use compressed models with vLLM, simply load them as you would any model:

  ```python
  from vllm import LLM

  model = LLM("./your-model-compressed")
  ```

- Compression configurations are saved in the model's config file and are automatically applied when loading.
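For instance, a compressed checkpoint can be loaded back with `transformers` without any compression-specific arguments; the path below is a placeholder:

```python
from transformers import AutoModelForCausalLM

# The compression config stored alongside the weights is detected and
# applied automatically on load.
model = AutoModelForCausalLM.from_pretrained("./your-model-compressed")
```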
For more information about compression algorithms and formats, please refer to the documentation and examples in the llmcompressor repository.
