@@ -25,6 +25,7 @@ You can download the following table to see the various parameters for your use
| :------------: | :---------------: | :-------------------: | :----------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: |
| Ling-Coder-lite-base | 16.8B | 2.75B | 16K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-Coder-lite-base) <br>[🤖 ModelScope](https://modelscope.cn/models/inclusionAI/Ling-Coder-lite-base) |
| Ling-Coder-lite | 16.8B | 2.75B | 16K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-Coder-lite) <br>[🤖 ModelScope](https://modelscope.cn/models/inclusionAI/Ling-Coder-lite) |
+ | Ling-Coder-lite-GPTQ-Int8 | 16.8B | 2.75B | 16K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-Coder-lite-GPTQ-Int8) <br>[🤖 ModelScope](https://modelscope.cn/models/inclusionAI/Ling-Coder-lite-GPTQ-Int8) |
</div>
@@ -42,7 +43,7 @@ You can download the following table to see the various parameters for your use
## Evaluation
- Detailed evaluation results are reported in our [technical report](https://arxiv.org/abs/2503.17793).
+ Detailed evaluation results are reported in our [technical report](https://arxiv.org/abs/2503.17793). For the evaluation code, please refer to the Ling-Coder-Lite evaluation method in [CodeFuse-Evaluation](https://github.com/codefuse-ai/codefuse-evaluation).
## Quickstart
@@ -149,6 +150,53 @@ vllm serve inclusionAI/Ling-lite \
For detailed guidance, please refer to the vLLM [instructions](https://docs.vllm.ai/en/latest/).
+ ### vLLM GPTQ Int8
+
+ #### Environment Preparation
+
+ Requirement: `vllm==0.6.3.post1`.
+
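+ To confirm the installed vLLM version before patching (a quick sanity check):
+ ```bash
+ python -c "import vllm; print(vllm.__version__)"  # expect 0.6.3.post1
+ ```
+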
+ Apply `ling_gptq.patch` to vLLM by running:
+ ```bash
+ patch -p1 < ling_gptq.patch -d $(python -c "from importlib.util import find_spec; print(find_spec('vllm').submodule_search_locations[0])")
+ ```
+
+ #### Inference Example
+
+ ```python
+ from vllm import LLM
+ from vllm.sampling_params import SamplingParams
+ from transformers import AutoTokenizer
+
+ model_name = "inclusionAI/Ling-Coder-lite-GPTQ-Int8"
+
+ # Load the quantized model; cap the context length to fit GPU memory.
+ model = LLM(model_name, trust_remote_code=True, gpu_memory_utilization=0.80, max_model_len=4096)
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_name,
+     trust_remote_code=True
+ )
+
+ prompt = "Write a quick sort algorithm in python."
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+ # Render the chat messages into the model's prompt string.
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ sample_params = SamplingParams(max_tokens=1024, ignore_eos=False)
+ outputs = model.generate(text, sampling_params=sample_params, prompt_token_ids=None)
+
+ for output in outputs:
+     generated_text = output.outputs[0].text
+     print(generated_text)
+ ```
+
+ Note: this GPTQ Int8 quantized model requires no extra parameters for vLLM online serving.
+
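+ For example, a minimal serving sketch (assuming the same `vllm serve` entry point used earlier in this guide; adjust the flags to your hardware):
+ ```bash
+ vllm serve inclusionAI/Ling-Coder-lite-GPTQ-Int8 \
+     --trust-remote-code \
+     --max-model-len 4096
+ ```
+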
## Finetuning
We recommend using [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) to finetune Ling with SFT, DPO, etc.
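
A minimal sketch of launching SFT with LLaMA-Factory's CLI (the YAML path here is hypothetical; write your own config with `model_name_or_path` pointing at a Ling checkpoint):

```bash
# Hypothetical config file; see the LLaMA-Factory docs for the full option list.
llamafactory-cli train examples/ling_coder_lite_sft.yaml
```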