Commit 2f4e959 (parent a6a620d): Update README.md

1 file changed: README.md (+49, -1)

@@ -25,6 +25,7 @@ You can download the following table to see the various parameters for your use
| :------------: | :---------------: | :-------------------: | :----------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: |
| Ling-Coder-lite-base | 16.8B | 2.75B | 16K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-Coder-lite-base) <br>[🤖 ModelScope](https://modelscope.cn/models/inclusionAI/Ling-Coder-lite-base) |
| Ling-Coder-lite | 16.8B | 2.75B | 16K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-Coder-lite) <br>[🤖 ModelScope](https://modelscope.cn/models/inclusionAI/Ling-Coder-lite) |
| Ling-Coder-lite-GPTQ-Int8 | 16.8B | 2.75B | 16K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-Coder-lite-GPTQ-Int8) <br>[🤖 ModelScope](https://modelscope.cn/models/inclusionAI/Ling-Coder-lite-GPTQ-Int8) |

</div>

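As a convenience sketch (not part of this commit), the quantized weights listed above can be fetched locally with the Hugging Face CLI; the target directory below is an arbitrary choice:

```bash
# Sketch: download the GPTQ Int8 weights from Hugging Face (requires the huggingface_hub CLI)
huggingface-cli download inclusionAI/Ling-Coder-lite-GPTQ-Int8 --local-dir ./Ling-Coder-lite-GPTQ-Int8
```
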
@@ -42,7 +43,7 @@ You can download the following table to see the various parameters for your use

## Evaluation

Detailed evaluation results are reported in our [technical report](https://arxiv.org/abs/2503.17793). For detailed evaluation code, please refer to the Ling-Coder-Lite evaluation method in [CodeFuse-Evaluation](https://github.com/codefuse-ai/codefuse-evaluation).

## Quickstart

@@ -149,6 +150,53 @@ vllm serve inclusionAI/Ling-lite \

For detailed guidance, please refer to the vLLM [`instructions`](https://docs.vllm.ai/en/latest/).

### vLLM GPTQ Int8

#### Environment Preparation

Requirement: `vllm==0.6.3.post1`.

Apply `ling_gptq.patch` to your vLLM installation by executing:
```bash
# Locate the installed vllm package directory and apply the patch in place
patch -p1 < ling_gptq.patch -d $(python -c "from importlib.util import find_spec; print(find_spec('vllm').submodule_search_locations[0])")
```
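
A quick sketch, not taken from the original README, of pinning the required version and sanity-checking the install after patching (assumes `pip` manages your environment):

```bash
# Install the pinned vLLM release before applying the patch above
pip install vllm==0.6.3.post1

# After patching, confirm vLLM still imports and reports the expected version
python -c "import vllm; print(vllm.__version__)"
```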

#### Inference Example

```python
from vllm import LLM
from vllm.sampling_params import SamplingParams
from transformers import AutoTokenizer

model_name = "inclusionAI/Ling-Coder-lite-GPTQ-Int8"

# Load the GPTQ Int8 checkpoint; cap the context length and GPU memory fraction for this example
model = LLM(model_name, trust_remote_code=True, gpu_memory_utilization=0.80, max_model_len=4096)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Build a chat-formatted prompt using the model's chat template
prompt = "Write a quick sort algorithm in python."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate up to 1024 new tokens and print the completion
sample_params = SamplingParams(max_tokens=1024, ignore_eos=False)
outputs = model.generate(text, sampling_params=sample_params, prompt_token_ids=None)

for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
```

Note: this GPTQ Int8 quantized model needs no extra parameters for vLLM online serving.
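
For reference only (not part of this commit), an online-serving invocation could mirror the `vllm serve` example earlier in the README, pointed at the quantized checkpoint; the flags below are standard vLLM serve options chosen as reasonable defaults, not values taken from this repository:

```bash
# Sketch: serve the GPTQ Int8 checkpoint with the patched vLLM install
vllm serve inclusionAI/Ling-Coder-lite-GPTQ-Int8 \
    --trust-remote-code \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.80
```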

## Finetuning

We recommend using [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) to finetune Ling with SFT, DPO, etc.
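
If a concrete starting point helps, a minimal LLaMA-Factory invocation is sketched below; the config filename `ling_coder_sft.yaml` is hypothetical, and its contents would follow LLaMA-Factory's standard YAML training format rather than anything in this repository:

```bash
# Sketch: supervised finetuning via LLaMA-Factory's CLI (config name is hypothetical)
llamafactory-cli train ling_coder_sft.yaml
```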
