
Commit b138900

fix the spelling of TensorRT LLM
Signed-off-by: Faradawn Yang <[email protected]>
1 parent 4910491 commit b138900

1 file changed: +10 -10 lines changed


docs/source/deployment-guide/quick-start-recipe-for-qwen3-next-on-trtllm.md

Lines changed: 10 additions & 10 deletions
@@ -1,10 +1,10 @@
-# Quick Start Recipe for Qwen3 Next on TensorRT-LLM
+# Quick Start Recipe for Qwen3 Next on TensorRT LLM
 
 ## Introduction
 
-This deployment guide provides step-by-step instructions for running the Qwen3-Next model using TensorRT-LLM, optimized for NVIDIA GPUs. It covers the complete setup required; from accessing model weights and preparing the software environment to configuring TensorRT-LLM parameters, launching the server, and validating inference output.
+This deployment guide provides step-by-step instructions for running the Qwen3-Next model using TensorRT LLM, optimized for NVIDIA GPUs. It covers the complete setup required; from accessing model weights and preparing the software environment to configuring TensorRT LLM parameters, launching the server, and validating inference output.
 
-The guide is intended for developers and practitioners seeking high-throughput or low-latency inference using NVIDIA’s accelerated stack—starting with the PyTorch container from NGC, then installing TensorRT-LLM for model serving.
+The guide is intended for developers and practitioners seeking high-throughput or low-latency inference using NVIDIA’s accelerated stack—starting with the PyTorch container from NGC, then installing TensorRT LLM for model serving.
 
 ## Prerequisites
 
@@ -22,7 +22,7 @@ The guide is intended for developers and practitioners seeking high-throughput o
 
 ### Run Docker Container
 
-Run the docker container using the TensorRT-LLM NVIDIA NGC image.
+Run the docker container using the TensorRT LLM NVIDIA NGC image.
 
 ```shell
 docker run --rm -it \
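The hunk above is cut off after the first line of the `docker run` command. Purely as a hedged illustration (this sketch is not part of the commit and not the guide's exact command), launching the TensorRT LLM NGC release container with the port mapping described in the next hunk might look roughly like this; the tag variable, volume mount, and resource flags are assumptions:

```shell
# Hypothetical sketch, not from this diff. Pick a real tag from the NGC
# catalog page linked in the next hunk before running.
TRTLLM_IMAGE_TAG="x.y.z-rcN"   # placeholder, replace with an actual tag

docker run --rm -it \
  --gpus all \
  --ipc=host \
  -p 8000:8000 \
  -v "${HOME}/models:/models" \
  nvcr.io/nvidia/tensorrt-llm/release:${TRTLLM_IMAGE_TAG}
```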
@@ -42,11 +42,11 @@ Note:
 * The command also maps port `8000` from the container to your host so you can access the LLM API endpoint from your host
 * See the <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags> for all the available containers. The containers published in the main branch weekly have `rcN` suffix, while the monthly release with QA tests has no `rcN` suffix. Use the `rc` release to get the latest model and feature support.
 
-If you want to use latest main branch, you can choose to build from source to install TensorRT-LLM, the steps refer to <https://nvidia.github.io/TensorRT-LLM/latest/installation/build-from-source-linux.html>.
+If you want to use latest main branch, you can choose to build from source to install TensorRT LLM, the steps refer to <https://nvidia.github.io/TensorRT-LLM/latest/installation/build-from-source-linux.html>.
 
 ### Creating the TRT-LLM Server config
 
-We create a YAML configuration file `/tmp/config.yml` for the TensorRT-LLM Server and populate it with the following recommended performance settings. Note that we should set kv_cache_reuse to false.
+We create a YAML configuration file `/tmp/config.yml` for the TensorRT LLM Server and populate it with the following recommended performance settings. Note that we should set kv_cache_reuse to false.
 
 ```shell
 EXTRA_LLM_API_FILE=/tmp/config.yml
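The hunk ends at the first line of the configuration script. As a rough, hypothetical sketch of the pattern being described (writing extra LLM API options to `/tmp/config.yml` with KV-cache reuse disabled), something like the following could be used; the option key `kv_cache_config.enable_block_reuse` is an assumption about how the note on `kv_cache_reuse` maps onto the YAML options, not text from this commit:

```shell
# Hypothetical sketch, not from this diff: write the extra LLM API options
# file referenced above, with KV-cache block reuse turned off.
EXTRA_LLM_API_FILE=/tmp/config.yml

cat <<EOF > "${EXTRA_LLM_API_FILE}"
kv_cache_config:
  enable_block_reuse: false
EOF
```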
@@ -105,7 +105,7 @@ These options are used directly on the command line when you start the `trtllm-s
 
 #### `--backend pytorch`
 
-* **Description:** Tells TensorRT-LLM to use the **pytorch** backend.
+* **Description:** Tells TensorRT LLM to use the **pytorch** backend.
 
 #### `--max_batch_size`
 
@@ -121,7 +121,7 @@ These options are used directly on the command line when you start the `trtllm-s
 
 #### `--trust_remote_code`
 
-* **Description:** Allows TensorRT-LLM to download models and tokenizers from Hugging Face. This flag is passed directly to the Hugging Face API.
+* **Description:** Allows TensorRT LLM to download models and tokenizers from Hugging Face. This flag is passed directly to the Hugging Face API.
 
 
 #### Extra LLM API Options (YAML Configuration)
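For context on how the command-line flags documented in the preceding hunks fit together (again an editorial sketch, not part of this commit), a `trtllm-serve` launch might combine them roughly as follows; the model ID, the batch size value, and the assumption that the YAML file above is passed via `--extra_llm_api_options` are placeholders rather than the guide's recommended values:

```shell
# Hypothetical invocation combining the flags described above; model ID and
# sizing values are placeholders, not recommendations from the guide.
trtllm-serve "Qwen/Qwen3-Next-80B-A3B-Instruct" \
  --backend pytorch \
  --max_batch_size 16 \
  --trust_remote_code \
  --extra_llm_api_options /tmp/config.yml
```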
@@ -159,7 +159,7 @@ See the [`TorchLlmArgs` class](https://nvidia.github.io/TensorRT-LLM/llm-api/ref
 
 ### Basic Test
 
-Start a new terminal on the host to test the TensorRT-LLM server you just launched.
+Start a new terminal on the host to test the TensorRT LLM server you just launched.
 
 You can query the health/readiness of the server using:
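The hunk stops just before the health-check command itself. A minimal check, assuming the server listens on the port 8000 mapping set up earlier and exposes a `/health` endpoint (an assumption, not text from this diff), might be:

```shell
# Hypothetical readiness probe; assumes port 8000 and a /health endpoint.
curl -s http://localhost:8000/health
```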
@@ -205,7 +205,7 @@ Here is an example response:
 
 ## Benchmarking Performance
 
-To benchmark the performance of your TensorRT-LLM server you can leverage the built-in `benchmark_serving.py` script. To do this first creating a wrapper `bench.sh` script.
+To benchmark the performance of your TensorRT LLM server you can leverage the built-in `benchmark_serving.py` script. To do this first creating a wrapper `bench.sh` script.
 
 ```shell
 cat <<'EOF' > bench.sh
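The hunk is cut off at the first line of the wrapper script. Purely as a hedged sketch of the general shape (not the script from the guide), a minimal `bench.sh` wrapper around a vLLM-style `benchmark_serving.py` might look like the following; the script invocation, every flag shown, and the model ID are assumptions, not values from this commit:

```shell
# Hypothetical bench.sh sketch; benchmark_serving.py flags follow the common
# vLLM-style interface and are assumptions, not this guide's actual script.
cat <<'EOF' > bench.sh
#!/bin/bash
python benchmark_serving.py \
  --model "Qwen/Qwen3-Next-80B-A3B-Instruct" \
  --dataset-name random \
  --num-prompts 100 \
  --host localhost \
  --port 8000
EOF
chmod +x bench.sh
```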

0 commit comments
