
Commit abafeb6

docs: Update README to reflect torch 2 support (#160)
Update the README file to remove the "experimental" tag from the documentation. The existence of the tag was an oversight, as Torch 2.x has been supported for 18+ months at this point.

Signed-off-by: J Wyman <[email protected]>
1 parent db70751 commit abafeb6

1 file changed: +80 -124 lines

README.md

Lines changed: 80 additions & 124 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2020-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -81,8 +81,8 @@ Currently, Triton requires that a specially patched version of
 PyTorch be used with the PyTorch backend. The full source for
 these PyTorch versions are available as Docker images from
 [NGC](https://ngc.nvidia.com). For example, the PyTorch version
-compatible with the 22.12 release of Triton is available as
-nvcr.io/nvidia/pytorch:22.12-py3.
+compatible with the 25.09 release of Triton is available as
+nvcr.io/nvidia/pytorch:25.09-py3.
 
 Copy over the LibTorch and Torchvision headers and libraries from the
 [PyTorch NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch)
@@ -246,6 +246,79 @@ complex execution modes and dynamic shapes. If not specified, all are enabled by
 
 `ENABLE_JIT_PROFILING`
 
+### PyTorch 2.0 Models
+
+The model repository should look like:
+
+```bash
+model_repository/
+`-- model_directory
+    |-- 1
+    |   |-- model.py
+    |   `-- [model.pt]
+    `-- config.pbtxt
+```
+
+The `model.py` contains the class definition of the PyTorch model.
+The class should extend the
+[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
+The `model.pt` may be optionally provided which contains the saved
+[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
+of the model.
+
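For reference, a minimal `model.py` fitting this layout might look like the sketch below. The `AddSubNet` class, its inputs, and its outputs are hypothetical; the only requirement stated above is that the class extend `torch.nn.Module`.

```python
# model.py -- a hypothetical minimal example; any torch.nn.Module subclass works.
import torch
import torch.nn as nn


class AddSubNet(nn.Module):
    """Toy module that returns the sum and difference of two input tensors."""

    def forward(self, input0: torch.Tensor, input1: torch.Tensor):
        # Two outputs: element-wise sum and element-wise difference.
        return input0 + input1, input0 - input1
```

If the optional `model.pt` accompanies it, that file would hold weights produced with, e.g., `torch.save(net.state_dict(), "model.pt")`, following the `state_dict` convention linked above.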
+### TorchScript Models
+
+The model repository should look like:
+
+```bash
+model_repository/
+`-- model_directory
+    |-- 1
+    |   `-- model.pt
+    `-- config.pbtxt
+```
+
+The `model.pt` is the TorchScript model file.
+
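For context, a TorchScript `model.pt` is typically produced with `torch.jit.trace` or `torch.jit.script`. The sketch below assumes a torchvision ResNet-50 and a 224x224 input purely for illustration; neither is a requirement of the backend.

```python
import torch
import torchvision

# Trace a model with a representative input (model choice and input
# shape are placeholders).
model = torchvision.models.resnet50(weights="IMAGENET1K_V2").eval()
example_input = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

# Save the TorchScript program into the layout shown above.
traced.save("model_repository/model_directory/1/model.pt")
```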
+### Customization
+
+The following PyTorch settings may be customized by setting parameters on the
+`config.pbtxt`.
+
+[`torch.set_num_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_threads.html#torch.set_num_threads)
+
+* Key: `NUM_THREADS`
+* Value: The number of threads used for intra-op parallelism on CPU.
+
+[`torch.set_num_interop_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_interop_threads.html#torch.set_num_interop_threads)
+
+* Key: `NUM_INTEROP_THREADS`
+* Value: The number of threads used for interop parallelism (e.g. in JIT interpreter) on CPU.
+
+[`torch.compile()` parameters](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
+
+* Key: `TORCH_COMPILE_OPTIONAL_PARAMETERS`
+* Value: Any of the following parameter(s) encoded as a JSON object.
+  * `fullgraph` (`bool`): Whether it is ok to break model into several subgraphs.
+  * `dynamic` (`bool`): Use dynamic shape tracing.
+  * `backend` (`str`): The backend to be used.
+  * `mode` (`str`): Can be either `"default"`, `"reduce-overhead"`, or `"max-autotune"`.
+  * `options` (`dict`): A dictionary of options to pass to the backend.
+  * `disable` (`bool`): Turn `torch.compile()` into a no-op for testing.
+
+For example:
+
+```proto
+parameters: {
+  key: "NUM_THREADS"
+  value: { string_value: "4" }
+}
+parameters: {
+  key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
+  value: { string_value: "{\"disable\": true}" }
+}
+```
+
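As a rough guide to the semantics, the example configuration above corresponds to the plain PyTorch calls below. This is a sketch of what the settings request, under the assumption that the JSON keys are forwarded as `torch.compile()` keyword arguments, not a description of how the backend applies them internally; the `Linear` module is a placeholder.

```python
import torch

# NUM_THREADS = "4": intra-op parallelism on CPU.
torch.set_num_threads(4)

# TORCH_COMPILE_OPTIONAL_PARAMETERS = '{"disable": true}': assuming the JSON
# keys map to torch.compile() keyword arguments, this makes compilation a no-op.
model = torch.nn.Linear(4, 4)  # placeholder module
compiled = torch.compile(model, disable=True)
```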
 ### Support
 
 #### Model Instance Group Kind
@@ -306,126 +379,9 @@ instance in the
 to ensure that the model instance and the tensors used for inference are
 assigned to the same GPU device as on which the model was traced.
 
-# PyTorch 2.0 Backend \[Experimental\]
-
-> [!WARNING]
-> *This feature is subject to change and removal.*
-
-Starting from 24.01, PyTorch models can be served directly via
-[Python runtime](src/model.py). By default, Triton will use the
-[LibTorch runtime](#pytorch-libtorch-backend) for PyTorch models. To use Python
-runtime, provide the following
-[runtime setting](https://github.com/triton-inference-server/backend/blob/main/README.md#backend-shared-library)
-in the model configuration:
-
-```
-runtime: "model.py"
-```
-
-## Dependencies
+* Python functions optimizable by `torch.compile` may not be served directly in the `model.py` file, they need to be enclosed by a class extending the
+[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
 
-### Python backend dependency
+* Model weights cannot be shared across multiple instances on the same GPU device.
 
-This feature depends on
-[Python backend](https://github.com/triton-inference-server/python_backend),
-see
-[Python-based Backends](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md)
-for more details.
-
-### PyTorch dependency
-
-This feature will take advantage of the
-[`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-optimization, make sure the
-[PyTorch 2.0+ pip package](https://pypi.org/project/torch) is available in the
-same Python environment.
-
-Alternatively, a [Python Execution Environment](#using-custom-python-execution-environments)
-with the PyTorch dependency may be used. It can be created with the
-[provided script](tools/gen_pb_exec_env.sh). The resulting
-`pb_exec_env_model.py.tar.gz` file should be placed at the same
-[backend shared library](https://github.com/triton-inference-server/backend/blob/main/README.md#backend-shared-library)
-directory as the [Python runtime](src/model.py).
-
-## Model Layout
-
-### PyTorch 2.0 models
-
-The model repository should look like:
-
-```
-model_repository/
-`-- model_directory
-    |-- 1
-    |   |-- model.py
-    |   `-- [model.pt]
-    `-- config.pbtxt
-```
-
-The `model.py` contains the class definition of the PyTorch model. The class
-should extend the
-[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
-The `model.pt` may be optionally provided which contains the saved
-[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
-of the model.
-
-### TorchScript models
-
-The model repository should look like:
-
-```
-model_repository/
-`-- model_directory
-    |-- 1
-    |   `-- model.pt
-    `-- config.pbtxt
-```
-
-The `model.pt` is the TorchScript model file.
-
-## Customization
-
-The following PyTorch settings may be customized by setting parameters on the
-`config.pbtxt`.
-
-[`torch.set_num_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_threads.html#torch.set_num_threads)
-- Key: NUM_THREADS
-- Value: The number of threads used for intraop parallelism on CPU.
-
-[`torch.set_num_interop_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_interop_threads.html#torch.set_num_interop_threads)
-- Key: NUM_INTEROP_THREADS
-- Value: The number of threads used for interop parallelism (e.g. in JIT
-  interpreter) on CPU.
-
-[`torch.compile()` parameters](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-- Key: TORCH_COMPILE_OPTIONAL_PARAMETERS
-- Value: Any of following parameter(s) encoded as a JSON object.
-  - fullgraph (*bool*): Whether it is ok to break model into several subgraphs.
-  - dynamic (*bool*): Use dynamic shape tracing.
-  - backend (*str*): The backend to be used.
-  - mode (*str*): Can be either "default", "reduce-overhead" or "max-autotune".
-  - options (*dict*): A dictionary of options to pass to the backend.
-  - disable (*bool*): Turn `torch.compile()` into a no-op for testing.
-
-For example:
-```
-parameters: {
-  key: "NUM_THREADS"
-  value: { string_value: "4" }
-}
-parameters: {
-  key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
-  value: { string_value: "{\"disable\": true}" }
-}
-```
-
-## Limitations
-
-Following are few known limitations of this feature:
-- Python functions optimizable by `torch.compile` may not be served directly in
-  the `model.py` file, they need to be enclosed by a class extending the
-  [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
-- Model weights cannot be shared across multiple instances on the same GPU
-  device.
-- When using `KIND_MODEL` as model instance kind, the default device of the
-  first parameter on the model is used.
+* When using `KIND_MODEL` as model instance kind, the default device of the first parameter on the model is used.
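The first limitation retained above can be illustrated with a short, hypothetical sketch: a bare function cannot be served from `model.py` directly, but the same logic becomes servable once enclosed in a `torch.nn.Module` subclass. The `scale_and_shift` function and `ScaleAndShift` class are invented names for illustration.

```python
import torch
import torch.nn as nn


def scale_and_shift(x: torch.Tensor) -> torch.Tensor:
    # A free function like this cannot be served from model.py directly,
    # even though torch.compile could optimize it.
    return 2.0 * x + 1.0


class ScaleAndShift(nn.Module):
    # Wrapping the same logic in a torch.nn.Module subclass satisfies the
    # requirement described in the limitation above.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return scale_and_shift(x)
```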
