Changes from all commits (26 commits)
95a7fbe  refactor: move engine configs out of components directory  (nv-anants, Oct 21, 2025)
1c51164  fix multi node files  (nv-anants, Oct 21, 2025)
d2fcf83  copy paste fix  (nv-anants, Oct 21, 2025)
126ccd4  fix  (nv-anants, Oct 21, 2025)
b14ec27  move remaining recipes  (tanmayv25, Oct 22, 2025)
10a8aac  Merge branch 'main' into reorg/trtllm-configs  (nv-anants, Oct 23, 2025)
c499173  rabbit  (nv-anants, Oct 23, 2025)
d9d40d2  more rabbit  (nv-anants, Oct 23, 2025)
232e642  Merge branch 'main' into reorg/trtllm-configs  (nv-anants, Oct 23, 2025)
025e8db  refactor: move backend deploy, launch and slurm files from components…  (nv-anants, Oct 23, 2025)
0c4b7b8  update all path refs  (nv-anants, Oct 23, 2025)
b1da6c4  fixes  (nv-anants, Oct 23, 2025)
260ec35  Merge branch 'main' into reorg/trtllm-configs  (nv-anants, Oct 23, 2025)
2556045  Merge branch 'reorg/trtllm-configs' into anants/move-backends  (nv-anants, Oct 23, 2025)
bc55e48  Merge branch 'main' into reorg/trtllm-configs  (nv-anants, Oct 24, 2025)
d19a92c  Merge branch 'reorg/trtllm-configs' into anants/move-backends  (nv-anants, Oct 24, 2025)
e2655ed  updates  (nv-anants, Oct 24, 2025)
3b3fc73  rabbit  (nv-anants, Oct 24, 2025)
9120955  Update task_definition_prefillworker.json  (nv-anants, Oct 24, 2025)
42bd2df  add recipes to docker  (nv-anants, Oct 24, 2025)
d55dc3d  Merge branch 'reorg/trtllm-configs' into anants/move-backends  (nv-anants, Oct 24, 2025)
be35a04  Merge branch 'main' into anants/move-backends  (nv-anants, Oct 24, 2025)
bac0b25  Merge branch 'main' into anants/move-backends  (nv-anants, Oct 27, 2025)
e29d851  Merge branch 'main' into anants/move-backends  (nv-anants, Oct 27, 2025)
21d5ef2  Merge branch 'main' into anants/move-backends  (nv-anants, Oct 27, 2025)
c510ad5  Merge branch 'main' into anants/move-backends  (nv-anants, Oct 27, 2025)
6 changes: 3 additions & 3 deletions .github/filters.yaml
@@ -28,21 +28,21 @@ vllm: &vllm
- 'container/Dockerfile.vllm'
- 'container/deps/requirements.vllm.txt'
- 'container/deps/vllm/**'
- 'components/backends/vllm/**'
- 'examples/backends/vllm/**'
- 'components/src/dynamo/vllm/**'
- 'tests/serve/test_vllm.py'

sglang: &sglang
- 'container/Dockerfile.sglang'
- 'container/Dockerfile.sglang-wideep'
- 'components/backends/sglang/**'
- 'examples/backends/sglang/**'
- 'components/src/dynamo/sglang/**'
- 'container/build.sh'
- 'tests/serve/test_sglang.py'

trtllm: &trtllm
- 'container/Dockerfile.trtllm'
- 'components/backends/trtllm/**'
- 'examples/backends/trtllm/**'
- 'components/src/dynamo/trtllm/**'
- 'container/build.sh'
- 'container/build_trtllm_wheel.sh'
2 changes: 1 addition & 1 deletion .github/workflows/container-validation-backends.yml
@@ -448,7 +448,7 @@ jobs:
# export KUBECONFIG=$(pwd)/.kubeconfig
# kubectl config set-context --current --namespace=$NAMESPACE

# cd components/backends/$FRAMEWORK
# cd examples/backends/$FRAMEWORK
# export FRAMEWORK_RUNTIME_IMAGE="${{ secrets.AZURE_ACR_HOSTNAME }}/ai-dynamo/dynamo:${{ github.sha }}-${FRAMEWORK}-amd64"
# export KUBE_NS=$NAMESPACE
# export GRAPH_NAME=$(yq e '.metadata.name' $DEPLOYMENT_FILE)
2 changes: 1 addition & 1 deletion README.md
@@ -171,7 +171,7 @@ Rerun with `curl -N` and change `stream` in the request to `true` to get the res
### Deploying Dynamo

- Follow the [Quickstart Guide](docs/kubernetes/README.md) to deploy on Kubernetes.
- Check out [Backends](components/backends) to deploy various workflow configurations (e.g. SGLang with router, vLLM with disaggregated serving, etc.)
- Check out [Backends](examples/backends) to deploy various workflow configurations (e.g. SGLang with router, vLLM with disaggregated serving, etc.)
- Run some [Examples](examples) to learn about building components in Dynamo and exploring various integrations.

### Benchmarking Dynamo
2 changes: 1 addition & 1 deletion benchmarks/README.md
@@ -20,7 +20,7 @@ This directory contains benchmarking scripts and tools for performance evaluatio
## Quick Start

### Benchmark a Dynamo Deployment
First, deploy your DynamoGraphDeployment using the [deployment documentation](../components/backends/), then:
First, deploy your DynamoGraphDeployment using the [deployment documentation](../docs/kubernetes/), then:

```bash
# Port-forward your deployment to http://localhost:8000
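# For example, with kubectl (the Service name and namespace below are illustrative;
# point them at your deployment's frontend):
kubectl port-forward svc/dynamo-frontend 8000:8000 -n dynamo-cloud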
```
2 changes: 1 addition & 1 deletion benchmarks/profiler/utils/config_modifiers/sglang.py
@@ -35,7 +35,7 @@
logger.addHandler(console_handler)


DEFAULT_SGLANG_CONFIG_PATH = "components/backends/sglang/deploy/disagg.yaml"
DEFAULT_SGLANG_CONFIG_PATH = "examples/backends/sglang/deploy/disagg.yaml"


class SGLangConfigModifier:
2 changes: 1 addition & 1 deletion benchmarks/profiler/utils/config_modifiers/trtllm.py
@@ -37,7 +37,7 @@
logger.addHandler(console_handler)


DEFAULT_TRTLLM_CONFIG_PATH = "components/backends/trtllm/deploy/disagg.yaml"
DEFAULT_TRTLLM_CONFIG_PATH = "examples/backends/trtllm/deploy/disagg.yaml"


class TrtllmConfigModifier:
2 changes: 1 addition & 1 deletion benchmarks/profiler/utils/config_modifiers/vllm.py
@@ -33,7 +33,7 @@
logger.addHandler(console_handler)


DEFAULT_VLLM_CONFIG_PATH = "components/backends/vllm/deploy/disagg.yaml"
DEFAULT_VLLM_CONFIG_PATH = "examples/backends/vllm/deploy/disagg.yaml"


class VllmV1ConfigModifier:
22 changes: 7 additions & 15 deletions components/README.md
@@ -19,25 +19,17 @@ limitations under the License.

This directory contains the core components that make up the Dynamo inference framework. Each component serves a specific role in the distributed LLM serving architecture, enabling high-throughput, low-latency inference across multiple nodes and GPUs.

## Supported Inference Engines

Dynamo supports multiple inference engines (with a focus on SGLang, vLLM, and TensorRT-LLM), each with their own deployment configurations and capabilities:

- **[vLLM](/docs/backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
- **[SGLang](/docs/backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
- **[TensorRT-LLM](/docs/backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration

Each engine provides launch scripts for different deployment patterns in their respective `/launch` & `/deploy` directories.

## Core Components

### [Backends](backends/)
### Backends

Dynamo supports multiple inference engines, each with their own deployment configurations and capabilities:

The backends directory contains inference engine integrations and implementations, with a key focus on:
- **[vLLM](/docs/backends/vllm/README.md)** - Full-featured vLLM integration with disaggregated serving, KV-aware routing, SLA-based planning, native KV cache events, and NIXL-based transfer mechanisms
- **[SGLang](/docs/backends/sglang/README.md)** - SGLang engine integration with ZMQ-based communication, supporting disaggregated serving and KV-aware routing
- **[TensorRT-LLM](/docs/backends/trtllm/README.md)** - TensorRT-LLM integration with disaggregated serving capabilities and TensorRT acceleration

- **vLLM** - Full-featured vLLM integration with disaggregated serving, KV-aware routing, and SLA-based planning
- **SGLang** - SGLang engine integration supporting disaggregated serving and KV-aware routing
- **TensorRT-LLM** - TensorRT-LLM integration with disaggregated serving capabilities
Each engine provides launch and deploy scripts for different deployment patterns in the [examples](../examples/backends/) folder.
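
For orientation, the relocated layout can be listed directly; for example (directory names as used throughout these docs, contents abbreviated):

```bash
ls examples/backends/
# sglang  trtllm  vllm
ls examples/backends/vllm/
# deploy/  launch/  ...
```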


### [Frontend](src/dynamo/frontend/)
2 changes: 1 addition & 1 deletion components/src/dynamo/router/README.md
@@ -47,7 +47,7 @@ Clients query the `find_best_worker` endpoint to determine which worker should p
>
> Use this manual setup if you need explicit control over prefill routing configuration or want to manage prefill and decode routers separately.
See [`components/backends/vllm/launch/disagg_router.sh`](/components/backends/vllm/launch/disagg_router.sh) for a complete example.
See [`examples/backends/vllm/launch/disagg_router.sh`](/examples/backends/vllm/launch/disagg_router.sh) for a complete example.

```bash
# Start frontend router for decode workers
```
2 changes: 1 addition & 1 deletion container/Dockerfile.sglang-wideep
@@ -87,4 +87,4 @@ ENV PATH=/usr/local/bin/etcd:$PATH
# Enable forceful shutdown of inflight requests
ENV SGL_FORCE_SHUTDOWN=1

WORKDIR /sgl-workspace/dynamo/components/backends/sglang
WORKDIR /sgl-workspace/dynamo/examples/backends/sglang
6 changes: 3 additions & 3 deletions deploy/helm/README.md
@@ -33,15 +33,15 @@ This approach allows you to install Dynamo directly using a DynamoGraphDeploymen
Here is how you would install a VLLM inference backend example.

```bash
helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./components/backends/vllm/deploy/agg.yaml
helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./examples/backends/vllm/deploy/agg.yaml
```

### Installation using Grove

Same example as above, but using Grove PodCliqueSet resources.

```bash
helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./components/backends/vllm/deploy/agg.yaml --set deploymentType=grove
helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./examples/backends/vllm/deploy/agg.yaml --set deploymentType=grove
```

### Customizable Properties
@@ -50,7 +50,7 @@ You can override the default configuration by setting the following properties:

```bash
helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud \
-f ./components/backends/vllm/deploy/agg.yaml \
-f ./examples/backends/vllm/deploy/agg.yaml \
--set "imagePullSecrets[0].name=docker-secret-1" \
--set etcdAddr="my-etcd-service:2379" \
--set natsAddr="nats://my-nats-service:4222"
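
# Once installed, the release and its pods can be checked with standard commands
# (release name and namespace match the example above):
helm status dynamo-graph -n dynamo-cloud
kubectl get pods -n dynamo-cloud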
```
6 changes: 3 additions & 3 deletions deploy/inference-gateway/README.md
@@ -85,12 +85,12 @@ kubectl get gateway inference-gateway -n my-model

### 3. Deploy Your Model ###

Follow the steps in [model deployment](../../components/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.
Follow the steps in [model deployment](../../examples/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../examples/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.

Sample commands to deploy model:

```bash
cd <dynamo-source-root>/components/backends/vllm/deploy
cd <dynamo-source-root>/examples/backends/vllm/deploy
kubectl apply -f agg.yaml -n my-model
```
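
After applying the manifest, the backing pods should come up in the namespace. A basic check (the CRD plural below assumes the usual Kubernetes naming convention for `DynamoGraphDeployment`):

```bash
kubectl get pods -n my-model
kubectl get dynamographdeployments -n my-model
```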

@@ -116,7 +116,7 @@ kubectl create secret generic hf-token-secret \
```

Create a model configuration file similar to the vllm_agg_qwen.yaml for your model.
This file demonstrates the values needed for the Vllm Agg setup in [agg.yaml](../../components/backends/vllm/deploy/agg.yaml)
This file demonstrates the values needed for the Vllm Agg setup in [agg.yaml](../../examples/backends/vllm/deploy/agg.yaml)
Take a note of the model's block size provided in the model card.

### 4. Install Dynamo GAIE helm chart ###
6 changes: 3 additions & 3 deletions deploy/tracing/README.md
@@ -91,7 +91,7 @@ Run the vLLM disaggregated script with tracing enabled:

```bash
# Navigate to vLLM launch directory
cd components/backends/vllm/launch
cd examples/backends/vllm/launch

# Run disaggregated deployment (modify the script to export env vars first)
./disagg.sh
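# The tracing variables mentioned above are exported before launching; if this build
# reads standard OpenTelemetry settings (an assumption -- verify against the tracing
# section below), that could look like:
#   export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
#   export OTEL_SERVICE_NAME=vllm-disagg-frontend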
```
@@ -179,7 +179,7 @@ For Kubernetes deployments, ensure you have a Tempo instance deployed and access

### Modify DynamoGraphDeployment for Tracing

Add common tracing environment variables at the top level and service-specific names in each component in your `DynamoGraphDeployment` (e.g., `components/backends/vllm/deploy/disagg.yaml`):
Add common tracing environment variables at the top level and service-specific names in each component in your `DynamoGraphDeployment` (e.g., `examples/backends/vllm/deploy/disagg.yaml`):

```yaml
apiVersion: nvidia.com/v1alpha1
```
@@ -228,7 +228,7 @@ spec:
Apply the updated DynamoGraphDeployment:

```bash
kubectl apply -f components/backends/vllm/deploy/disagg.yaml
kubectl apply -f examples/backends/vllm/deploy/disagg.yaml
```

Traces will now be exported to Tempo and can be viewed in Grafana.
16 changes: 8 additions & 8 deletions docs/backends/sglang/README.md
@@ -182,14 +182,14 @@ docker compose -f deploy/docker-compose.yml up -d
### Aggregated Serving

```bash
cd $DYNAMO_HOME/components/backends/sglang
cd $DYNAMO_HOME/examples/backends/sglang
./launch/agg.sh
```

### Aggregated Serving with KV Routing

```bash
cd $DYNAMO_HOME/components/backends/sglang
cd $DYNAMO_HOME/examples/backends/sglang
./launch/agg_router.sh
```

@@ -198,7 +198,7 @@ cd $DYNAMO_HOME/components/backends/sglang
Here's an example that uses the [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) model.

```bash
cd $DYNAMO_HOME/components/backends/sglang
cd $DYNAMO_HOME/examples/backends/sglang
./launch/agg_embed.sh
```

@@ -222,14 +222,14 @@ See [SGLang Disaggregation](sglang-disaggregation.md) to learn more about how sg


```bash
cd $DYNAMO_HOME/components/backends/sglang
cd $DYNAMO_HOME/examples/backends/sglang
./launch/disagg.sh
```

### Disaggregated Serving with KV Aware Prefill Routing

```bash
cd $DYNAMO_HOME/components/backends/sglang
cd $DYNAMO_HOME/examples/backends/sglang
./launch/disagg_router.sh
```

@@ -239,7 +239,7 @@ You can use this configuration to test out disaggregated serving with dp attenti

```bash
# note this will require 4 GPUs
cd $DYNAMO_HOME/components/backends/sglang
cd $DYNAMO_HOME/examples/backends/sglang
./launch/disagg_dp_attn.sh
```

@@ -285,7 +285,7 @@ Below we provide a selected list of advanced examples. Please open up an issue i
We currently provide deployment examples for Kubernetes and SLURM.

## Kubernetes
- **[Deploying Dynamo with SGLang on Kubernetes](../../../components/backends/sglang/deploy/README.md)**
- **[Deploying Dynamo with SGLang on Kubernetes](../../../examples/backends/sglang/deploy/README.md)**

## SLURM
- **[Deploying Dynamo with SGLang on SLURM](../../../components/backends/sglang/slurm_jobs/README.md)**
- **[Deploying Dynamo with SGLang on SLURM](../../../examples/backends/sglang/slurm_jobs/README.md)**
2 changes: 1 addition & 1 deletion docs/backends/sglang/dsr1-wideep-h100.md
@@ -44,7 +44,7 @@ docker run \
dynamo-wideep:latest
```

In each container, you should be in the `/sgl-workspace/dynamo/components/backends/sglang` directory.
In each container, you should be in the `/sgl-workspace/dynamo/examples/backends/sglang` directory.

3. Run the ingress and prefill worker

4 changes: 2 additions & 2 deletions docs/backends/sglang/multimodal_epd.md
@@ -47,7 +47,7 @@ flowchart LR
```

```bash
cd $DYNAMO_HOME/components/backends/sglang
cd $DYNAMO_HOME/examples/backends/sglang
./launch/multimodal_agg.sh
```

@@ -133,7 +133,7 @@ flowchart LR


```bash
cd $DYNAMO_HOME/components/backends/sglang
cd $DYNAMO_HOME/examples/backends/sglang
./launch/multimodal_disagg.sh
```

16 changes: 8 additions & 8 deletions docs/backends/trtllm/README.md
@@ -128,13 +128,13 @@ This figure shows an overview of the major components to deploy:

### Aggregated
```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
./launch/agg.sh
```

### Aggregated with KV Routing
```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
./launch/agg_router.sh
```

@@ -144,7 +144,7 @@ cd $DYNAMO_HOME/components/backends/trtllm
> Disaggregated serving supports two strategies for request flow: `"prefill_first"` and `"decode_first"`. By default, the script below uses the `"decode_first"` strategy, which can reduce response latency by minimizing extra hops in the return path. You can switch strategies by setting the `DISAGGREGATION_STRATEGY` environment variable.

```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
./launch/disagg.sh
```
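
For example, to opt into the prefill-first flow described in the note above, export the variable before launching:

```bash
export DISAGGREGATION_STRATEGY="prefill_first"
./launch/disagg.sh
```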

@@ -154,13 +154,13 @@ cd $DYNAMO_HOME/components/backends/trtllm
> Disaggregated serving with KV routing uses a "prefill first" workflow by default. Currently, Dynamo supports KV routing to only one endpoint per model. In disaggregated workflow, it is generally more effective to route requests to the prefill worker. If you wish to use a "decode first" workflow instead, you can simply set the `DISAGGREGATION_STRATEGY` environment variable accordingly.

```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
./launch/disagg_router.sh
```

### Aggregated with Multi-Token Prediction (MTP) and DeepSeek R1
```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm

export AGG_ENGINE_ARGS=./recipes/deepseek-r1/trtllm/mtp/mtp_agg.yaml
export SERVED_MODEL_NAME="nvidia/DeepSeek-R1-FP4"
```
@@ -186,7 +186,7 @@ For comprehensive instructions on multinode serving, see the [multinode-examples

### Kubernetes Deployment

For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../components/backends/trtllm/deploy/README.md).
For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../examples/backends/trtllm/deploy/README.md).

### Client

@@ -270,7 +270,7 @@ Logits processors let you modify the next-token logits at every decoding step (e
You can enable a test-only processor that forces the model to respond with "Hello world!". This is useful to verify the wiring without modifying your model or engine code.

```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
export DYNAMO_ENABLE_TEST_LOGITS_PROCESSOR=1
./launch/agg.sh
```
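
With the test processor enabled, a completion request should come back with the forced text. For example (port and route assume the default OpenAI-compatible frontend; substitute the model name you served):

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<served-model-name>", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
```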
@@ -316,7 +316,7 @@ sampling_params.logits_processor = create_trtllm_adapters(processors)

## Performance Sweep

For detailed instructions on running comprehensive performance sweeps across both aggregated and disaggregated serving configurations, see the [TensorRT-LLM Benchmark Scripts for DeepSeek R1 model](../../../components/backends/trtllm/performance_sweeps/README.md). This guide covers recommended benchmarking setups, usage of provided scripts, and best practices for evaluating system performance.
For detailed instructions on running comprehensive performance sweeps across both aggregated and disaggregated serving configurations, see the [TensorRT-LLM Benchmark Scripts for DeepSeek R1 model](../../../examples/backends/trtllm/performance_sweeps/README.md). This guide covers recommended benchmarking setups, usage of provided scripts, and best practices for evaluating system performance.

## Dynamo KV Block Manager Integration

8 changes: 4 additions & 4 deletions docs/backends/trtllm/gemma3_sliding_window_attention.md
@@ -27,7 +27,7 @@ VSWA is a mechanism in which a model’s layers alternate between multiple slidi
## Aggregated Serving
```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml
```
@@ -36,7 +36,7 @@

## Aggregated Serving with KV Routing
```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml
```
@@ -45,7 +45,7 @@

## Disaggregated Serving
```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_prefill.yaml
```
@@ -55,7 +55,7 @@ export DECODE_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_decode.yaml

## Disaggregated Serving with KV Routing
```bash
cd $DYNAMO_HOME/components/backends/trtllm
cd $DYNAMO_HOME/examples/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_prefill.yaml
```
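
Since this change moves `components/backends/**` to `examples/backends/**`, a repo-wide search is a quick way to confirm that no stale references to the old layout remain (illustrative; run from the repository root):

```bash
git grep -n "components/backends/" && echo "stale references found" || echo "no stale references"
```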