Merged
2 changes: 1 addition & 1 deletion README.md
@@ -179,7 +179,7 @@ Rerun with `curl -N` and change `stream` in the request to `true` to get the res
 Dynamo provides comprehensive benchmarking tools to evaluate and optimize your deployments:

 - **[Benchmarking Guide](docs/benchmarks/benchmarking.md)** – Compare deployment topologies (aggregated vs. disaggregated vs. vanilla vLLM) using AIPerf
-- **[Pre-Deployment Profiling](docs/benchmarks/pre_deployment_profiling.md)** – Optimize configurations before deployment to meet SLA requirements
+- **[SLA-Driven Dynamo Deployments](docs/planner/sla_planner_quickstart.md)** – Optimize your deployment to meet SLA requirements

 # Engines

2 changes: 1 addition & 1 deletion components/backends/trtllm/deploy/README.md
@@ -53,7 +53,7 @@ Advanced disaggregated deployment with SLA-based automatic scaling.
 - `TRTLLMPrefillWorker`: Specialized prefill-only worker

 > [!NOTE]
-> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/benchmarks/pre_deployment_profiling.md) for detailed instructions.
+> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions.

 ## CRD Structure

2 changes: 1 addition & 1 deletion components/backends/vllm/deploy/README.md
@@ -99,7 +99,7 @@ We have public images available on [NGC Catalog](https://catalog.ngc.nvidia.com/

 ### Pre-Deployment Profiling (SLA Planner Only)

-If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [pre-deployment profiling guide](../../../../docs/benchmarks/pre_deployment_profiling.md) to run pre-deployment profiling. The results will be saved to the `dynamo-pvc` PVC and queried by the SLA Planner.
+If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [pre-deployment profiling guide](../../../../docs/benchmarks/sla_driven_profiling.md) to run pre-deployment profiling. The results will be saved to the `dynamo-pvc` PVC and queried by the SLA Planner.

 ## Usage

2 changes: 1 addition & 1 deletion components/src/dynamo/planner/utils/perf_interpolation.py
@@ -28,7 +28,7 @@

 MISSING_PROFILING_DATA_ERROR_MESSAGE = (
 "SLA-Planner requires pre-deployment profiling results to run.\n"
-"Please follow /docs/benchmarks/pre_deployment_profiling.md to run the profiling first,\n"
+"Please follow /docs/benchmarks/sla_driven_profiling.md to run the profiling first,\n"
 "and make sure the profiling results are present in --profile-results-dir."
 )

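The updated error message tells users to place profiling results in `--profile-results-dir`. As a minimal sketch, assuming only that "present" means the directory exists and is non-empty (the planner may well look for specific result files instead), a pre-flight guard could raise that message before any interpolation starts. `require_profiling_results` is a hypothetical helper, not a function in `perf_interpolation.py`.

```python
from pathlib import Path

# Message text copied from the updated constant in perf_interpolation.py (see the hunk above).
MISSING_PROFILING_DATA_ERROR_MESSAGE = (
    "SLA-Planner requires pre-deployment profiling results to run.\n"
    "Please follow /docs/benchmarks/sla_driven_profiling.md to run the profiling first,\n"
    "and make sure the profiling results are present in --profile-results-dir."
)


def require_profiling_results(profile_results_dir: str) -> Path:
    """Hypothetical pre-flight check: fail fast if the results directory is missing or empty.

    Assumption: the directory must exist and contain at least one entry.
    The real planner may instead check for specific profiling output files.
    """
    results = Path(profile_results_dir)
    # is_dir() is checked first, so iterdir() only runs on an existing directory.
    if not results.is_dir() or not any(results.iterdir()):
        raise FileNotFoundError(MISSING_PROFILING_DATA_ERROR_MESSAGE)
    return results
```

Failing at startup like this would surface the link to the renamed guide immediately, rather than leaving the user to trace a later, less specific error.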
2 changes: 1 addition & 1 deletion deploy/utils/README.md
@@ -119,7 +119,7 @@ python3 -m deploy.utils.download_pvc_results \

 For complete benchmarking and profiling workflows:
 - **Benchmarking Guide**: See [docs/benchmarks/benchmarking.md](../../docs/benchmarks/benchmarking.md) for comparing DynamoGraphDeployments and external endpoints
-- **Pre-Deployment Profiling**: See [docs/benchmarks/pre_deployment_profiling.md](../../docs/benchmarks/pre_deployment_profiling.md) for optimizing configurations before deployment
+- **Pre-Deployment Profiling**: See [docs/benchmarks/sla_driven_profiling.md](../../docs/benchmarks/sla_driven_profiling.md) for optimizing configurations before deployment

 ## Notes

307 changes: 0 additions & 307 deletions docs/benchmarks/pre_deployment_profiling.md

This file was deleted.
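Because this PR deletes `docs/benchmarks/pre_deployment_profiling.md`, any reference it missed would become a broken link. A minimal sketch for catching leftovers, assuming a literal substring search over `.md`, `.rst`, and `.py` files is enough (a reference that omits the `.md` extension would slip through); `find_stale_references` and `load_repo_files` are hypothetical helpers, not tooling in the repository.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical settings: the deleted doc's file name and the file types this PR touched.
OLD_DOC = "pre_deployment_profiling.md"
SUFFIXES = {".md", ".rst", ".py"}


def find_stale_references(
    files: dict[str, str], old_name: str = OLD_DOC
) -> list[tuple[str, int, str]]:
    """Return (path, line_number, stripped_line) for each line that still mentions old_name."""
    hits = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if old_name in line:
                hits.append((path, lineno, line.strip()))
    return hits


def load_repo_files(root: str) -> dict[str, str]:
    """Read every file under root whose suffix is in SUFFIXES."""
    return {
        str(p): p.read_text(encoding="utf-8", errors="replace")
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in SUFFIXES
    }


if __name__ == "__main__":
    for path, lineno, line in find_stale_references(load_repo_files(".")):
        print(f"{path}:{lineno}: {line}")
```

Run from the repository root; no output means no file still names the deleted document.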

516 changes: 516 additions & 0 deletions docs/benchmarks/sla_driven_profiling.md
Contributor

would be nice if these were hyperlinks that could take you to specific sections.

Also the doc is a bit long / hard to track, which I think ^ could help with. Otherwise, LGTM!

Large diffs are not rendered by default.

Binary file added docs/images/pd_interpolation.png
2 changes: 1 addition & 1 deletion docs/planner/load_planner.md
@@ -24,7 +24,7 @@ There are two additional rules set by planner to prevent over-compensation:

 ## SLA-based Scaling Up/Down Prefill/Decode Workers

-See [Pre-Deployment Profiling](../benchmarks/pre_deployment_profiling.md) for more details.
+See [SLA-Driven Profiling](../benchmarks/sla_driven_profiling.md) for more details.

 ## Usage

2 changes: 1 addition & 1 deletion docs/planner/planner_intro.rst
@@ -78,5 +78,5 @@ Key features include:

 Overview <self>
 SLA Planner Quick Start <sla_planner_quickstart>
-Pre-Deployment Profiling <../benchmarks/pre_deployment_profiling.md>
+SLA-Driven Profiling <../benchmarks/sla_driven_profiling.md>
 SLA-based Planner <sla_planner.md>
6 changes: 3 additions & 3 deletions docs/planner/sla_planner.md
@@ -1,7 +1,7 @@
 # SLA-based Planner

 > [!TIP]
-> **New to SLA Planner?** For a complete workflow including profiling and deployment, see the [SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md).
+> **New to SLA Planner?** For a complete workflow including profiling and deployment, see the [SLA Profiling + Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md).

 This document covers information regarding the SLA-based planner in `examples/common/utils/planner_core.py`.

@@ -47,11 +47,11 @@ The SLA planner consists of several key components:
 3. **Correction Factors**: Adjust predictions based on observed vs. expected performance
 4. **Scaling Logic**: Calculate optimal number of prefill/decode replicas to meet SLA targets

-## Pre-Deployment Profiling
+## SLA-Driven Pre-Deployment Profiling

 **Prerequisite**: SLA-based planner requires pre-deployment profiling to be completed before deployment. The profiling process analyzes your model's performance characteristics to determine optimal tensor parallelism configurations and scaling parameters that the planner will use during operation.

-See [Pre-Deployment Profiling](../benchmarks/pre_deployment_profiling.md) for detailed instructions on running the profiling process.
+See [Pre-Deployment Profiling](../benchmarks/sla_driven_profiling.md) for detailed instructions on running the profiling process.

 ## Load Prediction

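Items 3 and 4 in the `sla_planner.md` hunk name the planner's correction factors and scaling logic without spelling out the arithmetic. The sketch below is a deliberately simplified illustration, not the code in `planner_core.py`. It assumes that a correction factor is the ratio of observed to expected latency, that per-replica capacity comes from the profiling results, and that the replica count is rounded up. `Observation`, `correction_factor`, and `replicas_needed` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical fields: latency predicted from profiling data vs. latency measured live.
    expected_ttft_ms: float
    observed_ttft_ms: float


def correction_factor(obs: Observation) -> float:
    """Observed / expected latency; a value above 1 means slower than profiled."""
    if obs.expected_ttft_ms <= 0:
        raise ValueError("expected latency must be positive")
    return obs.observed_ttft_ms / obs.expected_ttft_ms


def replicas_needed(
    predicted_load: float,
    per_replica_capacity: float,
    correction: float,
    min_replicas: int = 1,
) -> int:
    """Assumed scaling rule: shrink profiled capacity by the correction factor, then round up."""
    if per_replica_capacity <= 0 or correction <= 0:
        raise ValueError("capacity and correction must be positive")
    effective_capacity = per_replica_capacity / correction
    return max(min_replicas, math.ceil(predicted_load / effective_capacity))
```

For example, with a predicted load of 50,000 tokens/s, a profiled capacity of 12,000 tokens/s per replica, and a correction factor of 1.25, effective capacity drops to 9,600 tokens/s and the sketch asks for 6 replicas instead of the 5 the uncorrected numbers would suggest.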