# Dynamo AIPerf Analyze and Visualize For Profiling Sweeps

**Status**: Draft

**Authors**: [ilana-n]

**Category**: Feature

**Replaces**: N/A

**Replaced By**: N/A

**Sponsor**: [TBD]

**Required Reviewers**: [TBD]

**Review Date**: [TBD]

**Pull Request**: [TBD]

**Implementation PR / Tracking Issue**: [TBD]

# Summary

Introduce two new commands to AIPerf: `aiperf analyze` runs parameter sweeps and stores results, and `aiperf plot` generates visualizations from stored results. This addresses the common workflow of profiling across multiple configurations (e.g., different concurrency levels) to compare performance and generate Pareto curves.

# Motivation

Users need to profile models across multiple parameter configurations to understand performance trade-offs.

The typical workflow is:
1. Run profiling sweeps using a custom script with different parameters (concurrency, sequence length, etc.)
2. Visualize results to identify optimal configurations
3. Generate Pareto curves showing throughput vs latency trade-offs

AIPerf can improve this experience with built-in commands that orchestrate the sweeps and visualize the results.

## Goals

* Run parameter sweeps with a single command that orchestrates multiple profiling runs
* Store sweep results in organized directories
* Provide interactive visualizations where users can adjust settings in the browser
* Allow users to configure which plots are generated by default
* Support static artifact generation for reports and CI/CD

## Non Goals

* Real-time visualization during profiling
* Integration with external databases

# Proposal

## Command Design

### Analyze Command

Runs a parameter sweep and stores results:
```bash
# Basic sweep
aiperf analyze \
  --model Qwen3-0.6B \
  --concurrency 1,4,8,16 \
  --output-dir ./my_sweep

# Multiple parameters
aiperf analyze \
  --model Qwen3-0.6B \
  --concurrency 1,4,8 \
  --output-seq-len 128,256,512 \
  --output-dir ./my_sweep

# With config file
aiperf analyze --config sweep.yaml --output-dir ./my_sweep
```

**Behavior:**
- Runs each profiling configuration sequentially (parallel execution across multiple GPUs is a possible future extension)
- Stores each run in a unique subdirectory
- Shows progress: `[2/4] Running: concurrency=4... (48s)`
- Outputs a message at completion: `Run 'aiperf plot ./my_sweep --host' to visualize`

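The orchestration behind these behaviors could look like the sketch below. The `plan_sweep` helper, the per-run directory naming, and the exact `aiperf profile` flags it emits are illustrative assumptions, not the final implementation:

```python
from pathlib import Path

# Illustrative sketch: expand one swept parameter into sequential runs,
# each with its own subdirectory and a progress line.
def plan_sweep(model: str, concurrencies: list[int], output_dir: str) -> list[list[str]]:
    commands = []
    for i, c in enumerate(concurrencies, 1):
        run_dir = Path(output_dir) / f"{model}-concurrency{c}"
        print(f"[{i}/{len(concurrencies)}] Running: concurrency={c}...")
        # Each command would be executed sequentially (e.g. via subprocess)
        commands.append([
            "aiperf", "profile", "--model", model,
            "--concurrency", str(c), "--artifact-dir", str(run_dir),
        ])
    print(f"Run 'aiperf plot {output_dir} --host' to visualize")
    return commands

cmds = plan_sweep("Qwen3-0.6B", [1, 4, 8, 16], "./my_sweep")
```

Keeping the planner separate from execution would also make it easy to later swap the sequential loop for a parallel scheduler.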
### Plot Command

Generates visualizations from stored results. Supports both sweep-level comparison and single-run analysis:
```bash
# Sweep-level: Compare multiple runs interactively
aiperf plot ./my_sweep --host

# Single-run: Analyze one profiling run
aiperf plot ./my_sweep/Qwen3-0.6B-concurrency8 --host

# Or use --run flag to specify subdirectory
aiperf plot ./my_sweep --run Qwen3-0.6B-concurrency8 --host

# Generate static images for sweep
aiperf plot ./my_sweep --format png

# Generate static images for single run
aiperf plot ./my_sweep/Qwen3-0.6B-concurrency8 --format png

# Generate HTML report
aiperf plot ./my_sweep --format html
```

**Sweep-Level Mode (multiple runs):**

When pointing to a sweep directory containing multiple runs:
- Launches web server with sweep comparison dashboard
- Users can adjust settings in browser:
  - Select which runs to compare
  - Change x-axis/y-axis for comparison plots
  - Toggle between plot types
  - Apply filters
- Default plots focus on comparison:
  - Pareto curve (throughput vs latency across runs)
  - Throughput vs swept parameter
  - Latency distribution comparison
  - Resource utilization comparison

**Single-Run Mode (one run):**

When pointing to a specific run subdirectory:
- Launches web server with single-run deep-dive dashboard
- Users can explore per-request metrics over time:
  - Time series: TTFT (Time to First Token) per request
  - Time series: ITL (Inter-Token Latency) per request
  - Time series: E2E latency per request
  - Request throughput over time
  - Resource utilization over time (GPU memory, utilization)
- Interactive controls:
  - Zoom into time ranges
  - Filter by request characteristics (input length, output length)
  - Toggle between metrics
  - Overlay multiple metrics

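Choosing between the two modes could be as simple as checking whether the target directory itself holds a profile export or only run subdirectories do. The marker filename below is an assumption about the artifact layout, and `detect_mode` is a hypothetical helper:

```python
from pathlib import Path

# Illustrative mode detection for `aiperf plot`: a directory whose
# children contain profile exports is treated as a sweep; a directory
# that directly contains one is a single run. The marker filename is
# an assumed artifact name, not a confirmed AIPerf convention.
def detect_mode(target: Path, marker: str = "profile_export_aiperf.json") -> str:
    if (target / marker).exists():
        return "single-run"
    if any((child / marker).exists() for child in target.iterdir() if child.is_dir()):
        return "sweep"
    raise ValueError(f"No profiling results found under {target}")
```

This keeps `--run` purely a convenience flag: `aiperf plot ./my_sweep --run X` and `aiperf plot ./my_sweep/X` would resolve to the same single-run mode.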
**Static Mode (`--format`):**
- Generates plot files in `{sweep_dir}/visualizations/` or `{run_dir}/visualizations/`
- PNG: Individual plot images
- HTML: Self-contained interactive report

## Configuration

Users can configure default plots for both interactive mode and static mode in `~/.aiperf/config.yaml`:
```yaml
visualization:
  default_plots:
    - pareto_curve
    - throughput_vs_concurrency
    - latency_distribution
    - resource_utilization

  custom_plots:
    - name: "Memory Efficiency"
      type: "line"
      x_axis: "concurrency"
      y_axis: "gpu_memory_used"
```

If no config exists, AIPerf provides pre-determined defaults:
- Pareto curve (throughput vs latency)
- Throughput vs swept parameter
- Latency distribution

## Sweep Configuration File
```yaml
# sweep.yaml
model: Qwen3-0.6B

sweep:
  concurrency: [1, 4, 8, 16, 32]
  output_seq_len: [128, 256, 512]

fixed:
  input_seq_len: 512
```

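A sweep file like this would expand into the Cartesian product of the swept parameters, each combination merged with the fixed parameters (5 × 3 = 15 runs for the example above). A sketch, assuming the YAML is already parsed into a dict; `expand_sweep` is a hypothetical helper:

```python
from itertools import product

def expand_sweep(sweep_cfg: dict) -> list[dict]:
    """Expand swept parameters into one concrete run config per combination."""
    swept = sweep_cfg.get("sweep", {})
    fixed = sweep_cfg.get("fixed", {})
    keys = list(swept)
    return [
        # Each run config carries the model, fixed params, and one combination
        {"model": sweep_cfg["model"], **fixed, **dict(zip(keys, combo))}
        for combo in product(*(swept[k] for k in keys))
    ]

cfg = {
    "model": "Qwen3-0.6B",
    "sweep": {"concurrency": [1, 4, 8, 16, 32], "output_seq_len": [128, 256, 512]},
    "fixed": {"input_seq_len": 512},
}
runs = expand_sweep(cfg)  # 15 configurations
```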
## Visualization Technology: Plotly Dash

The plot command is planned to use **Plotly Dash**, a Python framework for building interactive web applications.

**Key Benefits:**
- **Pure Python** - No JavaScript required, consistent with AIPerf's codebase
- **Interactive by design** - Users adjust settings (axes, filters, run selection) in the browser without re-running commands
- **Self-contained** - Runs locally with no external services required
- **Customizable** - Full control over AIPerf-specific visualizations and styling

**Proposed Flow:**
```
aiperf plot ./my_sweep --host
        ↓
Load sweep data → Create Dash app → Launch server at localhost:8080
        ↓
Browser opens with:
  - Sidebar: controls for run selection, axes, plot types
  - Main area: interactive Plotly charts (zoom, pan, hover)
  - Tabs: switch between visualizations (pareto, throughput, latency)
```

**Example Interactivity:**
```python
from dash import Dash, Input, Output

app = Dash(__name__)

# When the user changes the dropdown, the plot updates instantly via a callback
@app.callback(Output('plot', 'figure'), Input('x-axis', 'value'))
def update_plot(x_axis):
    return generate_plot(x_axis=x_axis)
```

This approach should provide the customization needed for AIPerf's use cases. Alternative frameworks (Streamlit, React+Flask) could be considered during implementation if requirements change.

# Alternate Solutions

## Alt 1: Single Command (Sweep + Auto-Visualize)

Automatically generate visualizations after the sweep completes.

**Pros:**
- One command for everything
- Immediate results

**Cons:**
- Sweeps can take hours; users may want to visualize later
- Cannot re-visualize with different settings without re-running the sweep
- Not flexible for CI/CD

**Reason Rejected:**
Separate commands give users finer control: they can regenerate visualizations without re-running expensive profiling, and they can run profiling alone when they only need the raw exports.

## Alt 2: External Tools (TensorBoard/WandB)

Use existing visualization platforms.

**Pros:**
- No maintenance burden for UI/UX integrations
- Feature-rich

**Cons:**
- WandB requires an account and network access
- TensorBoard is specific to ML training applications
- Less control over AIPerf-specific visualizations
