# Dynamo AIPerf Analyze and Visualize For Profiling Sweeps

**Status**: Draft

**Authors**: [ilana-n]

**Category**: Feature

**Replaces**: N/A

**Replaced By**: N/A

**Sponsor**: [TBD]

**Required Reviewers**: [TBD]

**Review Date**: [TBD]

**Pull Request**: [TBD]

**Implementation PR / Tracking Issue**: [TBD]

# Summary

Introduce two new commands to AIPerf: `aiperf analyze` runs parameter sweeps and stores results, and `aiperf plot` generates visualizations from stored results. This addresses the common workflow of profiling across multiple configurations (e.g., different concurrency levels) to compare performance and generate Pareto curves.

# Motivation

Users need to profile models across multiple parameter configurations to understand performance trade-offs.

The typical workflow is:
1. Run profiling sweeps using a custom script with different parameters (concurrency, sequence length, etc.)
2. Visualize results to identify optimal configurations
3. Generate Pareto curves showing throughput vs latency trade-offs

AIPerf can improve this experience with built-in commands that orchestrate the sweeps and visualize the results.

## Goals

* Run parameter sweeps with a single command that orchestrates multiple profiling runs
* Store sweep results in organized directories
* Provide interactive visualizations where users can adjust settings in the browser
* Allow users to configure which plots are generated by default
* Support static artifact generation for reports and CI/CD

## Non Goals

* Real-time visualization during profiling
* Integration with external databases

# Proposal

## Command Design

### Analyze Command

Runs a parameter sweep and stores results:
```bash
# Basic sweep
aiperf analyze \
  --model Qwen3-0.6B \
  --concurrency 1,4,8,16 \
  --output-dir ./my_sweep

# Multiple parameters
aiperf analyze \
  --model Qwen3-0.6B \
  --concurrency 1,4,8 \
  --output-seq-len 128,256,512 \
  --output-dir ./my_sweep

# With config file
aiperf analyze --config sweep.yaml --output-dir ./my_sweep
```

**Behavior:**
- Runs each profiling configuration sequentially (parallel execution across multiple GPUs is a possible future extension)
- Stores each run in a unique subdirectory
- Shows progress: `[2/4] Running: concurrency=4... (48s)`
- Outputs a message at completion: `Run 'aiperf plot ./my_sweep --host' to visualize`

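The orchestration behind these behaviors could look like the sketch below. The `plan_sweep` helper, the per-run directory naming, and the exact `aiperf profile` flags it emits are illustrative assumptions, not the final implementation:

```python
from pathlib import Path

# Illustrative sketch: expand one swept parameter into sequential runs,
# each with its own subdirectory and a progress line.
def plan_sweep(model: str, concurrencies: list[int], output_dir: str) -> list[list[str]]:
    commands = []
    for i, c in enumerate(concurrencies, 1):
        run_dir = Path(output_dir) / f"{model}-concurrency{c}"
        print(f"[{i}/{len(concurrencies)}] Running: concurrency={c}...")
        # Each command would be executed sequentially (e.g. via subprocess)
        commands.append([
            "aiperf", "profile", "--model", model,
            "--concurrency", str(c), "--artifact-dir", str(run_dir),
        ])
    print(f"Run 'aiperf plot {output_dir} --host' to visualize")
    return commands

cmds = plan_sweep("Qwen3-0.6B", [1, 4, 8, 16], "./my_sweep")
```

Keeping the planner separate from execution would also make it easy to later swap the sequential loop for a parallel scheduler.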
### Plot Command

Generates visualizations from stored results. Supports both sweep-level comparison and single-run analysis:
```bash
# Sweep-level: Compare multiple runs interactively
aiperf plot ./my_sweep --host

# Single-run: Analyze one profiling run
aiperf plot ./my_sweep/Qwen3-0.6B-concurrency8 --host

# Or use --run flag to specify subdirectory
aiperf plot ./my_sweep --run Qwen3-0.6B-concurrency8 --host

# Generate static images for sweep
aiperf plot ./my_sweep --format png

# Generate static images for single run
aiperf plot ./my_sweep/Qwen3-0.6B-concurrency8 --format png

# Generate HTML report
aiperf plot ./my_sweep --format html
```

**Sweep-Level Mode (multiple runs):**

When pointing to a sweep directory containing multiple runs:
- Launches web server with sweep comparison dashboard
- Users can adjust settings in browser:
  - Select which runs to compare
  - Change x-axis/y-axis for comparison plots
  - Toggle between plot types
  - Apply filters
- Default plots focus on comparison:
  - Pareto curve (throughput vs latency across runs)
  - Throughput vs swept parameter
  - Latency distribution comparison
  - Resource utilization comparison

**Single-Run Mode (one run):**

When pointing to a specific run subdirectory:
- Launches web server with single-run deep-dive dashboard
- Users can explore per-request metrics over time:
  - Time series: TTFT (Time to First Token) per request
  - Time series: ITL (Inter-Token Latency) per request
  - Time series: E2E latency per request
  - Request throughput over time
  - Resource utilization over time (GPU memory, utilization)
- Interactive controls:
  - Zoom into time ranges
  - Filter by request characteristics (input length, output length)
  - Toggle between metrics
  - Overlay multiple metrics

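Choosing between the two modes could be as simple as checking whether the target directory itself holds a profile export or only run subdirectories do. The marker filename below is an assumption about the artifact layout, and `detect_mode` is a hypothetical helper:

```python
from pathlib import Path

# Illustrative mode detection for `aiperf plot`: a directory whose
# children contain profile exports is treated as a sweep; a directory
# that directly contains one is a single run. The marker filename is
# an assumed artifact name, not a confirmed AIPerf convention.
def detect_mode(target: Path, marker: str = "profile_export_aiperf.json") -> str:
    if (target / marker).exists():
        return "single-run"
    if any((child / marker).exists() for child in target.iterdir() if child.is_dir()):
        return "sweep"
    raise ValueError(f"No profiling results found under {target}")
```

This keeps `--run` purely a convenience flag: `aiperf plot ./my_sweep --run X` and `aiperf plot ./my_sweep/X` would resolve to the same single-run mode.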
**Static Mode (`--format`):**
- Generates plot files in `{sweep_dir}/visualizations/` or `{run_dir}/visualizations/`
- PNG: Individual plot images
- HTML: Self-contained interactive report

## Configuration

Users can configure default plots for both interactive mode and static mode in `~/.aiperf/config.yaml`:
```yaml
visualization:
  default_plots:
    - pareto_curve
    - throughput_vs_concurrency
    - latency_distribution
    - resource_utilization

  custom_plots:
    - name: "Memory Efficiency"
      type: "line"
      x_axis: "concurrency"
      y_axis: "gpu_memory_used"
```

If no config exists, AIPerf provides pre-determined defaults:
- Pareto curve (throughput vs latency)
- Throughput vs swept parameter
- Latency distribution

## Sweep Configuration File
```yaml
# sweep.yaml
model: Qwen3-0.6B

sweep:
  concurrency: [1, 4, 8, 16, 32]
  output_seq_len: [128, 256, 512]

fixed:
  input_seq_len: 512
```

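A sweep file like this would expand into the Cartesian product of the swept parameters, each combination merged with the fixed parameters (5 × 3 = 15 runs for the example above). A sketch, assuming the YAML is already parsed into a dict; `expand_sweep` is a hypothetical helper:

```python
from itertools import product

def expand_sweep(sweep_cfg: dict) -> list[dict]:
    """Expand swept parameters into one concrete run config per combination."""
    swept = sweep_cfg.get("sweep", {})
    fixed = sweep_cfg.get("fixed", {})
    keys = list(swept)
    return [
        # Each run config carries the model, fixed params, and one combination
        {"model": sweep_cfg["model"], **fixed, **dict(zip(keys, combo))}
        for combo in product(*(swept[k] for k in keys))
    ]

cfg = {
    "model": "Qwen3-0.6B",
    "sweep": {"concurrency": [1, 4, 8, 16, 32], "output_seq_len": [128, 256, 512]},
    "fixed": {"input_seq_len": 512},
}
runs = expand_sweep(cfg)  # 15 configurations
```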
## Visualization Technology: Plotly Dash

The plot command is planned to use **Plotly Dash**, a Python framework for building interactive web applications.

**Key Benefits:**
- **Pure Python** - No JavaScript required, consistent with AIPerf's codebase
- **Interactive by design** - Users adjust settings (axes, filters, run selection) in the browser without re-running commands
- **Self-contained** - Runs locally with no external services required
- **Customizable** - Full control over AIPerf-specific visualizations and styling

**Proposed Flow:**
```
aiperf plot ./my_sweep --host
        ↓
Load sweep data → Create Dash app → Launch server at localhost:8080
        ↓
Browser opens with:
  - Sidebar: controls for run selection, axes, plot types
  - Main area: interactive Plotly charts (zoom, pan, hover)
  - Tabs: switch between visualizations (pareto, throughput, latency)
```

**Example Interactivity:**
```python
from dash import Dash, Input, Output

app = Dash(__name__)

# When the user changes the dropdown, the plot updates instantly via a callback
@app.callback(Output('plot', 'figure'), Input('x-axis', 'value'))
def update_plot(x_axis):
    return generate_plot(x_axis=x_axis)
```

This approach should provide the customization needed for AIPerf's use cases. Alternative frameworks (Streamlit, React+Flask) could be considered during implementation if requirements change.

# Alternate Solutions

## Alt 1: Single Command (Sweep + Auto-Visualize)

Automatically generate visualizations after the sweep completes.

**Pros:**
- One command for everything
- Immediate results

**Cons:**
- Sweeps can take hours; users may want to visualize later
- Cannot re-visualize with different settings without re-running the sweep
- Not flexible for CI/CD

**Reason Rejected:**
Separate commands give users finer control: they can regenerate visualizations without re-running expensive profiling, and they can run profiling alone when they only need the raw exports.

## Alt 2: External Tools (TensorBoard/WandB)

Use existing visualization platforms.

**Pros:**
- No maintenance burden for UI/UX integrations
- Feature-rich

**Cons:**
- WandB requires an account and network access
- TensorBoard is specific to ML training applications
- Less control over AIPerf-specific visualizations
