From 3f54e7815198c6cc77551da509d6a27263d0fa23 Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Sat, 18 Oct 2025 15:35:01 -0700 Subject: [PATCH 01/10] init --- docs/.config/mkdocs-gh-pages.yml | 2 +- docs/.config/mkdocs.yml | 2 +- docs/api/index.md | 549 ++++++++++++++++-- .../{command-guidelines.md => entrypoints.md} | 6 +- docs/getting-started/index.md | 2 +- docs/getting-started/installation.md | 2 +- docs/index.md | 2 +- docs/user-guide/index.md | 2 +- docs/user-guide/run-benchmark.md | 36 +- docs/user-guide/scenario-definition.md | 1 + 10 files changed, 534 insertions(+), 70 deletions(-) rename docs/getting-started/{command-guidelines.md => entrypoints.md} (70%) diff --git a/docs/.config/mkdocs-gh-pages.yml b/docs/.config/mkdocs-gh-pages.yml index b5fcb9b8..8ab4f093 100644 --- a/docs/.config/mkdocs-gh-pages.yml +++ b/docs/.config/mkdocs-gh-pages.yml @@ -116,7 +116,7 @@ nav: - getting-started/index.md - Installation: getting-started/installation.md - Task Definition: getting-started/task-definition.md - - Command Guidelines: getting-started/command-guidelines.md + - Entrypoints: getting-started/entrypoints.md - Metrics Definition: getting-started/metrics-definition.md - User Guide: - user-guide/index.md diff --git a/docs/.config/mkdocs.yml b/docs/.config/mkdocs.yml index 4e1c4e52..82b60d84 100644 --- a/docs/.config/mkdocs.yml +++ b/docs/.config/mkdocs.yml @@ -116,7 +116,7 @@ nav: - getting-started/index.md - Installation: getting-started/installation.md - Task Definition: getting-started/task-definition.md - - Command Guidelines: getting-started/command-guidelines.md + - Entrypoints: getting-started/entrypoints.md - Metrics Definition: getting-started/metrics-definition.md - User Guide: - user-guide/index.md diff --git a/docs/api/index.md b/docs/api/index.md index 51bcaecd..e9bae104 100644 --- a/docs/api/index.md +++ b/docs/api/index.md @@ -1,84 +1,488 @@ # API Reference -This section provides detailed API documentation for GenAI Bench components. +This section provides comprehensive API documentation for GenAI Bench components, including CLI commands, core classes, and usage examples. -!!! info "Coming Soon" - Comprehensive API documentation is being developed. In the meantime, please refer to the source code docstrings. +## CLI Commands -## Core Components +### `genai-bench benchmark` -### Authentication +The main command for running benchmarks against LLM endpoints. -- **UnifiedAuthFactory** - Factory for creating authentication providers -- **ModelAuthProvider** - Base class for model authentication -- **StorageAuthProvider** - Base class for storage authentication +```bash +genai-bench benchmark [OPTIONS] +``` + +**Key Options:** + +- `--api-backend` - API backend (openai, aws-bedrock, azure-openai, gcp-vertex, oci-cohere, oci-genai) +- `--api-key` - API key for authentication +- `--model` - Model name to benchmark +- `--task` - Task type (text-to-text, text-to-embeddings, image-text-to-text, etc.) 
+- `--traffic-scenario` - Traffic scenario specification +- `--num-concurrency` - Number of concurrent requests +- `--max-time-per-run` - Maximum time per run in seconds +- `--upload-results` - Upload results to cloud storage -### Storage +**Example:** +```bash +genai-bench benchmark \ + --api-backend openai \ + --api-key $OPENAI_KEY \ + --model gpt-4 \ + --task text-to-text \ + --traffic-scenario "N(100,50)" \ + --num-concurrency 1,2,4,8 \ + --max-time-per-run 300 +``` -- **BaseStorage** - Abstract base class for storage implementations -- **StorageFactory** - Factory for creating storage providers +### `genai-bench excel` -### CLI +Generate Excel reports from benchmark results. -- **option_groups** - Modular CLI option definitions -- **validation** - Input validation functions +```bash +genai-bench excel [OPTIONS] +``` -### Metrics +**Options:** -- **AggregatedMetricsCollector** - Collects and aggregates benchmark metrics -- **RequestMetricsCollector** - Collects per-request metrics +- `--experiment-folder` - Path to experiment folder +- `--excel-name` - Name of the Excel file +- `--metric-percentile` - Percentile for metrics (mean, p25, p50, p75, p90, p95, p99) +- `--metrics-time-unit` - Time unit (s, ms) -### User Classes +### `genai-bench plot` -- **BaseUser** - Abstract base class for user implementations -- **OpenAIUser** - OpenAI API implementation -- **AWSBedrockUser** - AWS Bedrock implementation -- **AzureOpenAIUser** - Azure OpenAI implementation -- **GCPVertexUser** - GCP Vertex AI implementation -- **OCICohereUser** - OCI Cohere implementation +Generate plots from benchmark results with flexible configuration. -## Example Usage +```bash +genai-bench plot [OPTIONS] +``` -### Creating an Authentication Provider +**Options:** + +- `--experiments-folder` - Path to experiments folder +- `--group-key` - Key to group data by +- `--plot-config` - Path to JSON plot configuration +- `--preset` - Built-in plot preset +- `--filter-criteria` - Filter criteria for data + +## Core Protocol Classes + +### Request Classes + +#### `UserRequest` +Base class for all user requests. ```python -from genai_bench.auth.unified_factory import UnifiedAuthFactory +class UserRequest(BaseModel): + model: str + additional_request_params: Dict[str, Any] = Field(default_factory=dict) +``` -# Create OpenAI auth -auth = UnifiedAuthFactory.create_model_auth( - "openai", - api_key="sk-..." -) +#### `UserChatRequest` +For text-to-text tasks. -# Create AWS Bedrock auth -auth = UnifiedAuthFactory.create_model_auth( - "aws-bedrock", - access_key_id="AKIA...", - secret_access_key="...", - region="us-east-1" -) +```python +class UserChatRequest(UserRequest): + prompt: str + num_prefill_tokens: int | None + max_tokens: int | None +``` + +#### `UserEmbeddingRequest` +For text-to-embeddings tasks. + +```python +class UserEmbeddingRequest(UserRequest): + documents: List[str] + num_prefill_tokens: Optional[int] +``` + +#### `UserImageChatRequest` +For image-text-to-text tasks. + +```python +class UserImageChatRequest(UserChatRequest): + image_content: List[str] + num_images: int +``` + +### Response Classes + +#### `UserResponse` +Base class for all user responses. + +```python +class UserResponse(BaseModel): + status_code: int + time_at_first_token: Optional[float] + start_time: Optional[float] + end_time: Optional[float] + error_message: Optional[str] + num_prefill_tokens: Optional[int] +``` + +#### `UserChatResponse` +For chat task responses. 
+ +```python +class UserChatResponse(UserResponse): + generated_text: Optional[str] + tokens_received: Optional[int] +``` + +### Experiment Metadata + +#### `ExperimentMetadata` +Contains all metadata for an experiment. + +```python +class ExperimentMetadata(BaseModel): + cmd: str + benchmark_version: str + api_backend: str + model: str + task: str + num_concurrency: List[int] + traffic_scenario: List[str] + max_time_per_run_s: int + max_requests_per_run: int + # ... and more fields +``` + +## Scenario Classes + +### `Scenario` +Abstract base class for traffic scenarios. + +```python +class Scenario(ABC): + scenario_type: TextDistribution | MultiModality | EmbeddingDistribution | ReRankDistribution | SpecialScenario + validation_pattern: str + + @abstractmethod + def sample(self) -> Any: ... + + @abstractmethod + def to_string(self) -> str: ... + + @classmethod + @abstractmethod + def parse(cls, params_str: str) -> "Scenario": ... +``` + +### Distribution Types + +#### `TextDistribution` +```python +class TextDistribution(Enum): + NORMAL = "N" + DETERMINISTIC = "D" + UNIFORM = "U" +``` + +#### `NormalDistribution` +Normal distribution scenario for text tasks. + +```python +class NormalDistribution(Scenario): + scenario_type = TextDistribution.NORMAL + validation_pattern = r"^N\(\d+,\d+\)$" + + def __init__(self, mean: int, std: int): ... +``` + +#### `EmbeddingScenario` +Scenario for embedding tasks. + +```python +class EmbeddingScenario(Scenario): + scenario_type = EmbeddingDistribution.EMBEDDING + validation_pattern = r"^E\(\d+\)$" + + def __init__(self, tokens_per_document: int): ... +``` + +## Sampler Classes + +### `Sampler` +Abstract base class for data samplers. + +```python +class Sampler(ABC): + modality_registry: Dict[str, Type["Sampler"]] = {} + input_modality: str + supported_tasks: Set[str] + + def __init__(self, tokenizer, model: str, output_modality: str, ...): ... + + @abstractmethod + def sample(self, scenario: Scenario) -> UserRequest: ... + + @classmethod + def create(cls, task: str, *args, **kwargs) -> "Sampler": ... +``` + +### `TextSampler` +For text-based tasks. + +```python +class TextSampler(Sampler): + input_modality = "text" + supported_tasks = {"text-to-text", "text-to-embeddings", "text-to-rerank"} + + def __init__(self, tokenizer, model: str, output_modality: str, data: List[str], ...): ... +``` + +### `ImageSampler` +For image-based tasks. + +```python +class ImageSampler(Sampler): + input_modality = "image" + supported_tasks = {"image-text-to-text", "image-to-embeddings"} + + def __init__(self, tokenizer, model: str, output_modality: str, data: Any, ...): ... +``` + +## Data Loading Classes + +### `DatasetLoader` +Abstract base class for dataset loaders. + +```python +class DatasetLoader(ABC): + supported_formats: Set[DatasetFormat] = set() + media_type: str = "" + + def __init__(self, dataset_config: DatasetConfig): ... + + def load_request(self) -> Union[List[str], List[Tuple[str, Any]]]: ... +``` + +### `TextDatasetLoader` +For loading text datasets. + +```python +class TextDatasetLoader(DatasetLoader): + supported_formats = {DatasetFormat.TEXT, DatasetFormat.CSV, DatasetFormat.JSON, DatasetFormat.HUGGINGFACE_HUB} + media_type = "text" +``` + +### `ImageDatasetLoader` +For loading image datasets. 
+ +```python +class ImageDatasetLoader(DatasetLoader): + supported_formats = {DatasetFormat.CSV, DatasetFormat.JSON, DatasetFormat.HUGGINGFACE_HUB} + media_type = "image" +``` + +## Authentication Classes + +### `UnifiedAuthFactory` +Factory for creating authentication providers. + +```python +class UnifiedAuthFactory: + @staticmethod + def create_model_auth(provider: str, **kwargs) -> ModelAuthProvider: ... + + @staticmethod + def create_storage_auth(provider: str, **kwargs) -> StorageAuthProvider: ... +``` + +### `ModelAuthProvider` +Base class for model authentication. + +```python +class ModelAuthProvider(ABC): + @abstractmethod + def get_auth_headers(self) -> Dict[str, str]: ... + + @abstractmethod + def get_auth_params(self) -> Dict[str, Any]: ... +``` + +## Storage Classes + +### `BaseStorage` +Abstract base class for storage implementations. + +```python +class BaseStorage(ABC): + @abstractmethod + def upload_file(self, file_path: str, bucket: str, key: str) -> None: ... + + @abstractmethod + def upload_folder(self, folder_path: str, bucket: str, prefix: str = "") -> None: ... +``` + +### `StorageFactory` +Factory for creating storage providers. + +```python +class StorageFactory: + @staticmethod + def create_storage(provider: str, auth: StorageAuthProvider) -> BaseStorage: ... +``` + +## User Classes + +### `BaseUser` +Abstract base class for user implementations. + +```python +class BaseUser(HttpUser): + supported_tasks: Dict[str, str] = {} + + @classmethod + def is_task_supported(cls, task: str) -> bool: ... + + def sample(self) -> UserRequest: ... + + def collect_metrics(self, user_response: UserResponse, endpoint: str): ... +``` + +### Provider-Specific User Classes + +- `OpenAIUser` - OpenAI API implementation +- `AWSBedrockUser` - AWS Bedrock implementation +- `AzureOpenAIUser` - Azure OpenAI implementation +- `GCPVertexUser` - GCP Vertex AI implementation +- `OCICohereUser` - OCI Cohere implementation +- `OCIGenAIUser` - OCI GenAI implementation + +## Metrics Classes + +### `RequestLevelMetrics` +Metrics for individual requests. + +```python +class RequestLevelMetrics(BaseModel): + ttft: Optional[float] = None # Time to first token + tpot: Optional[float] = None # Time per output token + e2e_latency: Optional[float] = None # End-to-end latency + output_latency: Optional[float] = None # Output latency + output_inference_speed: Optional[float] = None # Output inference speed + num_input_tokens: Optional[int] = None + num_output_tokens: Optional[int] = None + total_tokens: Optional[int] = None + input_throughput: Optional[float] = None + output_throughput: Optional[float] = None + error_code: Optional[int] = None + error_message: Optional[str] = None ``` -### Creating a Storage Provider +### `AggregatedMetrics` +Aggregated metrics across multiple requests. + +```python +class AggregatedMetrics(BaseModel): + # Contains aggregated statistics for all metrics + # including percentiles, means, etc. +``` + +## Analysis and Reporting Classes + +### `FlexiblePlotGenerator` +Generates plots using flexible configuration. + +```python +class FlexiblePlotGenerator: + def __init__(self, config: PlotConfig): ... + + def generate_plots( + self, + run_data_list: List[Tuple[ExperimentMetadata, ExperimentMetrics]], + group_key: str, + experiment_folder: str, + metrics_time_unit: str = "s" + ) -> None: ... +``` + +### `PlotConfig` +Configuration for plot generation. 
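    # sample() draws the next UserRequest for the configured task, and
    # collect_metrics() records per-request timings from a UserResponse;
    # the provider-specific subclasses listed below supply the actual
    # request-sending logic.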
+ +```python +class PlotConfig(BaseModel): + title: str + plots: List[PlotSpec] + figure_size: Tuple[int, int] = (12, 8) + dpi: int = 100 + # ... more configuration options +``` + +### `ExperimentLoader` +Loads experiment data from files. + +```python +def load_multiple_experiments( + folder_name: str, + filter_criteria=None +) -> List[Tuple[ExperimentMetadata, ExperimentMetrics]]: ... + +def load_one_experiment( + folder_name: str, + filter_criteria: Optional[Dict[str, Any]] = None +) -> Tuple[Optional[ExperimentMetadata], ExperimentMetrics]: ... +``` + +## Configuration Classes + +### `DatasetConfig` +Configuration for dataset loading. + +```python +class DatasetConfig(BaseModel): + source: DatasetSourceConfig + prompt_column: Optional[str] = None + image_column: Optional[str] = None + unsafe_allow_large_images: bool = False +``` + +### `DatasetSourceConfig` +Configuration for dataset sources. + +```python +class DatasetSourceConfig(BaseModel): + type: Literal["file", "huggingface", "custom"] + path: Optional[str] = None + file_format: Optional[str] = None + huggingface_dataset: Optional[str] = None + huggingface_config: Optional[str] = None + huggingface_split: Optional[str] = None + loader_class: Optional[str] = None + loader_kwargs: Optional[Dict[str, Any]] = None +``` + +## Usage Examples + +### Basic Benchmarking ```python from genai_bench.auth.unified_factory import UnifiedAuthFactory from genai_bench.storage.factory import StorageFactory +from genai_bench.scenarios.base import Scenario -# Create storage auth +# Create authentication +auth = UnifiedAuthFactory.create_model_auth( + "openai", + api_key="sk-..." +) + +# Create storage storage_auth = UnifiedAuthFactory.create_storage_auth( "aws", profile="default", region="us-east-1" ) +storage = StorageFactory.create_storage("aws", storage_auth) -# Create storage instance -storage = StorageFactory.create_storage( - "aws", - storage_auth -) +# Create scenario +scenario = Scenario.from_string("N(100,50)") -# Upload a folder +# Upload results storage.upload_folder( "/path/to/results", "my-bucket", @@ -86,6 +490,57 @@ storage.upload_folder( ) ``` +### Custom Plot Generation + +```python +from genai_bench.analysis.flexible_plot_report import FlexiblePlotGenerator +from genai_bench.analysis.plot_config import PlotConfig, PlotSpec + +# Create plot configuration +config = PlotConfig( + title="Performance Analysis", + plots=[ + PlotSpec( + x_field="concurrency", + y_fields=["e2e_latency"], + plot_type="line", + title="Latency vs Concurrency" + ) + ] +) + +# Generate plots +generator = FlexiblePlotGenerator(config) +generator.generate_plots( + run_data_list, + group_key="traffic_scenario", + experiment_folder="/path/to/results" +) +``` + +### Custom Dataset Loading + +```python +from genai_bench.data.config import DatasetConfig, DatasetSourceConfig +from genai_bench.data.loaders.factory import DataLoaderFactory + +# Configure dataset +dataset_config = DatasetConfig( + source=DatasetSourceConfig( + type="file", + path="/path/to/dataset.csv", + file_format="csv" + ), + prompt_column="text" +) + +# Load data +data = DataLoaderFactory.load_data_for_task( + "text-to-text", + dataset_config +) +``` + ## Contributing to API Documentation We welcome contributions to improve our API documentation! 
If you'd like to help: diff --git a/docs/getting-started/command-guidelines.md b/docs/getting-started/entrypoints.md similarity index 70% rename from docs/getting-started/command-guidelines.md rename to docs/getting-started/entrypoints.md index fd0228dd..71b8ae44 100644 --- a/docs/getting-started/command-guidelines.md +++ b/docs/getting-started/entrypoints.md @@ -1,4 +1,4 @@ -# Command Guidelines +# Entrypoints Once you install it in your local environment, you can use `--help` to read about what command options it supports. @@ -7,7 +7,7 @@ about what command options it supports. genai-bench --help ``` -`genai-bench` supports three commands: +`genai-bench` has three CLI entrypoints: ```shell Commands: @@ -16,4 +16,6 @@ Commands: plot Plots the experiment(s) results based on filters and group... ``` +For further information on how to use GenAI Bench, you can refer to the [User Guide](../user-guide/index.md) and the [API Reference](../api/index.md). + You can also refer to [option_groups.py](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/cli/option_groups.py). \ No newline at end of file diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md index a6d91788..32a46ee6 100644 --- a/docs/getting-started/index.md +++ b/docs/getting-started/index.md @@ -32,7 +32,7 @@ GenAI Bench is a powerful benchmark tool designed for comprehensive token-level Master the command-line interface - [:octicons-arrow-right-24: Command Guidelines](command-guidelines.md) + [:octicons-arrow-right-24: Entrypoints](entrypoints.md) - :material-chart-line:{ .lg .middle } **Metrics** diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md index 9e9a3182..99665f44 100644 --- a/docs/getting-started/installation.md +++ b/docs/getting-started/installation.md @@ -155,4 +155,4 @@ After successful installation: 1. Read the [Task Definition Guide](task-definition.md) to understand different benchmark tasks 2. Explore the [User Guide](../user-guide/run-benchmark.md) for detailed usage -3. Check out [Command Guidelines](command-guidelines.md) for practical scenarios \ No newline at end of file +3. Check out [Entrypoints](entrypoints.md) for practical scenarios \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index c0d6dfb9..3297e4f1 100644 --- a/docs/index.md +++ b/docs/index.md @@ -62,7 +62,7 @@ GenAI Bench supports multiple benchmark types: - [Installation](getting-started/installation.md) - Detailed installation guide - [Task Definition](getting-started/task-definition.md) - Understanding different benchmark tasks -- [Command Guidelines](getting-started/command-guidelines.md) - Command usage guidelines +- [Entrypoints](getting-started/entrypoints.md) - Command usage guidelines - [Metrics Definition](getting-started/metrics-definition.md) - Understanding benchmark metrics ### πŸ“– User Guide diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md index f099a32f..ed343cc3 100644 --- a/docs/user-guide/index.md +++ b/docs/user-guide/index.md @@ -75,5 +75,5 @@ Support for text, embeddings, and vision tasks: ## Need Help? 

- Check the [Quick Reference](multi-cloud-quick-reference.md) for common commands
-- Review [Command Guidelines](../getting-started/command-guidelines.md) for detailed options
+- Review [Entrypoints](../getting-started/entrypoints.md) for detailed options
- See [Troubleshooting](multi-cloud-auth-storage.md#troubleshooting) for common issues
\ No newline at end of file
diff --git a/docs/user-guide/run-benchmark.md b/docs/user-guide/run-benchmark.md
index 63f09c24..926f1881 100644
--- a/docs/user-guide/run-benchmark.md
+++ b/docs/user-guide/run-benchmark.md
@@ -131,16 +131,22 @@ genai-bench benchmark --api-backend oci-cohere \
   --num-workers 4
 ```
 
-## Monitor a benchmark
+## Specify a custom benchmark load
 
 **IMPORTANT**: logs in genai-bench are all useful. Please keep an eye on WARNING
logs when you finish one benchmark.
 
-### Specify --traffic-scenario and --num-concurrency
+You can specify a custom benchmark load by setting the traffic scenarios and concurrencies to run.
+
+Traffic scenarios let you define the shape of requests when benchmarking. See [Traffic Scenarios](./scenario-definition.md) for more information.
+
+The concurrency is the number of concurrent users making requests. Running several concurrencies lets you benchmark performance under different loads; each specified scenario is run at each concurrency. Specify the concurrencies to run with `--num-concurrency`.
 
 **IMPORTANT**: Please use `genai-bench benchmark --help` to check out the latest
default value of `--num-concurrency` and `--traffic-scenario`.
 
-Both options are defined as [multi-value options](https://click.palletsprojects.com/en/8.1.x/options/#multi-value-options) in click. Meaning you can pass this command multiple times. If you want to define your own `--num-concurrency` or `--traffic-scenario`, you can use
+Both options are defined as [multi-value options](https://click.palletsprojects.com/en/8.1.x/options/#multi-value-options) in click, meaning you can pass each option multiple times.
+
+For example, the benchmark command below runs a scenario with normally distributed input tokens (mean=480, stdev=240) and output tokens (mean=300, stdev=150), plus a deterministic D(100,100) scenario, at concurrencies 1, 2, 4, 8, 16, and 32.
 
 ```shell
 genai-bench benchmark \
@@ -153,9 +159,9 @@ genai-bench benchmark \
   --traffic-scenario "N(480,240)/(300,150)" --traffic-scenario "D(100,100)"
 ```
 
-### Notes on specific options
+### Notes on benchmark duration
 
-To manage each run or iteration in an experiment, genai-bench uses two parameters to control the exit logic. You can find more details in the `manage_run_time` function located in [utils.py](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/cli/utils.py). Combination of `--max-time-per-run` and `--max-requests-per-run` should save overall time of one benchmark.
+To manage each run or iteration in an experiment, genai-bench uses two parameters to control the exit logic. A run terminates once it exceeds either the maximum time limit or the maximum number of requests, specified with `--max-time-per-run` and `--max-requests-per-run` respectively. You can find more details in the `manage_run_time` function located in [utils.py](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/cli/utils.py). 
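
For example, the following run (values are illustrative, reusing the chat benchmark options shown earlier) stops as soon as either limit is reached:

```shell
genai-bench benchmark \
    --api-backend openai \
    --api-base "http://localhost:8082" \
    --api-key "your-openai-api-key" \
    --api-model-name "meta-llama/Meta-Llama-3-70B-Instruct" \
    --model-tokenizer "/mnt/data/models/Meta-Llama-3.1-70B-Instruct" \
    --task text-to-text \
    --max-time-per-run 15 \
    --max-requests-per-run 300
```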
For light traffic scenarios, such as D(7800,200) or lighter, we recommend the following settings: @@ -197,9 +203,17 @@ To address this, you can increase the number of worker processes using the `--nu This distributes the load across multiple processes on a single machine, improving performance and ensuring your benchmark runs smoothly. +### Notes on Usage + +1. This feature is experimental, so monitor the system's behavior when enabling multiple workers. +2. Recommended Limit: Do **not** set the number of workers to more than 16, as excessive worker processes can lead to resource contention and diminished performance. +3. Ensure your system has sufficient CPU and memory resources to support the desired number of workers. +4. Adjust the number of workers based on your target load and system capacity to achieve optimal results. +5. For high-concurrency tests with large payloads, use `--spawn-rate` to prevent worker overload. + ### Controlling User Spawn Rate -When running high-concurrency benchmarks with large payloads (e.g., 20k+ tokens), workers may become overwhelmed if all users are spawned immediately. This can cause worker heartbeat failures and restarts. +By default, users are spawned at a rate equal to the concurrency, meaning it takes one second for all users to be created. When running high-concurrency benchmarks with large payloads (e.g., 20k+ tokens), workers may become overwhelmed if all users are spawned immediately. This can cause worker heartbeat failures and restarts. To prevent this, use the `--spawn-rate` option to control how quickly users are spawned: @@ -215,14 +229,6 @@ To prevent this, use the `--spawn-rate` option to control how quickly users are - `--spawn-rate 100`: Spawn 100 users per second (takes 5 seconds to reach 500 users) - `--spawn-rate 500`: Spawn all users immediately (default behavior) -### Notes on Usage - -1. This feature is experimental, so monitor the system's behavior when enabling multiple workers. -2. Recommended Limit: Do **not** set the number of workers to more than 16, as excessive worker processes can lead to resource contention and diminished performance. -3. Ensure your system has sufficient CPU and memory resources to support the desired number of workers. -4. Adjust the number of workers based on your target load and system capacity to achieve optimal results. -5. For high-concurrency tests with large payloads, use `--spawn-rate` to prevent worker overload. - ## Using Dataset Configurations Genai-bench supports flexible dataset configurations through two approaches: @@ -345,4 +351,4 @@ If you want to benchmark a specific portion of a vision dataset, you can use the ## Picking units -Genai-bench defaults to measuring latency (End-to-end latency, TTFT, TPOT, Input/Output latencies) in seconds. If you prefer milliseconds, you can select them with `--metrics-time-unit [s|ms]`. \ No newline at end of file +Genai-bench defaults to measuring latency metrics (End-to-end latency, TTFT, TPOT, Input/Output latencies) in seconds. If you prefer milliseconds, you can select them with `--metrics-time-unit [s|ms]`. \ No newline at end of file diff --git a/docs/user-guide/scenario-definition.md b/docs/user-guide/scenario-definition.md index 1b525099..a0d70f6b 100644 --- a/docs/user-guide/scenario-definition.md +++ b/docs/user-guide/scenario-definition.md @@ -12,6 +12,7 @@ Scenarios are optional. If you don’t provide any and you supply a dataset, gen - The CLI accepts one or more scenarios via `--traffic-scenario`. 
Each run iterates over the supplied scenarios and the selected iteration parameter (concurrency or batch size). - Internally, each scenario string is parsed into a Scenario class and passed to samplers to control request construction. +- Scenarios are defined as [multi-value options](https://click.palletsprojects.com/en/8.1.x/options/#multi-value-options) in click. Meaning you can pass this command multiple times to benchmark different loads. ### Scenario types and formats From 66b4a11bcf2ea714daf252b1f28c1d590c8d0163 Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Sat, 18 Oct 2025 15:49:54 -0700 Subject: [PATCH 02/10] continue_tidy_up --- docs/.config/mkdocs-gh-pages.yml | 2 +- docs/.config/mkdocs.yml | 2 +- docs/user-guide/index.md | 2 -- docs/user-guide/multi-cloud-auth-storage.md | 2 +- docs/user-guide/run-benchmark.md | 6 +++--- 5 files changed, 6 insertions(+), 8 deletions(-) diff --git a/docs/.config/mkdocs-gh-pages.yml b/docs/.config/mkdocs-gh-pages.yml index 8ab4f093..1547b378 100644 --- a/docs/.config/mkdocs-gh-pages.yml +++ b/docs/.config/mkdocs-gh-pages.yml @@ -123,7 +123,7 @@ nav: - Run Benchmark: user-guide/run-benchmark.md - Traffic Scenarios: user-guide/scenario-definition.md - Multi-Cloud Authentication: user-guide/multi-cloud-auth-storage.md - - Quick Reference: user-guide/multi-cloud-quick-reference.md + - Multi-Cloud Quick Reference: user-guide/multi-cloud-quick-reference.md - Docker Deployment: user-guide/run-benchmark-using-docker.md - Excel Reports: user-guide/generate-excel-sheet.md - Visualizations: user-guide/generate-plot.md diff --git a/docs/.config/mkdocs.yml b/docs/.config/mkdocs.yml index 82b60d84..01ea60a9 100644 --- a/docs/.config/mkdocs.yml +++ b/docs/.config/mkdocs.yml @@ -123,7 +123,7 @@ nav: - Run Benchmark: user-guide/run-benchmark.md - Traffic Scenarios: user-guide/scenario-definition.md - Multi-Cloud Authentication: user-guide/multi-cloud-auth-storage.md - - Quick Reference: user-guide/multi-cloud-quick-reference.md + - Multi-Cloud Quick Reference: user-guide/multi-cloud-quick-reference.md - Docker Deployment: user-guide/run-benchmark-using-docker.md - Excel Reports: user-guide/generate-excel-sheet.md - Visualizations: user-guide/generate-plot.md diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md index ed343cc3..f6227255 100644 --- a/docs/user-guide/index.md +++ b/docs/user-guide/index.md @@ -74,6 +74,4 @@ Support for text, embeddings, and vision tasks: ## Need Help? -- Check the [Quick Reference](multi-cloud-quick-reference.md) for common commands -- Review [Entrypoints](../getting-started/entrypoints.md) for detailed options - See [Troubleshooting](multi-cloud-auth-storage.md#troubleshooting) for common issues \ No newline at end of file diff --git a/docs/user-guide/multi-cloud-auth-storage.md b/docs/user-guide/multi-cloud-auth-storage.md index f4b72922..5b4f4b8f 100644 --- a/docs/user-guide/multi-cloud-auth-storage.md +++ b/docs/user-guide/multi-cloud-auth-storage.md @@ -1,6 +1,6 @@ # Multi-Cloud Authentication and Storage Guide -genai-bench now supports comprehensive multi-cloud authentication for both model endpoints and storage services. This guide covers how to configure and use authentication for various cloud providers. +Genai-bench now supports comprehensive multi-cloud authentication for both model endpoints and storage services. This guide covers how to configure and use authentication for various cloud providers. 
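
At a glance, model and storage credentials follow the same two-step pattern everywhere: create an auth provider, then hand it to the component that needs it. The sketch below uses the `UnifiedAuthFactory` and `StorageFactory` APIs documented in the [API Reference](../api/index.md); the credential values are placeholders.

```python
from genai_bench.auth.unified_factory import UnifiedAuthFactory
from genai_bench.storage.factory import StorageFactory

# Authenticate against the model endpoint (here OpenAI; key is a placeholder)
model_auth = UnifiedAuthFactory.create_model_auth("openai", api_key="sk-...")

# Authenticate against the storage service used to upload results
storage_auth = UnifiedAuthFactory.create_storage_auth(
    "aws",
    access_key_id="AKIA...",
    secret_access_key="...",
    region="us-east-1",
)
storage = StorageFactory.create_storage("aws", storage_auth)
```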
## Table of Contents diff --git a/docs/user-guide/run-benchmark.md b/docs/user-guide/run-benchmark.md index 926f1881..030239a7 100644 --- a/docs/user-guide/run-benchmark.md +++ b/docs/user-guide/run-benchmark.md @@ -1,6 +1,6 @@ # Run Benchmark -> **Note**: GenAI Bench now supports multiple cloud providers for both model endpoints and storage. For detailed multi-cloud configuration, see the [Multi-Cloud Authentication & Storage Guide](multi-cloud-auth-storage.md) or the [Quick Reference](multi-cloud-quick-reference.md). +> **Note**: GenAI Bench now supports multiple cloud providers for both model endpoints and storage. For detailed multi-cloud configuration, see the [Multi-Cloud Authentication & Storage Guide](multi-cloud-auth-storage.md) or the [Multi-Cloud Quick Reference](multi-cloud-quick-reference.md). ## Start a chat benchmark @@ -229,9 +229,9 @@ To prevent this, use the `--spawn-rate` option to control how quickly users are - `--spawn-rate 100`: Spawn 100 users per second (takes 5 seconds to reach 500 users) - `--spawn-rate 500`: Spawn all users immediately (default behavior) -## Using Dataset Configurations +## Selecting datasets -Genai-bench supports flexible dataset configurations through two approaches: +By default, genai-bench samples tokens to benchmark from [sonnet.txt](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/data/sonnet.txt) for `text-to-text` or `text-to-embeddings` tasks. Image tasks do not have a default dataset. To select a dataset to benchmark from, genai-bench supports flexible dataset configurations through two approaches: ### Simple CLI Usage (for basic datasets) From 295cd6669058005163d1c4db41e13e6ec222c881 Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Thu, 23 Oct 2025 09:45:04 -0700 Subject: [PATCH 03/10] rename back to command guidelines --- docs/.config/mkdocs.yml | 2 +- .../getting-started/{entrypoints.md => command-guidelines.md} | 4 ++-- docs/getting-started/index.md | 2 +- docs/getting-started/installation.md | 2 +- docs/index.md | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) rename docs/getting-started/{entrypoints.md => command-guidelines.md} (91%) diff --git a/docs/.config/mkdocs.yml b/docs/.config/mkdocs.yml index 01ea60a9..bd585e8c 100644 --- a/docs/.config/mkdocs.yml +++ b/docs/.config/mkdocs.yml @@ -116,7 +116,7 @@ nav: - getting-started/index.md - Installation: getting-started/installation.md - Task Definition: getting-started/task-definition.md - - Entrypoints: getting-started/entrypoints.md + - Command Guidelines: getting-started/command-guidelines.md - Metrics Definition: getting-started/metrics-definition.md - User Guide: - user-guide/index.md diff --git a/docs/getting-started/entrypoints.md b/docs/getting-started/command-guidelines.md similarity index 91% rename from docs/getting-started/entrypoints.md rename to docs/getting-started/command-guidelines.md index 71b8ae44..bbfd5814 100644 --- a/docs/getting-started/entrypoints.md +++ b/docs/getting-started/command-guidelines.md @@ -1,4 +1,4 @@ -# Entrypoints +# Command Guidelines Once you install it in your local environment, you can use `--help` to read about what command options it supports. @@ -7,7 +7,7 @@ about what command options it supports. 
genai-bench --help ``` -`genai-bench` has three CLI entrypoints: +`genai-bench` has three CLI commands: ```shell Commands: diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md index 32a46ee6..a6d91788 100644 --- a/docs/getting-started/index.md +++ b/docs/getting-started/index.md @@ -32,7 +32,7 @@ GenAI Bench is a powerful benchmark tool designed for comprehensive token-level Master the command-line interface - [:octicons-arrow-right-24: Entrypoints](entrypoints.md) + [:octicons-arrow-right-24: Command Guidelines](command-guidelines.md) - :material-chart-line:{ .lg .middle } **Metrics** diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md index 99665f44..9e9a3182 100644 --- a/docs/getting-started/installation.md +++ b/docs/getting-started/installation.md @@ -155,4 +155,4 @@ After successful installation: 1. Read the [Task Definition Guide](task-definition.md) to understand different benchmark tasks 2. Explore the [User Guide](../user-guide/run-benchmark.md) for detailed usage -3. Check out [Entrypoints](entrypoints.md) for practical scenarios \ No newline at end of file +3. Check out [Command Guidelines](command-guidelines.md) for practical scenarios \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index 3297e4f1..c0d6dfb9 100644 --- a/docs/index.md +++ b/docs/index.md @@ -62,7 +62,7 @@ GenAI Bench supports multiple benchmark types: - [Installation](getting-started/installation.md) - Detailed installation guide - [Task Definition](getting-started/task-definition.md) - Understanding different benchmark tasks -- [Entrypoints](getting-started/entrypoints.md) - Command usage guidelines +- [Command Guidelines](getting-started/command-guidelines.md) - Command usage guidelines - [Metrics Definition](getting-started/metrics-definition.md) - Understanding benchmark metrics ### πŸ“– User Guide From b77aad92159145475f472ed0d64ed758998200bc Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Thu, 23 Oct 2025 10:17:03 -0700 Subject: [PATCH 04/10] Command guidelines revamp --- docs/getting-started/command-guidelines.md | 145 +++++++++++++++++++-- 1 file changed, 136 insertions(+), 9 deletions(-) diff --git a/docs/getting-started/command-guidelines.md b/docs/getting-started/command-guidelines.md index bbfd5814..692e5b1c 100644 --- a/docs/getting-started/command-guidelines.md +++ b/docs/getting-started/command-guidelines.md @@ -1,13 +1,8 @@ # Command Guidelines -Once you install it in your local environment, you can use `--help` to read -about what command options it supports. +GenAI Bench provides three main CLI commands for running benchmarks, generating reports, and creating visualizations. This guide covers the essential options for each command. -```shell -genai-bench --help -``` - -`genai-bench` has three CLI commands: +## Overview ```shell Commands: @@ -16,6 +11,138 @@ Commands: plot Plots the experiment(s) results based on filters and group... ``` -For further information on how to use GenAI Bench, you can refer to the [User Guide](../user-guide/index.md) and the [API Reference](../api/index.md). +## Benchmark + +The `benchmark` command runs performance tests against AI models. It's the core command for executing benchmarks. 
+ +### Essential Options + +#### **API Configuration** +- `--api-backend` - Choose your model provider (openai, oci-cohere, aws-bedrock, azure-openai, gcp-vertex, vllm, sglang) +- `--api-base` - API endpoint URL +- `--api-model-name` - Model name for the request body +- `--task` - Task type (text-to-text, text-to-embeddings, image-text-to-text, etc.) + +#### **Authentication** +- `--api-key` - API key (for OpenAI) +- `--model-api-key` - Alternative API key parameter +- Cloud-specific auth options (AWS, Azure, GCP, OCI) + +#### **Experiment Parameters** +- `--max-requests-per-run` - Maximum requests to send each run +- `--max-time-per-run` - Maximum duration for each run in minutes +- `--num-concurrency` - Number of concurrent requests to send (multiple values supported in different runs) +- `--batch-size` - Batch sizes for embeddings/rerank tasks +- `--traffic-scenario` - Define input/output token distributions, more info in [Traffic Scenarios](../user-guide/scenario-definition.md) +- `--model-tokenizer` - Path to the model tokenizer + +#### **Dataset Options** +- `--dataset-path` - Path to dataset (local file, HuggingFace ID, or 'default') +- `--dataset-config` - JSON config file for advanced dataset options, more info in [Selecting Datasets](../user-guide/run-benchmark.md/#selecting-datasets) +- `--dataset-prompt-column` - Column name for prompts +- `--dataset-image-column` - Column name for images (multimodal) + +#### **Server Information** +- `--server-engine` - Backend engine (vLLM, SGLang, TGI, etc.) +- `--server-version` - Server version +- `--server-gpu-type` - GPU type (H100, A100-80G, etc.) +- `--server-gpu-count` - Number of GPUs + +### Example Usage +```bash +# Start a chat benchmark +genai-bench benchmark --api-backend openai \ + --api-base "http://localhost:8082" \ + --api-key "your-openai-api-key" \ + --api-model-name "meta-llama/Meta-Llama-3-70B-Instruct" \ + --model-tokenizer "/mnt/data/models/Meta-Llama-3.1-70B-Instruct" \ + --task text-to-text \ + --max-time-per-run 15 \ + --max-requests-per-run 300 \ + --server-engine "SGLang" \ + --server-gpu-type "H100" \ + --server-version "v0.6.0" \ + --server-gpu-count 4 +``` +For more information and examples, check out [Run Benchmark](../user-guide/run-benchmark.md). + +## Excel + +The `excel` command exports experiment results to Excel spreadsheets for detailed analysis. + +### Essential Options + +- `--experiment-folder` - Path to experiment results folder (required) +- `--excel-name` - Name for the output Excel file (required) +- `--metric-percentile` - Statistical percentile (mean, p25, p50, p75, p90, p95, p99) to select from +- `--metrics-time-unit [s|ms]` - Time unit to use when showing latency metrics in the spreadsheet. Defaults to seconds + +### Example Usage + +```bash +# Export with mean metrics in seconds +genai-bench excel \ + --experiment-folder ./experiments/openai_gpt-3.5-turbo_20241201_120000 \ + --excel-name benchmark_results \ + --metric-percentile mean \ + --metrics-time-unit s + +# Export with 95th percentile in milliseconds +genai-bench excel \ + --experiment-folder ./experiments/my_experiment \ + --excel-name detailed_analysis \ + --metric-percentile p95 \ + --metrics-time-unit ms +``` + +## Plot + +The `plot` command generates visualizations from experiment data with flexible configuration options. 
+ +### Essential Options + +- `--experiments-folder` - Path to experiments folder, can be more than one experiment (required) +- `--group-key` - Key to group data by (e.g., 'traffic_scenario', 'server_version', 'none') (required) +- `--filter-criteria` - Dictionary of filter criteria +- `--plot-config` - Path to JSON plot configuration file. For more information use [Advanced Plot Configuration](../user-guide/generate-plot.md/#advanced-plot-configuration) +- `--preset` - Built-in plot presets (2x4_default, simple_2x2, multi_line_latency, single_scenario_analysis). Overrides `--plot-config` if both given +- `--metrics-time-unit [s|ms]` - Time unit for latency display, defaults to seconds + +### Advanced Options + +- `--list-fields` - List available data fields and exit +- `--validate-only` - Validate configuration without generating plots +- `--verbose` - Enable detailed logging + +For more information and examples, check out [Generate Plot](../user-guide/generate-plot.md). + +### Example Usage + +```bash +# Simple plot with default 2x4 layout +genai-bench plot \ + --experiments-folder ./experiments \ + --group-key traffic_scenario \ + --filter-criteria "{'model': 'gpt-3.5-turbo'}" + +# Use built-in preset for latency analysis +genai-bench plot \ + --experiments-folder ./experiments \ + --group-key server_version \ + --preset multi_line_latency \ + --metrics-time-unit ms + +``` + +## Getting Help + +For detailed help on any command: + +```bash +genai-bench --help +genai-bench benchmark --help +genai-bench excel --help +genai-bench plot --help +``` -You can also refer to [option_groups.py](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/cli/option_groups.py). \ No newline at end of file +For further information, refer to the [User Guide](../user-guide/index.md) and the [API Reference](../api/index.md). You can also look at [option_groups.py](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/cli/option_groups.py) directly. \ No newline at end of file From f93e34cc4b006f98f1adb52a3dbd92cab8e1b87e Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Thu, 23 Oct 2025 14:59:34 -0700 Subject: [PATCH 05/10] Delete Examples page --- docs/.config/mkdocs.yml | 2 - docs/api/index.md | 2344 ++++++++++++++++++++++++++++++++++--- docs/development/index.md | 3 +- docs/examples/index.md | 92 -- docs/index.md | 11 +- 5 files changed, 2177 insertions(+), 275 deletions(-) delete mode 100644 docs/examples/index.md diff --git a/docs/.config/mkdocs.yml b/docs/.config/mkdocs.yml index bd585e8c..5a42c9b2 100644 --- a/docs/.config/mkdocs.yml +++ b/docs/.config/mkdocs.yml @@ -128,8 +128,6 @@ nav: - Excel Reports: user-guide/generate-excel-sheet.md - Visualizations: user-guide/generate-plot.md - Upload Results: user-guide/upload-benchmark-result.md - - Examples: - - examples/index.md - Development: - development/index.md - Contributing: development/contributing.md diff --git a/docs/api/index.md b/docs/api/index.md index e9bae104..27e688dd 100644 --- a/docs/api/index.md +++ b/docs/api/index.md @@ -2,6 +2,31 @@ This section provides comprehensive API documentation for GenAI Bench components, including CLI commands, core classes, and usage examples. +> **Quick Start**: New to GenAI Bench? Check out our [Getting Started Guide](../getting-started/index.md) for installation and basic concepts, or jump to the [User Guide](../user-guide/index.md) for practical examples. 
+ +## Getting Started + +Before diving into the API reference, we recommend familiarizing yourself with these foundational concepts: + +- **[Installation Guide](../getting-started/installation.md)** - Set up GenAI Bench in your environment +- **[Task Definition](../getting-started/task-definition.md)** - Understand supported task types and their requirements +- **[Metrics Definition](../getting-started/metrics-definition.md)** - Learn about performance metrics and measurements +- **[Command Guidelines](../getting-started/command-guidelines.md)** - Best practices for CLI usage + +## Table of Contents + +- [CLI Commands](#cli-commands) +- [Core Protocol Classes](#core-protocol-classes) +- [Scenario System](#scenario-system) +- [Data Loading System](#data-loading-system) +- [Authentication System](#authentication-system) +- [Storage System](#storage-system) +- [UI and Dashboard System](#ui-and-dashboard-system) +- [Distributed System](#distributed-system) +- [Metrics and Analysis](#metrics-and-analysis) +- [Configuration Classes](#configuration-classes) +- [Comprehensive Examples](#comprehensive-examples) + ## CLI Commands ### `genai-bench benchmark` @@ -23,6 +48,8 @@ genai-bench benchmark [OPTIONS] - `--max-time-per-run` - Maximum time per run in seconds - `--upload-results` - Upload results to cloud storage +> **πŸ“– Learn More**: For detailed usage examples and multi-cloud configurations, see the [Run Benchmark Guide](../user-guide/run-benchmark.md) and [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md). + **Example:** ```bash genai-bench benchmark \ @@ -35,6 +62,8 @@ genai-bench benchmark \ --max-time-per-run 300 ``` +> **πŸ’‘ Tip**: For traffic scenario syntax and examples, see the [Traffic Scenarios Guide](../user-guide/scenario-definition.md). + ### `genai-bench excel` Generate Excel reports from benchmark results. @@ -50,6 +79,8 @@ genai-bench excel [OPTIONS] - `--metric-percentile` - Percentile for metrics (mean, p25, p50, p75, p90, p95, p99) - `--metrics-time-unit` - Time unit (s, ms) +> **πŸ“Š Learn More**: For detailed Excel report generation examples, see the [Excel Reports Guide](../user-guide/generate-excel-sheet.md). + ### `genai-bench plot` Generate plots from benchmark results with flexible configuration. @@ -66,6 +97,8 @@ genai-bench plot [OPTIONS] - `--preset` - Built-in plot preset - `--filter-criteria` - Filter criteria for data +> **πŸ“ˆ Learn More**: For comprehensive plotting examples and configuration options, see the [Visualizations Guide](../user-guide/generate-plot.md). + ## Core Protocol Classes ### Request Classes @@ -150,7 +183,9 @@ class ExperimentMetadata(BaseModel): # ... and more fields ``` -## Scenario Classes +## Scenario System + +> **πŸ“– Learn More**: For comprehensive scenario syntax and usage examples, see the [Traffic Scenarios Guide](../user-guide/scenario-definition.md). ### `Scenario` Abstract base class for traffic scenarios. @@ -245,7 +280,9 @@ class ImageSampler(Sampler): def __init__(self, tokenizer, model: str, output_modality: str, data: Any, ...): ... ``` -## Data Loading Classes +## Data Loading System + +> **πŸ“– Learn More**: For dataset configuration examples and advanced usage, see the [Run Benchmark Guide](../user-guide/run-benchmark.md#selecting-datasets). ### `DatasetLoader` Abstract base class for dataset loaders. 
@@ -278,276 +315,2231 @@ class ImageDatasetLoader(DatasetLoader): media_type = "image" ``` -## Authentication Classes +## Authentication System -### `UnifiedAuthFactory` -Factory for creating authentication providers. +> **πŸ” Learn More**: For comprehensive authentication setup and multi-cloud configurations, see the [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md). + +### Base Authentication Interfaces + +#### `AuthProvider` +Base class for all authentication providers. ```python -class UnifiedAuthFactory: - @staticmethod - def create_model_auth(provider: str, **kwargs) -> ModelAuthProvider: ... +class AuthProvider(ABC): + @abstractmethod + def get_config(self) -> Dict[str, Any]: ... - @staticmethod - def create_storage_auth(provider: str, **kwargs) -> StorageAuthProvider: ... + @abstractmethod + def get_credentials(self) -> Any: ... ``` -### `ModelAuthProvider` -Base class for model authentication. +#### `ModelAuthProvider` +Base class for model endpoint authentication. ```python class ModelAuthProvider(ABC): @abstractmethod - def get_auth_headers(self) -> Dict[str, str]: ... + def get_headers(self) -> Dict[str, str]: ... + + @abstractmethod + def get_config(self) -> Dict[str, Any]: ... @abstractmethod - def get_auth_params(self) -> Dict[str, Any]: ... + def get_auth_type(self) -> str: ... + + def get_credentials(self) -> Optional[Any]: ... ``` -## Storage Classes - -### `BaseStorage` -Abstract base class for storage implementations. +#### `StorageAuthProvider` +Base class for storage authentication. ```python -class BaseStorage(ABC): +class StorageAuthProvider(ABC): + @abstractmethod + def get_client_config(self) -> Dict[str, Any]: ... + @abstractmethod - def upload_file(self, file_path: str, bucket: str, key: str) -> None: ... + def get_credentials(self) -> Any: ... @abstractmethod - def upload_folder(self, folder_path: str, bucket: str, prefix: str = "") -> None: ... + def get_storage_type(self) -> str: ... + + def get_region(self) -> Optional[str]: ... ``` -### `StorageFactory` -Factory for creating storage providers. +### Authentication Factory + +#### `UnifiedAuthFactory` +Unified factory for creating model and storage authentication providers. ```python -class StorageFactory: +class UnifiedAuthFactory: @staticmethod - def create_storage(provider: str, auth: StorageAuthProvider) -> BaseStorage: ... + def create_model_auth(provider: str, **kwargs) -> ModelAuthProvider: ... + + @staticmethod + def create_storage_auth(provider: str, **kwargs) -> StorageAuthProvider: ... ``` -## User Classes +**Supported Model Providers:** +- `openai` - OpenAI API authentication +- `oci` - Oracle Cloud Infrastructure authentication +- `aws-bedrock` - AWS Bedrock authentication +- `azure-openai` - Azure OpenAI authentication +- `gcp-vertex` - Google Cloud Vertex AI authentication -### `BaseUser` -Abstract base class for user implementations. +**Supported Storage Providers:** +- `aws` - AWS S3 authentication +- `azure` - Azure Blob Storage authentication +- `gcp` - Google Cloud Storage authentication +- `oci` - Oracle Cloud Infrastructure Object Storage authentication +- `github` - GitHub repository authentication + +### Provider-Specific Authentication +#### OpenAI Authentication ```python -class BaseUser(HttpUser): - supported_tasks: Dict[str, str] = {} - - @classmethod - def is_task_supported(cls, task: str) -> bool: ... - - def sample(self) -> UserRequest: ... - - def collect_metrics(self, user_response: UserResponse, endpoint: str): ... 
+# OpenAI API key authentication +auth = UnifiedAuthFactory.create_model_auth( + "openai", + api_key="sk-..." +) ``` -### Provider-Specific User Classes +#### AWS Bedrock Authentication +```python +# AWS Bedrock authentication with multiple options +auth = UnifiedAuthFactory.create_model_auth( + "aws-bedrock", + access_key_id="AKIA...", + secret_access_key="...", + region="us-east-1" +) -- `OpenAIUser` - OpenAI API implementation -- `AWSBedrockUser` - AWS Bedrock implementation -- `AzureOpenAIUser` - Azure OpenAI implementation -- `GCPVertexUser` - GCP Vertex AI implementation -- `OCICohereUser` - OCI Cohere implementation -- `OCIGenAIUser` - OCI GenAI implementation +# Or using AWS profile +auth = UnifiedAuthFactory.create_model_auth( + "aws-bedrock", + profile="default", + region="us-west-2" +) +``` -## Metrics Classes +#### Azure OpenAI Authentication +```python +# Azure OpenAI authentication +auth = UnifiedAuthFactory.create_model_auth( + "azure-openai", + endpoint="https://your-resource.openai.azure.com/", + deployment="your-deployment", + api_version="2024-02-15-preview", + api_key="your-api-key" +) +``` -### `RequestLevelMetrics` -Metrics for individual requests. +#### GCP Vertex AI Authentication +```python +# GCP Vertex AI authentication +auth = UnifiedAuthFactory.create_model_auth( + "gcp-vertex", + project_id="your-project", + location="us-central1", + credentials_path="/path/to/credentials.json" +) +``` +#### OCI Authentication ```python -class RequestLevelMetrics(BaseModel): - ttft: Optional[float] = None # Time to first token - tpot: Optional[float] = None # Time per output token - e2e_latency: Optional[float] = None # End-to-end latency - output_latency: Optional[float] = None # Output latency - output_inference_speed: Optional[float] = None # Output inference speed - num_input_tokens: Optional[int] = None - num_output_tokens: Optional[int] = None - total_tokens: Optional[int] = None - input_throughput: Optional[float] = None - output_throughput: Optional[float] = None - error_code: Optional[int] = None - error_message: Optional[str] = None +# OCI authentication with multiple methods +# User Principal (default) +auth = UnifiedAuthFactory.create_model_auth( + "oci", + config_path="~/.oci/config", + profile="DEFAULT" +) + +# Instance Principal +auth = UnifiedAuthFactory.create_model_auth( + "oci", + auth_type="instance_principal" +) + +# OBO Token +auth = UnifiedAuthFactory.create_model_auth( + "oci", + auth_type="obo_token", + token="your-obo-token" +) ``` -### `AggregatedMetrics` -Aggregated metrics across multiple requests. +### Storage Authentication Examples +#### AWS S3 Storage ```python -class AggregatedMetrics(BaseModel): - # Contains aggregated statistics for all metrics - # including percentiles, means, etc. +# AWS S3 authentication +storage_auth = UnifiedAuthFactory.create_storage_auth( + "aws", + access_key_id="AKIA...", + secret_access_key="...", + region="us-east-1" +) ``` -## Analysis and Reporting Classes +#### Azure Blob Storage +```python +# Azure Blob Storage authentication +storage_auth = UnifiedAuthFactory.create_storage_auth( + "azure", + account_name="your-storage-account", + account_key="your-account-key" +) -### `FlexiblePlotGenerator` -Generates plots using flexible configuration. +# Or using connection string +storage_auth = UnifiedAuthFactory.create_storage_auth( + "azure", + connection_string="DefaultEndpointsProtocol=https;AccountName=..." 
+) +``` +#### Google Cloud Storage ```python -class FlexiblePlotGenerator: - def __init__(self, config: PlotConfig): ... +# GCP Cloud Storage authentication +storage_auth = UnifiedAuthFactory.create_storage_auth( + "gcp", + project_id="your-project", + credentials_path="/path/to/credentials.json" +) +``` + +#### OCI Object Storage +```python +# OCI Object Storage authentication +storage_auth = UnifiedAuthFactory.create_storage_auth( + "oci", + config_path="~/.oci/config", + profile="DEFAULT" +) +``` + +#### GitHub Storage +```python +# GitHub repository authentication +storage_auth = UnifiedAuthFactory.create_storage_auth( + "github", + token="ghp_...", + owner="your-username", + repo="your-repo" +) +``` + +## Storage System + +> **πŸ’Ύ Learn More**: For storage configuration and upload examples, see the [Upload Results Guide](../user-guide/upload-benchmark-result.md) and [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md). + +### Base Storage Interface + +#### `BaseStorage` +Abstract base class for all storage implementations. + +```python +class BaseStorage(ABC): + @abstractmethod + def upload_file( + self, local_path: Union[str, Path], remote_path: str, bucket: str, **kwargs + ) -> None: ... - def generate_plots( - self, - run_data_list: List[Tuple[ExperimentMetadata, ExperimentMetrics]], - group_key: str, - experiment_folder: str, - metrics_time_unit: str = "s" + @abstractmethod + def upload_folder( + self, local_folder: Union[str, Path], bucket: str, prefix: str = "", **kwargs + ) -> None: ... + + @abstractmethod + def download_file( + self, remote_path: str, local_path: Union[str, Path], bucket: str, **kwargs ) -> None: ... + + @abstractmethod + def list_objects( + self, bucket: str, prefix: Optional[str] = None, **kwargs + ) -> Generator[str, None, None]: ... + + @abstractmethod + def delete_object(self, remote_path: str, bucket: str, **kwargs) -> None: ... + + @abstractmethod + def get_storage_type(self) -> str: ... ``` -### `PlotConfig` -Configuration for plot generation. +### Storage Factory + +#### `StorageFactory` +Factory for creating storage provider instances. ```python -class PlotConfig(BaseModel): - title: str - plots: List[PlotSpec] - figure_size: Tuple[int, int] = (12, 8) - dpi: int = 100 - # ... more configuration options +class StorageFactory: + @staticmethod + def create_storage( + provider: str, auth: StorageAuthProvider, **kwargs + ) -> BaseStorage: ... ``` -### `ExperimentLoader` -Loads experiment data from files. +**Supported Storage Providers:** +- `aws` - AWS S3 storage +- `azure` - Azure Blob Storage +- `gcp` - Google Cloud Storage +- `oci` - Oracle Cloud Infrastructure Object Storage +- `github` - GitHub repository storage +### Storage Provider Implementations + +#### AWS S3 Storage ```python -def load_multiple_experiments( - folder_name: str, - filter_criteria=None -) -> List[Tuple[ExperimentMetadata, ExperimentMetrics]]: ... +class AWSS3Storage(BaseStorage): + """AWS S3 storage implementation.""" + + def __init__(self, auth: StorageAuthProvider, **kwargs): ... + + def upload_file(self, local_path, remote_path, bucket, **kwargs): ... + def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... + def download_file(self, remote_path, local_path, bucket, **kwargs): ... + def list_objects(self, bucket, prefix=None, **kwargs): ... + def delete_object(self, remote_path, bucket, **kwargs): ... + def get_storage_type(self) -> str: ... 
+``` -def load_one_experiment( - folder_name: str, - filter_criteria: Optional[Dict[str, Any]] = None -) -> Tuple[Optional[ExperimentMetadata], ExperimentMetrics]: ... +**Features:** +- Full S3 API support +- Automatic multipart uploads for large files +- Server-side encryption support +- Lifecycle policy management +- Cross-region replication support + +#### Azure Blob Storage +```python +class AzureBlobStorage(BaseStorage): + """Azure Blob Storage implementation.""" + + def __init__(self, auth: StorageAuthProvider, **kwargs): ... + + def upload_file(self, local_path, remote_path, bucket, **kwargs): ... + def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... + def download_file(self, remote_path, local_path, bucket, **kwargs): ... + def list_objects(self, bucket, prefix=None, **kwargs): ... + def delete_object(self, remote_path, bucket, **kwargs): ... + def get_storage_type(self) -> str: ... ``` -## Configuration Classes +**Features:** +- Blob storage with tier management +- Access control and SAS tokens +- Blob versioning support +- Soft delete capabilities +- Change feed support -### `DatasetConfig` -Configuration for dataset loading. +#### Google Cloud Storage +```python +class GCPCloudStorage(BaseStorage): + """Google Cloud Storage implementation.""" + + def __init__(self, auth: StorageAuthProvider, **kwargs): ... + + def upload_file(self, local_path, remote_path, bucket, **kwargs): ... + def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... + def download_file(self, remote_path, local_path, bucket, **kwargs): ... + def list_objects(self, bucket, prefix=None, **kwargs): ... + def delete_object(self, remote_path, bucket, **kwargs): ... + def get_storage_type(self) -> str: ... +``` + +**Features:** +- Multi-regional and regional storage classes +- Object lifecycle management +- Fine-grained access control +- Data encryption at rest and in transit +- Cloud CDN integration +#### OCI Object Storage ```python -class DatasetConfig(BaseModel): - source: DatasetSourceConfig - prompt_column: Optional[str] = None - image_column: Optional[str] = None - unsafe_allow_large_images: bool = False +class OCIObjectStorage(BaseStorage): + """Oracle Cloud Infrastructure Object Storage implementation.""" + + def __init__(self, auth: StorageAuthProvider, **kwargs): ... + + def upload_file(self, local_path, remote_path, bucket, **kwargs): ... + def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... + def download_file(self, remote_path, local_path, bucket, **kwargs): ... + def list_objects(self, bucket, prefix=None, **kwargs): ... + def delete_object(self, remote_path, bucket, **kwargs): ... + def get_storage_type(self) -> str: ... ``` -### `DatasetSourceConfig` -Configuration for dataset sources. +**Features:** +- High-performance object storage +- Automatic data replication +- Object versioning +- Cross-region backup +- Integration with OCI services +#### GitHub Storage ```python -class DatasetSourceConfig(BaseModel): - type: Literal["file", "huggingface", "custom"] - path: Optional[str] = None - file_format: Optional[str] = None - huggingface_dataset: Optional[str] = None - huggingface_config: Optional[str] = None - huggingface_split: Optional[str] = None - loader_class: Optional[str] = None - loader_kwargs: Optional[Dict[str, Any]] = None +class GitHubStorage(BaseStorage): + """GitHub repository storage implementation.""" + + def __init__(self, auth: StorageAuthProvider, **kwargs): ... 
+ + def upload_file(self, local_path, remote_path, bucket, **kwargs): ... + def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... + def download_file(self, remote_path, local_path, bucket, **kwargs): ... + def list_objects(self, bucket, prefix=None, **kwargs): ... + def delete_object(self, remote_path, bucket, **kwargs): ... + def get_storage_type(self) -> str: ... ``` -## Usage Examples +**Features:** +- Git-based versioning +- Pull request integration +- Branch-based organization +- GitHub Actions integration +- Collaborative workflows -### Basic Benchmarking +### Storage Usage Examples +#### Basic File Operations ```python -from genai_bench.auth.unified_factory import UnifiedAuthFactory from genai_bench.storage.factory import StorageFactory -from genai_bench.scenarios.base import Scenario - -# Create authentication -auth = UnifiedAuthFactory.create_model_auth( - "openai", - api_key="sk-..." -) +from genai_bench.auth.unified_factory import UnifiedAuthFactory -# Create storage +# Create storage authentication storage_auth = UnifiedAuthFactory.create_storage_auth( "aws", - profile="default", + access_key_id="AKIA...", + secret_access_key="...", region="us-east-1" ) + +# Create storage instance storage = StorageFactory.create_storage("aws", storage_auth) -# Create scenario -scenario = Scenario.from_string("N(100,50)") +# Upload a single file +storage.upload_file( + local_path="/path/to/file.txt", + remote_path="benchmarks/2024/file.txt", + bucket="my-bucket" +) -# Upload results +# Upload entire folder storage.upload_folder( - "/path/to/results", - "my-bucket", - prefix="benchmarks/2024" + local_folder="/path/to/results", + bucket="my-bucket", + prefix="benchmarks/2024/" +) + +# Download file +storage.download_file( + remote_path="benchmarks/2024/file.txt", + local_path="/path/to/downloaded.txt", + bucket="my-bucket" ) -``` -### Custom Plot Generation +# List objects +for obj in storage.list_objects("my-bucket", prefix="benchmarks/"): + print(f"Object: {obj}") +# Delete object +storage.delete_object( + remote_path="benchmarks/2024/file.txt", + bucket="my-bucket" +) +``` + +#### Multi-Cloud Storage Setup ```python -from genai_bench.analysis.flexible_plot_report import FlexiblePlotGenerator -from genai_bench.analysis.plot_config import PlotConfig, PlotSpec +# AWS S3 +aws_storage = StorageFactory.create_storage( + "aws", + UnifiedAuthFactory.create_storage_auth("aws", profile="default") +) -# Create plot configuration -config = PlotConfig( - title="Performance Analysis", - plots=[ - PlotSpec( - x_field="concurrency", - y_fields=["e2e_latency"], - plot_type="line", - title="Latency vs Concurrency" - ) - ] +# Azure Blob Storage +azure_storage = StorageFactory.create_storage( + "azure", + UnifiedAuthFactory.create_storage_auth("azure", account_name="mystorage") ) -# Generate plots -generator = FlexiblePlotGenerator(config) -generator.generate_plots( - run_data_list, - group_key="traffic_scenario", - experiment_folder="/path/to/results" +# Google Cloud Storage +gcp_storage = StorageFactory.create_storage( + "gcp", + UnifiedAuthFactory.create_storage_auth("gcp", project_id="my-project") ) -``` -### Custom Dataset Loading +# Upload to multiple providers +for storage in [aws_storage, azure_storage, gcp_storage]: + storage.upload_folder( + local_folder="/path/to/results", + bucket="my-bucket", + prefix="backup/" + ) +``` +#### Advanced Storage Operations ```python -from genai_bench.data.config import DatasetConfig, DatasetSourceConfig -from genai_bench.data.loaders.factory import 
DataLoaderFactory - -# Configure dataset -dataset_config = DatasetConfig( - source=DatasetSourceConfig( - type="file", - path="/path/to/dataset.csv", - file_format="csv" - ), - prompt_column="text" +# Upload with metadata +storage.upload_file( + local_path="/path/to/file.txt", + remote_path="benchmarks/2024/file.txt", + bucket="my-bucket", + metadata={"experiment": "llm-benchmark", "version": "1.0"} ) -# Load data -data = DataLoaderFactory.load_data_for_task( - "text-to-text", - dataset_config +# Upload with server-side encryption +storage.upload_file( + local_path="/path/to/file.txt", + remote_path="benchmarks/2024/file.txt", + bucket="my-bucket", + encryption="AES256" ) + +# List with filtering +for obj in storage.list_objects( + bucket="my-bucket", + prefix="benchmarks/2024/", + max_keys=100 +): + print(f"Found: {obj}") ``` -## Contributing to API Documentation +## UI and Dashboard System -We welcome contributions to improve our API documentation! If you'd like to help: +> **πŸ“Š Learn More**: For dashboard usage and configuration examples, see the [Run Benchmark Guide](../user-guide/run-benchmark.md#distributed-benchmark). + +### Dashboard Components + +#### `Dashboard` +Union type for dashboard implementations. + +```python +Dashboard = Union[RichLiveDashboard, MinimalDashboard] +``` + +#### `RichLiveDashboard` +Real-time dashboard with rich UI components for live metrics visualization. + +```python +class RichLiveDashboard: + def __init__(self, metrics_time_unit: str = "s"): ... + + def update_metrics_panels( + self, live_metrics: LiveMetricsData, metrics_time_unit: str = "s" + ): ... + + def update_histogram_panel( + self, live_metrics: LiveMetricsData, metrics_time_unit: str = "s" + ): ... + + def update_scatter_plot_panel( + self, scatter_plot_metrics: Optional[List[float]], time_unit: str = "s" + ): ... + + def update_benchmark_progress_bars(self, progress_increment: float): ... + + def create_benchmark_progress_task(self, run_name: str): ... + + def update_total_progress_bars(self, total_runs: int): ... + + def start_run(self, run_time: int, start_time: float, max_requests_per_run: int): ... + + def calculate_time_based_progress(self) -> float: ... + + def handle_single_request( + self, live_metrics: LiveMetricsData, total_requests: int, error_code: int | None + ): ... + + def reset_panels(self): ... +``` + +**Features:** +- Real-time metrics visualization +- Interactive progress tracking +- Histogram and scatter plot displays +- Live updates with configurable refresh rates +- Rich console output with colors and formatting + +#### `MinimalDashboard` +Lightweight dashboard for headless or minimal UI scenarios. + +```python +class MinimalDashboard: + def __init__(self, metrics_time_unit: str = "s"): ... + + def update_metrics_panels(self, live_metrics: LiveMetricsData, metrics_time_unit: str = "s"): ... + def update_histogram_panel(self, live_metrics: LiveMetricsData, metrics_time_unit: str = "s"): ... + def update_scatter_plot_panel(self, scatter_plot_metrics: Optional[List[float]], time_unit: str = "s"): ... + def update_benchmark_progress_bars(self, progress_increment: float): ... + def create_benchmark_progress_task(self, run_name: str): ... + def update_total_progress_bars(self, total_runs: int): ... + def start_run(self, run_time: int, start_time: float, max_requests_per_run: int): ... + def calculate_time_based_progress(self) -> float: ... + def handle_single_request(self, live_metrics: LiveMetricsData, total_requests: int, error_code: int | None): ... 
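    # Every method above and below is a deliberate no-op, so callers can drive
    # one dashboard interface in headless/CI runs (see Features below).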
+ def reset_panels(self): ... +``` + +**Features:** +- No-op implementations for all dashboard methods +- Minimal resource usage +- Suitable for automated/CI environments +- Compatible with all dashboard interfaces + +### Dashboard Factory + +#### `create_dashboard` +Factory function for creating appropriate dashboard based on environment. + +```python +def create_dashboard(metrics_time_unit: str = "s") -> Dashboard: + """Factory function that returns either RichLiveDashboard or MinimalDashboard based on ENABLE_UI.""" +``` + +**Environment Variables:** +- `ENABLE_UI=true` - Creates `RichLiveDashboard` +- `ENABLE_UI=false` - Creates `MinimalDashboard` + +### Layout System + +#### `create_layout` +Creates the main dashboard layout structure. + +```python +def create_layout() -> Layout: ... +``` + +**Layout Structure:** +- **Row 1**: Total Progress and Benchmark Progress +- **Row 2**: Input and Output metrics panels +- **Row 3**: Scatter plots (TTFT vs Input Throughput, Output Latency vs Output Throughput) +- **Logs**: Log output display + +#### `create_metric_panel` +Creates individual metric panels with latency and throughput data. + +```python +def create_metric_panel( + title, latency_data, throughput_data, metrics_time_unit: str = "s" +) -> Panel: ... +``` + +#### `create_progress_bars` +Creates progress tracking bars. + +```python +def create_progress_bars() -> Tuple[Progress, Progress, int]: ... +``` + +### Plot Components + +#### `create_horizontal_colored_bar_chart` +Creates horizontal bar charts for histogram visualization. + +```python +def create_horizontal_colored_bar_chart( + data: List[float], + title: str, + max_width: int = 50 +) -> str: ... +``` + +#### `create_scatter_plot` +Creates scatter plot visualizations for correlation analysis. + +```python +def create_scatter_plot( + x_data: List[float], + y_data: List[float], + title: str +) -> str: ... +``` + +### Live Metrics Data + +#### `LiveMetricsData` +Structure for real-time metrics data. 
+ +```python +LiveMetricsData = { + "ttft": List[float], + "input_throughput": List[float], + "output_throughput": List[float], + "output_latency": List[float], + "stats": Dict[str, Any] +} +``` + +### Dashboard Usage Examples + +#### Basic Dashboard Setup +```python +from genai_bench.ui.dashboard import create_dashboard + +# Create dashboard (automatically selects based on ENABLE_UI) +dashboard = create_dashboard(metrics_time_unit="s") + +# Use with context manager for live updates +with dashboard.live: + # Update metrics + dashboard.update_metrics_panels(live_metrics) + + # Update progress + dashboard.update_benchmark_progress_bars(0.1) + + # Update plots + dashboard.update_scatter_plot_panel(scatter_data) +``` + +#### Custom Dashboard Configuration +```python +import os + +# Force minimal dashboard +os.environ["ENABLE_UI"] = "false" +dashboard = create_dashboard() + +# Force rich dashboard +os.environ["ENABLE_UI"] = "true" +dashboard = create_dashboard() +``` + +#### Real-time Metrics Update +```python +# Live metrics data structure +live_metrics = { + "ttft": [0.1, 0.2, 0.15, 0.3], + "input_throughput": [100, 120, 110, 90], + "output_throughput": [50, 60, 55, 45], + "output_latency": [0.5, 0.6, 0.55, 0.7], + "stats": { + "mean_ttft": 0.1875, + "mean_input_throughput": 105.0, + "mean_output_throughput": 52.5, + "mean_output_latency": 0.5875 + } +} + +# Update dashboard with live data +dashboard.update_metrics_panels(live_metrics, metrics_time_unit="s") +dashboard.update_histogram_panel(live_metrics, metrics_time_unit="s") +dashboard.update_scatter_plot_panel(live_metrics["ttft"], time_unit="s") +``` + +## User Classes + +### `BaseUser` +Abstract base class for user implementations. + +```python +class BaseUser(HttpUser): + supported_tasks: Dict[str, str] = {} + + @classmethod + def is_task_supported(cls, task: str) -> bool: ... + + def sample(self) -> UserRequest: ... + + def collect_metrics(self, user_response: UserResponse, endpoint: str): ... +``` + +### Provider-Specific User Classes + +- `OpenAIUser` - OpenAI API implementation +- `AWSBedrockUser` - AWS Bedrock implementation +- `AzureOpenAIUser` - Azure OpenAI implementation +- `GCPVertexUser` - GCP Vertex AI implementation +- `OCICohereUser` - OCI Cohere implementation +- `OCIGenAIUser` - OCI GenAI implementation + +## Distributed System + +> **⚑ Learn More**: For distributed benchmarking setup and best practices, see the [Run Benchmark Guide](../user-guide/run-benchmark.md#distributed-benchmark). + +### Distributed Configuration + +#### `DistributedConfig` +Configuration for distributed benchmark execution. + +```python +@dataclass +class DistributedConfig: + num_workers: int + master_host: str = "127.0.0.1" + master_port: int = 5557 + wait_time: int = 2 + pin_to_cores: bool = False + cpu_affinity_map: Optional[Dict[int, int]] = None +``` + +**Configuration Options:** +- `num_workers` - Number of worker processes (0 for local mode) +- `master_host` - Host for master process communication +- `master_port` - Port for master-worker communication +- `wait_time` - Wait time for worker startup +- `pin_to_cores` - Enable CPU core pinning (experimental) +- `cpu_affinity_map` - Custom worker-to-CPU mapping + +### Distributed Runner + +#### `DistributedRunner` +Manages distributed load test execution with master and worker processes. + +```python +class DistributedRunner: + def __init__( + self, + environment: Environment, + config: DistributedConfig, + dashboard: Optional[Dashboard] = None, + ): ... + + def setup(self) -> None: ... 
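    # setup() starts the master/worker processes and registers the message
    # handlers described in the Architecture Overview below.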
+ + def update_scenario(self, scenario: str) -> None: ... + + def update_batch_size(self, batch_size: int) -> None: ... + + def cleanup(self) -> None: ... +``` + +**Architecture Overview:** + +1. **Process Model:** + - **Master Process**: Controls test execution and aggregates metrics + - **Worker Processes**: Execute actual API requests and send metrics to master + - **Local Mode**: Single process handles both execution and aggregation + +2. **Message Flow:** + - **Master β†’ Workers:** + - `"update_scenario"`: Updates test scenario configuration + - `"update_batch_size"`: Updates batch size for requests + - **Workers β†’ Master:** + - `"request_metrics"`: Sends metrics from each request for aggregation + - `"worker_log"`: Sends worker logs to master + +3. **Execution Flow:** + - **Master Process:** + - Sets up worker processes + - Controls test scenarios and batch sizes + - Aggregates metrics from workers + - Runs the main benchmark loop + - Updates dashboard with live metrics + - **Worker Processes:** + - Receive test configurations from master + - Execute API requests + - Send metrics back to master + - Do NOT execute the main benchmark loop + +4. **Message Registration:** + - **Master**: registers `"request_metrics"` handler + - **Workers**: register `"update_scenario"`, `"update_batch_size"` handlers + - **Local mode**: registers all handlers + +5. **Metrics Collection:** + - Only master/local maintains `AggregatedMetricsCollector` + - Workers collect individual request metrics and send to master + - Master aggregates metrics and updates dashboard + +### Message Handler Protocol + +#### `MessageHandler` +Protocol for message handling in distributed system. + +```python +class MessageHandler(Protocol): + def __call__(self, environment: Environment, msg: Any, **kwargs) -> None: ... 
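
# A minimal conforming handler, sketched for illustration; the function name
# and the msg.data payload layout are assumptions, not genai-bench API:
def log_worker_message(environment: Environment, msg: Any, **kwargs) -> None:
    # Custom messages carry their payload on msg.data; a real handler would
    # forward request metrics to the master's AggregatedMetricsCollector.
    print(f"worker message payload: {msg.data}")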
+``` + +### Distributed System Usage Examples + +#### Basic Distributed Setup +```python +from genai_bench.distributed.runner import DistributedRunner, DistributedConfig +from genai_bench.ui.dashboard import create_dashboard + +# Configure distributed execution +config = DistributedConfig( + num_workers=4, + master_host="127.0.0.1", + master_port=5557, + wait_time=2 +) + +# Create dashboard +dashboard = create_dashboard() + +# Create distributed runner +runner = DistributedRunner(environment, config, dashboard) +runner.setup() + +# If worker process, exit after setup +if isinstance(environment.runner, WorkerRunner): + return + +# Master continues with test execution +runner.update_scenario("N(100,50)") +runner.update_batch_size(32) +``` + +#### Advanced Configuration +```python +# CPU-optimized distributed setup +config = DistributedConfig( + num_workers=8, + master_host="0.0.0.0", # Allow external connections + master_port=5557, + wait_time=5, + pin_to_cores=True, + cpu_affinity_map={ + 0: 0, 1: 1, 2: 2, 3: 3, # Worker -> CPU mapping + 4: 4, 5: 5, 6: 6, 7: 7 + } +) + +runner = DistributedRunner(environment, config, dashboard) +runner.setup() +``` + +#### Local vs Distributed Mode +```python +# Local mode (single process) +config = DistributedConfig(num_workers=0) +runner = DistributedRunner(environment, config, dashboard) +runner.setup() + +# Distributed mode (multiple processes) +config = DistributedConfig(num_workers=4) +runner = DistributedRunner(environment, config, dashboard) +runner.setup() +``` + +#### Dynamic Scenario Updates +```python +# Update scenario during execution +runner.update_scenario("D(200,100)") # Deterministic scenario +runner.update_scenario("N(150,75)") # Normal distribution +runner.update_scenario("U(50,250)") # Uniform distribution + +# Update batch size for embedding tasks +runner.update_batch_size(16) +runner.update_batch_size(32) +``` + +#### Cleanup and Resource Management +```python +# Automatic cleanup on exit +import atexit +atexit.register(runner.cleanup) + +# Manual cleanup +runner.cleanup() +``` + +### Performance Considerations + +#### Worker Process Optimization +- **CPU Pinning**: Pin workers to specific CPU cores for better performance +- **Process Count**: Balance between CPU cores and memory usage +- **Memory Management**: Monitor memory usage with high worker counts + +#### Network Configuration +- **Master Host**: Use `0.0.0.0` for external worker connections +- **Port Selection**: Choose non-conflicting ports for multiple instances +- **Wait Time**: Adjust based on worker startup time + +#### Resource Monitoring +```python +import psutil + +# Monitor system resources +cpu_count = multiprocessing.cpu_count() +memory_gb = psutil.virtual_memory().total / (1024**3) + +# Recommended worker count +recommended_workers = min(cpu_count * 2, 16) +``` + +## Metrics and Analysis + +> **πŸ“Š Learn More**: For metrics definitions and analysis examples, see the [Metrics Definition Guide](../getting-started/metrics-definition.md) and [Excel Reports Guide](../user-guide/generate-excel-sheet.md). + +### Metrics Collection Components + +#### `RequestLevelMetrics` +Metrics for individual requests with comprehensive tracking. 
+ +```python +class RequestLevelMetrics(BaseModel): + ttft: Optional[float] = Field(None, description="Time to first token (TTFT)") + tpot: Optional[float] = Field(None, description="Time per output token (TPOT)") + e2e_latency: Optional[float] = Field(None, description="End-to-end latency") + output_latency: Optional[float] = Field(None, description="Output latency") + output_inference_speed: Optional[float] = Field( + None, description="Output inference speed in tokens/s" + ) + num_input_tokens: Optional[int] = Field(None, description="Number of input tokens") + num_output_tokens: Optional[int] = Field( + None, description="Number of output tokens" + ) + total_tokens: Optional[int] = Field(None, description="Total tokens processed") + input_throughput: Optional[float] = Field( + None, description="Input throughput in tokens/s" + ) + output_throughput: Optional[float] = Field( + None, description="Output throughput in tokens/s" + ) + error_code: Optional[int] = Field(None, description="Error code") + error_message: Optional[str] = Field(None, description="Error message") +``` + +#### `MetricStats` +Statistical analysis for individual metrics. + +```python +class MetricStats(BaseModel): + # Statistical measures for each metric + ttft: MetricStat = Field(default_factory=MetricStat) + tpot: MetricStat = Field(default_factory=MetricStat) + e2e_latency: MetricStat = Field(default_factory=MetricStat) + output_latency: MetricStat = Field(default_factory=MetricStat) + output_inference_speed: MetricStat = Field(default_factory=MetricStat) + num_input_tokens: MetricStat = Field(default_factory=MetricStat) + num_output_tokens: MetricStat = Field(default_factory=MetricStat) + total_tokens: MetricStat = Field(default_factory=MetricStat) + input_throughput: MetricStat = Field(default_factory=MetricStat) + output_throughput: MetricStat = Field(default_factory=MetricStat) +``` + +#### `AggregatedMetrics` +Comprehensive aggregated metrics across multiple requests. 
+ +```python +class AggregatedMetrics(BaseModel): + # Run Metadata + scenario: Optional[str] = Field(None, description="The sample scenario") + num_concurrency: int = Field(1, description="Number of concurrency") + batch_size: int = Field(1, description="Batch size for embedding tasks") + iteration_type: str = Field( + "num_concurrency", + description="Type of iteration used (num_concurrency or batch_size)", + ) + + # Performance Metrics + run_duration: float = Field(0.0, description="Run duration in seconds.") + mean_output_throughput_tokens_per_s: float = Field(0.0, description="Mean output throughput") + mean_input_throughput_tokens_per_s: float = Field(0.0, description="Mean input throughput") + mean_total_tokens_throughput_tokens_per_s: float = Field(0.0, description="Mean total throughput") + mean_total_chars_per_hour: float = Field(0.0, description="Mean chars per hour") + requests_per_second: float = Field(0.0, description="Average requests per second") + + # Error Tracking + error_codes_frequency: Dict[int, int] = Field(default_factory=dict, description="Error code frequency") + error_rate: float = Field(0.0, description="Error rate across all requests") + num_error_requests: int = Field(0, description="Number of error requests") + num_completed_requests: int = Field(0, description="Number of completed requests") + num_requests: int = Field(0, description="Number of total requests") + + # Statistical Analysis + stats: MetricStats = Field(default_factory=MetricStats, description="Statistical analysis") +``` + +### Metrics Collectors + +#### `RequestMetricsCollector` +Collects and calculates metrics for individual requests. + +```python +class RequestMetricsCollector: + def __init__(self): ... + + def calculate_metrics(self, response: UserResponse): ... +``` + +**Features:** +- Automatic metric calculation from response data +- Error handling and validation +- Support for different response types +- Token counting and throughput calculation + +#### `AggregatedMetricsCollector` +Advanced metrics aggregation with statistical analysis. + +```python +class AggregatedMetricsCollector: + def __init__(self): ... + + def add_single_request_metrics(self, metrics: RequestLevelMetrics): ... + + def aggregate_metrics_data( + self, + start_time: float, + end_time: float, + dataset_character_to_token_ratio: float, + warmup_ratio: Optional[float], + cooldown_ratio: Optional[float], + ): ... + + def get_live_metrics_data(self) -> LiveMetricsData: ... +``` + +**Features:** +- Real-time metrics aggregation +- Statistical analysis (percentiles, means, std dev) +- Warmup and cooldown period filtering +- Live metrics data generation +- Error rate calculation + +### Time Unit Conversion + +#### `TimeUnitConverter` +Converts metrics between different time units. + +```python +class TimeUnitConverter: + @staticmethod + def convert_time_unit(value: float, from_unit: str, to_unit: str) -> float: ... + + @staticmethod + def convert_throughput_unit(value: float, from_unit: str, to_unit: str) -> float: ... +``` + +**Supported Units:** +- Time: `s` (seconds), `ms` (milliseconds), `ΞΌs` (microseconds) +- Throughput: `tokens/s`, `tokens/ms`, `tokens/ΞΌs` + +### Live Metrics System + +#### `LiveMetricsData` +Real-time metrics data structure. 
+ +```python +LiveMetricsData = { + "ttft": List[float], + "input_throughput": List[float], + "output_throughput": List[float], + "output_latency": List[float], + "stats": Dict[str, Any] +} +``` + +### Metrics Usage Examples + +#### Basic Metrics Collection +```python +from genai_bench.metrics.request_metrics_collector import RequestMetricsCollector +from genai_bench.metrics.aggregated_metrics_collector import AggregatedMetricsCollector + +# Collect individual request metrics +request_collector = RequestMetricsCollector() +request_collector.calculate_metrics(user_response) + +# Aggregate metrics across multiple requests +aggregated_collector = AggregatedMetricsCollector() + +# Add individual request metrics +for request_metrics in request_metrics_list: + aggregated_collector.add_single_request_metrics(request_metrics) + +# Perform final aggregation +aggregated_collector.aggregate_metrics_data( + start_time=start_time, + end_time=end_time, + dataset_character_to_token_ratio=4.0, + warmup_ratio=0.1, + cooldown_ratio=0.1 +) +``` + +#### Time Unit Conversion +```python +from genai_bench.time_units import TimeUnitConverter + +# Convert latency from seconds to milliseconds +latency_ms = TimeUnitConverter.convert_time_unit( + latency_s, "s", "ms" +) + +# Convert throughput from tokens/s to tokens/ms +throughput_ms = TimeUnitConverter.convert_throughput_unit( + throughput_s, "tokens/s", "tokens/ms" +) +``` + +#### Live Metrics Monitoring +```python +# Get live metrics data +live_metrics = aggregated_collector.get_live_metrics_data() + +# Update dashboard with live data +dashboard.update_metrics_panels(live_metrics, metrics_time_unit="s") +dashboard.update_histogram_panel(live_metrics, metrics_time_unit="s") +``` + +#### Statistical Analysis +```python +# Access statistical data +stats = aggregated_metrics.stats + +# Get specific metric statistics +ttft_stats = stats.ttft +print(f"TTFT - Mean: {ttft_stats.mean}, P95: {ttft_stats.p95}") + +# Get error analysis +print(f"Error Rate: {aggregated_metrics.error_rate}") +print(f"Error Codes: {aggregated_metrics.error_codes_frequency}") +``` + +## Advanced Data Loading System + +### Dataset Configuration + +#### `DatasetConfig` +Complete dataset configuration with flexible source support. + +```python +class DatasetConfig(BaseModel): + source: DatasetSourceConfig + prompt_column: Optional[str] = None + image_column: Optional[str] = None + prompt_lambda: Optional[str] = None + unsafe_allow_large_images: bool = False + + @classmethod + def from_file(cls, config_path: str) -> "DatasetConfig": ... + + @classmethod + def from_cli_args( + cls, + dataset_path: Optional[str] = None, + prompt_column: Optional[str] = None, + image_column: Optional[str] = None, + **kwargs, + ) -> "DatasetConfig": ... +``` + +#### `DatasetSourceConfig` +Configuration for different dataset sources. 
+ +```python +class DatasetSourceConfig(BaseModel): + type: str = Field(..., description="Dataset source type: 'file', 'huggingface', or 'custom'") + path: Optional[str] = Field(None, description="Path to dataset (file path or HuggingFace ID)") + file_format: Optional[str] = Field(None, description="File format: 'csv', 'txt', 'json'") + huggingface_kwargs: Optional[Dict[str, Any]] = Field( + None, description="Keyword arguments passed directly to HuggingFace load_dataset" + ) + loader_class: Optional[str] = Field(None, description="Python import path for custom dataset loader") + loader_kwargs: Optional[Dict[str, Any]] = Field(None, description="Keyword arguments for custom loader") +``` + +### Dataset Sources + +#### `DatasetSource` +Abstract base class for dataset sources. + +```python +class DatasetSource(ABC): + def __init__(self, config: DatasetSourceConfig): ... + + @abstractmethod + def load(self) -> Any: ... +``` + +#### `FileDatasetSource` +Load datasets from local files (txt, csv, json). + +```python +class FileDatasetSource(DatasetSource): + def load(self) -> Union[List[str], List[Tuple[str, Any]]]: ... + + def _load_text_file(self, file_path: Path) -> List[str]: ... + def _load_csv_file(self, file_path: Path) -> Any: ... + def _load_json_file(self, file_path: Path) -> List[Any]: ... +``` + +#### `HuggingFaceDatasetSource` +Load datasets from HuggingFace Hub. + +```python +class HuggingFaceDatasetSource(DatasetSource): + def load(self) -> Any: ... +``` + +### Data Loaders + +#### `DatasetLoader` +Abstract base class for dataset loaders. + +```python +class DatasetLoader(ABC): + supported_formats: Set[DatasetFormat] = set() + media_type: str = "" + + def __init__(self, dataset_config: DatasetConfig): ... + + def load_request(self) -> Union[List[str], List[Tuple[str, Any]]]: ... +``` + +#### `TextDatasetLoader` +For loading text datasets. + +```python +class TextDatasetLoader(DatasetLoader): + supported_formats = {DatasetFormat.TEXT, DatasetFormat.CSV, DatasetFormat.JSON, DatasetFormat.HUGGINGFACE_HUB} + media_type = "text" +``` + +#### `ImageDatasetLoader` +For loading image datasets. + +```python +class ImageDatasetLoader(DatasetLoader): + supported_formats = {DatasetFormat.CSV, DatasetFormat.JSON, DatasetFormat.HUGGINGFACE_HUB} + media_type = "image" +``` + +### Data Loading Factory + +#### `DataLoaderFactory` +Factory for creating data loaders and loading data. + +```python +class DataLoaderFactory: + @staticmethod + def load_data_for_task( + task: str, dataset_config: DatasetConfig + ) -> Union[List[str], List[Tuple[str, Any]]]: ... + + @staticmethod + def _load_text_data( + dataset_config: DatasetConfig, output_modality: str + ) -> List[str]: ... + + @staticmethod + def _load_image_data( + dataset_config: DatasetConfig, + ) -> List[Tuple[str, Any]]: ... 
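
    # Dispatch is assumed to follow the task's input modality: text-only tasks
    # use _load_text_data, image tasks use _load_image_data (examples below).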
+``` + +### Data Loading Usage Examples + +#### File-based Dataset Loading +```python +from genai_bench.data.config import DatasetConfig, DatasetSourceConfig +from genai_bench.data.loaders.factory import DataLoaderFactory + +# Load from CSV file +dataset_config = DatasetConfig( + source=DatasetSourceConfig( + type="file", + path="/path/to/dataset.csv", + file_format="csv" + ), + prompt_column="text" +) + +data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) +``` + +#### HuggingFace Dataset Loading +```python +# Load from HuggingFace Hub +dataset_config = DatasetConfig( + source=DatasetSourceConfig( + type="huggingface", + path="squad", + huggingface_kwargs={ + "split": "train", + "streaming": True + } + ), + prompt_column="question" +) + +data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) +``` + +#### Custom Dataset Loading +```python +# Load with custom loader +dataset_config = DatasetConfig( + source=DatasetSourceConfig( + type="custom", + loader_class="my_package.CustomLoader", + loader_kwargs={ + "api_key": "your-api-key", + "endpoint": "https://api.example.com/data" + } + ) +) + +data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) +``` + +#### Image Dataset Loading +```python +# Load image dataset +dataset_config = DatasetConfig( + source=DatasetSourceConfig( + type="file", + path="/path/to/images.csv", + file_format="csv" + ), + prompt_column="caption", + image_column="image_path" +) + +data = DataLoaderFactory.load_data_for_task("image-text-to-text", dataset_config) +``` + +#### Configuration from File +```python +# Load configuration from JSON file +dataset_config = DatasetConfig.from_file("/path/to/dataset_config.json") + +# Load data +data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) +``` + +#### CLI Integration +```python +# Create configuration from CLI arguments +dataset_config = DatasetConfig.from_cli_args( + dataset_path="/path/to/dataset.csv", + prompt_column="text", + image_column="image" +) + +data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) +``` + +## Analysis and Reporting Classes + +### `FlexiblePlotGenerator` +Generates plots using flexible configuration. + +```python +class FlexiblePlotGenerator: + def __init__(self, config: PlotConfig): ... + + def generate_plots( + self, + run_data_list: List[Tuple[ExperimentMetadata, ExperimentMetrics]], + group_key: str, + experiment_folder: str, + metrics_time_unit: str = "s" + ) -> None: ... +``` + +### `PlotConfig` +Configuration for plot generation. + +```python +class PlotConfig(BaseModel): + title: str + plots: List[PlotSpec] + figure_size: Tuple[int, int] = (12, 8) + dpi: int = 100 + # ... more configuration options +``` + +### `ExperimentLoader` +Loads experiment data from files. + +```python +def load_multiple_experiments( + folder_name: str, + filter_criteria=None +) -> List[Tuple[ExperimentMetadata, ExperimentMetrics]]: ... + +def load_one_experiment( + folder_name: str, + filter_criteria: Optional[Dict[str, Any]] = None +) -> Tuple[Optional[ExperimentMetadata], ExperimentMetrics]: ... +``` + +## Configuration Classes + +> **βš™οΈ Learn More**: For configuration examples and best practices, see the [Run Benchmark Guide](../user-guide/run-benchmark.md#selecting-datasets). + +### `DatasetConfig` +Configuration for dataset loading. 
+ +```python +class DatasetConfig(BaseModel): + source: DatasetSourceConfig + prompt_column: Optional[str] = None + image_column: Optional[str] = None + unsafe_allow_large_images: bool = False +``` + +### `DatasetSourceConfig` +Configuration for dataset sources. + +```python +class DatasetSourceConfig(BaseModel): + type: Literal["file", "huggingface", "custom"] + path: Optional[str] = None + file_format: Optional[str] = None + huggingface_dataset: Optional[str] = None + huggingface_config: Optional[str] = None + huggingface_split: Optional[str] = None + loader_class: Optional[str] = None + loader_kwargs: Optional[Dict[str, Any]] = None +``` + +## Comprehensive Examples + +> **πŸš€ Learn More**: For step-by-step tutorials and practical examples, see the [User Guide](../user-guide/index.md). + +### Complete Multi-Cloud Benchmarking Setup + +#### End-to-End Benchmarking Pipeline +```python +import os +from genai_bench.auth.unified_factory import UnifiedAuthFactory +from genai_bench.storage.factory import StorageFactory +from genai_bench.distributed.runner import DistributedRunner, DistributedConfig +from genai_bench.ui.dashboard import create_dashboard +from genai_bench.data.config import DatasetConfig, DatasetSourceConfig +from genai_bench.data.loaders.factory import DataLoaderFactory + +# 1. Configure Authentication +model_auth = UnifiedAuthFactory.create_model_auth( + "openai", + api_key=os.getenv("OPENAI_API_KEY") +) + +storage_auth = UnifiedAuthFactory.create_storage_auth( + "aws", + access_key_id=os.getenv("AWS_ACCESS_KEY_ID"), + secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"), + region="us-east-1" +) + +# 2. Create Storage +storage = StorageFactory.create_storage("aws", storage_auth) + +# 3. Configure Dataset +dataset_config = DatasetConfig( + source=DatasetSourceConfig( + type="huggingface", + path="squad", + huggingface_kwargs={"split": "train", "streaming": True} + ), + prompt_column="question" +) + +# 4. Load Data +data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) + +# 5. Configure Distributed Execution +config = DistributedConfig( + num_workers=4, + master_host="127.0.0.1", + master_port=5557 +) + +# 6. Create Dashboard +dashboard = create_dashboard(metrics_time_unit="s") + +# 7. 
Run Benchmark +runner = DistributedRunner(environment, config, dashboard) +runner.setup() + +# Upload results +storage.upload_folder( + "/path/to/results", + "my-bucket", + prefix="benchmarks/2024/" +) +``` + +#### Multi-Provider Authentication Setup +```python +# OpenAI + AWS S3 +openai_auth = UnifiedAuthFactory.create_model_auth("openai", api_key="sk-...") +aws_storage_auth = UnifiedAuthFactory.create_storage_auth("aws", profile="default") + +# Azure OpenAI + Azure Blob +azure_auth = UnifiedAuthFactory.create_model_auth( + "azure-openai", + endpoint="https://your-resource.openai.azure.com/", + deployment="your-deployment", + api_key="your-api-key" +) +azure_storage_auth = UnifiedAuthFactory.create_storage_auth( + "azure", + account_name="your-storage-account", + account_key="your-account-key" +) + +# GCP Vertex + GCP Storage +gcp_auth = UnifiedAuthFactory.create_model_auth( + "gcp-vertex", + project_id="your-project", + location="us-central1" +) +gcp_storage_auth = UnifiedAuthFactory.create_storage_auth( + "gcp", + project_id="your-project" +) + +# OCI GenAI + OCI Object Storage +oci_auth = UnifiedAuthFactory.create_model_auth( + "oci", + config_path="~/.oci/config", + profile="DEFAULT" +) +oci_storage_auth = UnifiedAuthFactory.create_storage_auth( + "oci", + config_path="~/.oci/config", + profile="DEFAULT" +) +``` + +#### Advanced Distributed Configuration +```python +# High-performance distributed setup +config = DistributedConfig( + num_workers=8, + master_host="0.0.0.0", # Allow external connections + master_port=5557, + wait_time=5, + pin_to_cores=True, + cpu_affinity_map={ + 0: 0, 1: 1, 2: 2, 3: 3, + 4: 4, 5: 5, 6: 6, 7: 7 + } +) + +# Create runner with custom dashboard +dashboard = create_dashboard(metrics_time_unit="ms") +runner = DistributedRunner(environment, config, dashboard) +runner.setup() + +# Dynamic scenario updates +scenarios = ["N(100,50)", "N(200,100)", "D(150,150)", "U(50,250)"] +for scenario in scenarios: + runner.update_scenario(scenario) + # Run benchmark with this scenario + # ... benchmark execution ... 
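
# Release master/worker resources once all scenarios have run
runner.cleanup()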
+``` + +#### Custom Dataset Loading Examples +```python +# Text dataset from CSV +text_config = DatasetConfig( + source=DatasetSourceConfig( + type="file", + path="/path/to/text_data.csv", + file_format="csv" + ), + prompt_column="text" +) + +# Image dataset from JSON +image_config = DatasetConfig( + source=DatasetSourceConfig( + type="file", + path="/path/to/images.json", + file_format="json" + ), + prompt_column="caption", + image_column="image_path" +) + +# HuggingFace dataset with custom parameters +hf_config = DatasetConfig( + source=DatasetSourceConfig( + type="huggingface", + path="squad", + huggingface_kwargs={ + "split": "train", + "streaming": True, + "cache_dir": "/tmp/hf_cache" + } + ), + prompt_column="question" +) + +# Custom dataset loader +custom_config = DatasetConfig( + source=DatasetSourceConfig( + type="custom", + loader_class="my_package.CustomDataLoader", + loader_kwargs={ + "api_endpoint": "https://api.example.com/data", + "api_key": "your-api-key", + "batch_size": 1000 + } + ) +) + +# Load data for different tasks +text_data = DataLoaderFactory.load_data_for_task("text-to-text", text_config) +image_data = DataLoaderFactory.load_data_for_task("image-text-to-text", image_config) +hf_data = DataLoaderFactory.load_data_for_task("text-to-embeddings", hf_config) +custom_data = DataLoaderFactory.load_data_for_task("text-to-text", custom_config) +``` + +#### Advanced Storage Operations +```python +# Multi-cloud backup +providers = ["aws", "azure", "gcp", "oci"] +storages = [] + +for provider in providers: + auth = UnifiedAuthFactory.create_storage_auth(provider, **provider_configs[provider]) + storage = StorageFactory.create_storage(provider, auth) + storages.append(storage) + +# Upload to all providers +for storage in storages: + storage.upload_folder( + local_folder="/path/to/results", + bucket="benchmark-results", + prefix="backup/2024/" + ) + +# Advanced upload with metadata +storage.upload_file( + local_path="/path/to/results.json", + remote_path="benchmarks/2024/results.json", + bucket="my-bucket", + metadata={ + "experiment": "llm-benchmark", + "model": "gpt-4", + "version": "1.0", + "timestamp": "2024-01-01T00:00:00Z" + }, + encryption="AES256" +) + +# List and filter objects +for obj in storage.list_objects( + bucket="my-bucket", + prefix="benchmarks/2024/", + max_keys=100 +): + if obj.endswith(".json"): + print(f"Found result file: {obj}") +``` + +#### Custom Plot Generation +```python +from genai_bench.analysis.flexible_plot_report import FlexiblePlotGenerator +from genai_bench.analysis.plot_config import PlotConfig, PlotSpec + +# Create comprehensive plot configuration +config = PlotConfig( + title="LLM Performance Analysis", + plots=[ + PlotSpec( + x_field="concurrency", + y_fields=["e2e_latency", "ttft"], + plot_type="line", + title="Latency vs Concurrency", + x_label="Concurrency Level", + y_label="Latency (ms)" + ), + PlotSpec( + x_field="concurrency", + y_fields=["input_throughput", "output_throughput"], + plot_type="bar", + title="Throughput vs Concurrency", + x_label="Concurrency Level", + y_label="Throughput (tokens/s)" + ), + PlotSpec( + x_field="input_throughput", + y_fields=["e2e_latency"], + plot_type="scatter", + title="Latency vs Input Throughput", + x_label="Input Throughput (tokens/s)", + y_label="Latency (ms)" + ) + ], + figure_size=(15, 10), + dpi=300 +) + +# Generate plots +generator = FlexiblePlotGenerator(config) +generator.generate_plots( + run_data_list, + group_key="traffic_scenario", + experiment_folder="/path/to/results", + 
metrics_time_unit="ms" +) +``` + +#### Metrics Analysis and Monitoring +```python +from genai_bench.metrics.aggregated_metrics_collector import AggregatedMetricsCollector +from genai_bench.analysis.experiment_loader import load_multiple_experiments + +# Load experiment data +experiments = load_multiple_experiments( + folder_name="/path/to/experiments", + filter_criteria={"model": "gpt-4", "task": "text-to-text"} +) + +# Analyze metrics +for metadata, metrics in experiments: + print(f"Experiment: {metadata.experiment_folder_name}") + print(f"Model: {metadata.model}") + print(f"Task: {metadata.task}") + print(f"Concurrency: {metadata.num_concurrency}") + + # Performance metrics + print(f"Mean TTFT: {metrics.stats.ttft.mean:.3f}ms") + print(f"P95 TTFT: {metrics.stats.ttft.p95:.3f}ms") + print(f"Mean Throughput: {metrics.mean_output_throughput_tokens_per_s:.2f} tokens/s") + print(f"Error Rate: {metrics.error_rate:.2%}") + + # Error analysis + if metrics.error_codes_frequency: + print("Error Codes:") + for code, count in metrics.error_codes_frequency.items(): + print(f" {code}: {count} occurrences") +``` + +#### Dashboard Customization +```python +import os + +# Force rich dashboard for development +os.environ["ENABLE_UI"] = "true" +dashboard = create_dashboard(metrics_time_unit="s") + +# Force minimal dashboard for production +os.environ["ENABLE_UI"] = "false" +dashboard = create_dashboard(metrics_time_unit="ms") + +# Custom dashboard usage +with dashboard.live: + # Update with live metrics + live_metrics = { + "ttft": [0.1, 0.2, 0.15, 0.3], + "input_throughput": [100, 120, 110, 90], + "output_throughput": [50, 60, 55, 45], + "output_latency": [0.5, 0.6, 0.55, 0.7], + "stats": { + "mean_ttft": 0.1875, + "mean_input_throughput": 105.0, + "mean_output_throughput": 52.5, + "mean_output_latency": 0.5875 + } + } + + dashboard.update_metrics_panels(live_metrics, metrics_time_unit="s") + dashboard.update_histogram_panel(live_metrics, metrics_time_unit="s") + dashboard.update_scatter_plot_panel(live_metrics["ttft"], time_unit="s") +``` + +#### Time Unit Conversion Examples +```python +from genai_bench.time_units import TimeUnitConverter + +# Convert latency metrics +latency_s = 0.5 +latency_ms = TimeUnitConverter.convert_time_unit(latency_s, "s", "ms") +latency_us = TimeUnitConverter.convert_time_unit(latency_s, "s", "ΞΌs") + +print(f"Latency: {latency_s}s = {latency_ms}ms = {latency_us}ΞΌs") + +# Convert throughput metrics +throughput_s = 100.0 # tokens/s +throughput_ms = TimeUnitConverter.convert_throughput_unit(throughput_s, "tokens/s", "tokens/ms") +throughput_us = TimeUnitConverter.convert_throughput_unit(throughput_s, "tokens/s", "tokens/ΞΌs") + +print(f"Throughput: {throughput_s} tokens/s = {throughput_ms} tokens/ms = {throughput_us} tokens/ΞΌs") +``` + +#### Error Handling and Recovery +```python +import logging +from genai_bench.logging import init_logger + +logger = init_logger(__name__) + +try: + # Create authentication + auth = UnifiedAuthFactory.create_model_auth("openai", api_key="invalid-key") +except ValueError as e: + logger.error(f"Authentication failed: {e}") + # Fallback to different provider + auth = UnifiedAuthFactory.create_model_auth("azure-openai", **azure_config) + +try: + # Create storage + storage = StorageFactory.create_storage("aws", storage_auth) +except Exception as e: + logger.error(f"Storage creation failed: {e}") + # Fallback to local storage + storage = None + +# Handle distributed runner errors +try: + runner = DistributedRunner(environment, config, dashboard) + 
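    # Assumption: setup() raises on worker startup failure, triggering the
    # local-mode fallback in the except branch below.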
runner.setup() +except Exception as e: + logger.error(f"Distributed setup failed: {e}") + # Fallback to local mode + config.num_workers = 0 + runner = DistributedRunner(environment, config, dashboard) + runner.setup() +``` + +### Production Deployment Examples + +#### Docker-based Deployment +```dockerfile +FROM python:3.9-slim + +# Install dependencies +RUN pip install genai-bench[all] + +# Set environment variables +ENV ENABLE_UI=false +ENV TOKENIZERS_PARALLELISM=false + +# Copy configuration +COPY config/ /app/config/ +COPY data/ /app/data/ + +# Run benchmark +CMD ["genai-bench", "benchmark", "--config", "/app/config/benchmark.yaml"] +``` + +#### Kubernetes Deployment +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: genai-bench +spec: + replicas: 1 + selector: + matchLabels: + app: genai-bench + template: + metadata: + labels: + app: genai-bench + spec: + containers: + - name: genai-bench + image: genai-bench:latest + env: + - name: ENABLE_UI + value: "false" + - name: OPENAI_API_KEY + valueFrom: + secretKeyRef: + name: api-keys + key: openai-key + resources: + requests: + memory: "2Gi" + cpu: "1000m" + limits: + memory: "4Gi" + cpu: "2000m" +``` + +#### CI/CD Integration +```yaml +name: LLM Benchmarking +on: + schedule: + - cron: '0 2 * * *' # Daily at 2 AM + +jobs: + benchmark: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.9' + + - name: Install dependencies + run: | + pip install genai-bench[all] + + - name: Run benchmark + env: + OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + run: | + genai-bench benchmark \ + --api-backend openai \ + --model gpt-4 \ + --task text-to-text \ + --traffic-scenario "N(100,50)" \ + --num-concurrency 1,2,4,8 \ + --max-time-per-run 300 \ + --upload-results \ + --storage-provider aws \ + --storage-bucket benchmark-results +``` + +## Logging and Utilities + +### Logging System + +#### `LoggingManager` +Centralized logging management for the application. + +```python +class LoggingManager: + def __init__(self): ... + + def setup_logging(self, level: str = "INFO"): ... + + def get_logger(self, name: str) -> logging.Logger: ... +``` + +#### `WorkerLoggingManager` +Specialized logging for distributed worker processes. + +```python +class WorkerLoggingManager: + def __init__(self): ... + + def setup_worker_logging(self, worker_id: int): ... + + def send_log_to_master(self, message: str, level: str): ... +``` + +#### `init_logger` +Initialize logger for a specific module. + +```python +def init_logger(name: str) -> logging.Logger: ... +``` + +### Utility Functions + +#### `calculate_sonnet_char_token_ratio` +Calculate character-to-token ratio for Sonnet model. + +```python +def calculate_sonnet_char_token_ratio() -> float: ... +``` + +#### `sanitize_string` +Sanitize string for safe usage in file paths and identifiers. + +```python +def sanitize_string(text: str) -> str: ... 
+``` + +### Logging Usage Examples + +#### Basic Logging Setup +```python +from genai_bench.logging import init_logger + +# Initialize logger for your module +logger = init_logger(__name__) + +# Use logger +logger.info("Starting benchmark") +logger.warning("High memory usage detected") +logger.error("Authentication failed") +``` + +#### Distributed Logging +```python +from genai_bench.logging import WorkerLoggingManager + +# Setup worker logging +worker_logger = WorkerLoggingManager() +worker_logger.setup_worker_logging(worker_id=0) + +# Send logs to master +worker_logger.send_log_to_master("Worker started", "INFO") +worker_logger.send_log_to_master("Processing request", "DEBUG") +``` + +#### Utility Function Usage +```python +from genai_bench.utils import calculate_sonnet_char_token_ratio, sanitize_string + +# Calculate token ratio +ratio = calculate_sonnet_char_token_ratio() +print(f"Sonnet char/token ratio: {ratio}") + +# Sanitize strings +safe_name = sanitize_string("My Experiment (v1.0)") +print(f"Sanitized: {safe_name}") +``` + +## CLI System Enhancements + +### Option Groups + +#### API Options +```python +api_options = [ + click.option("--api-backend", required=True, help="API backend"), + click.option("--api-base", help="API base URL"), + click.option("--api-key", help="API key"), + click.option("--model", required=True, help="Model name"), + click.option("--task", required=True, help="Task type") +] +``` + +#### Authentication Options +```python +model_auth_options = [ + click.option("--model-auth-type", help="Model authentication type"), + click.option("--aws-access-key-id", help="AWS access key"), + click.option("--aws-secret-access-key", help="AWS secret key"), + click.option("--azure-endpoint", help="Azure endpoint"), + click.option("--gcp-project-id", help="GCP project ID") +] + +storage_auth_options = [ + click.option("--storage-provider", help="Storage provider"), + click.option("--storage-bucket", help="Storage bucket"), + click.option("--storage-prefix", help="Storage prefix") +] +``` + +#### Distributed Options +```python +distributed_locust_options = [ + click.option("--num-workers", default=0, help="Number of worker processes"), + click.option("--master-port", default=5557, help="Master port"), + click.option("--spawn-rate", default=1, help="Spawn rate") +] +``` + +### Validation Functions + +#### `validate_tokenizer` +Validate tokenizer configuration. + +```python +def validate_tokenizer(tokenizer_name: str, model: str) -> bool: ... 
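
# Hedged usage sketch (the exact failure behavior is an assumption):
# ok = validate_tokenizer("gpt2", model="gpt-4")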
+``` + +### CLI Usage Examples + +#### Basic CLI Usage +```bash +# Run benchmark with OpenAI +genai-bench benchmark \ + --api-backend openai \ + --api-key $OPENAI_KEY \ + --model gpt-4 \ + --task text-to-text \ + --traffic-scenario "N(100,50)" \ + --num-concurrency 1,2,4,8 + +# Run with Azure OpenAI +genai-bench benchmark \ + --api-backend azure-openai \ + --azure-endpoint https://your-resource.openai.azure.com/ \ + --azure-deployment your-deployment \ + --model gpt-4 \ + --task text-to-text + +# Run with distributed workers +genai-bench benchmark \ + --api-backend openai \ + --model gpt-4 \ + --task text-to-text \ + --num-workers 4 \ + --master-port 5557 +``` + +#### Advanced CLI Usage +```bash +# Multi-cloud setup +genai-bench benchmark \ + --api-backend openai \ + --model gpt-4 \ + --task text-to-text \ + --upload-results \ + --storage-provider aws \ + --storage-bucket my-bucket \ + --storage-prefix benchmarks/2024 + +# Custom dataset +genai-bench benchmark \ + --api-backend openai \ + --model gpt-4 \ + --task text-to-text \ + --dataset-path /path/to/dataset.csv \ + --dataset-prompt-column text + +# HuggingFace dataset +genai-bench benchmark \ + --api-backend openai \ + --model gpt-4 \ + --task text-to-text \ + --dataset-config /path/to/dataset_config.json +``` + +## Contributing to API Documentation + +We welcome contributions to improve our API documentation! If you'd like to help: + +1. **Add docstrings** to undocumented functions and classes +2. **Provide usage examples** for complex components +3. **Document edge cases** and common gotchas +4. **Update examples** with new features and best practices +5. **Add troubleshooting sections** for common issues +6. **Submit a pull request** with your improvements + +### Documentation Guidelines + +- **Code Examples**: Include complete, runnable examples +- **Error Handling**: Show how to handle common errors +- **Best Practices**: Highlight recommended usage patterns +- **Cross-References**: Link related components and concepts +- **Version Compatibility**: Note any version-specific features + +### Areas Needing Documentation + +- **Custom Authentication Providers**: How to implement custom auth +- **Custom Storage Providers**: How to add new storage backends +- **Custom Dataset Loaders**: How to create custom data sources +- **Performance Tuning**: Optimization strategies and tips +- **Troubleshooting**: Common issues and solutions + +See our [Contributing Guide](../development/contributing.md) for more details on how to contribute to the project. + +## Troubleshooting and Support + +### Common Issues + +- **Authentication Problems**: See the [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md) for detailed setup instructions +- **Performance Issues**: Check the [Distributed Benchmarking Guide](../user-guide/run-benchmark.md#distributed-benchmark) for optimization tips +- **Dataset Loading**: Refer to the [Dataset Configuration Examples](../user-guide/run-benchmark.md#selecting-datasets) for proper setup +- **Storage Upload**: See the [Upload Results Guide](../user-guide/upload-benchmark-result.md) for troubleshooting storage issues + +### Additional Resources + +- **[Development Guide](../development/index.md)** - Contributing and development setup +- **[Multi-Cloud Quick Reference](../user-guide/multi-cloud-quick-reference.md)** - Quick setup reference for all providers + +### Getting Help -1. Add docstrings to undocumented functions -2. Provide usage examples -3. Document edge cases and gotchas -4. 
Submit a pull request +If you encounter issues not covered in the documentation: -See our [Contributing Guide](../development/contributing.md) for more details. \ No newline at end of file +1. Check the [GitHub Issues](https://github.com/sgl-project/genai-bench/issues) for known problems +2. Review the [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md) for provider-specific issues +3. Consult the [Run Benchmark Guide](../user-guide/run-benchmark.md) for usage examples +4. Open a new issue with detailed error information and configuration \ No newline at end of file diff --git a/docs/development/index.md b/docs/development/index.md index 0c710bd2..d7d0e920 100644 --- a/docs/development/index.md +++ b/docs/development/index.md @@ -73,8 +73,7 @@ genai-bench/ β”‚ β”œβ”€β”€ storage/ # Storage providers β”‚ └── user/ # User implementations β”œβ”€β”€ tests/ # Test suite -β”œβ”€β”€ docs/ # Documentation -└── examples/ # Example configurations +└── docs/ # Documentation ``` ## Key Components diff --git a/docs/examples/index.md b/docs/examples/index.md deleted file mode 100644 index 2036a353..00000000 --- a/docs/examples/index.md +++ /dev/null @@ -1,92 +0,0 @@ -# Examples - -This section provides practical examples and configurations for GenAI Bench. - -## Quick Examples - -### OpenAI GPT-4 Benchmark - -```bash -genai-bench benchmark \ - --api-backend openai \ - --api-base https://api.openai.com/v1 \ - --api-key $OPENAI_API_KEY \ - --api-model-name gpt-4 \ - --model-tokenizer gpt2 \ - --task text-to-text \ - --max-requests-per-run 1000 \ - --max-time-per-run 10 -``` - -### AWS Bedrock Claude Benchmark - -```bash -genai-bench benchmark \ - --api-backend aws-bedrock \ - --api-base https://bedrock-runtime.us-east-1.amazonaws.com \ - --aws-profile default \ - --aws-region us-east-1 \ - --api-model-name anthropic.claude-3-sonnet-20240229-v1:0 \ - --model-tokenizer Anthropic/claude-3-sonnet \ - --task text-to-text \ - --max-requests-per-run 500 \ - --max-time-per-run 10 -``` - -### Multi-Modal Benchmark - -```bash -genai-bench benchmark \ - --api-backend gcp-vertex \ - --api-base https://us-central1-aiplatform.googleapis.com \ - --gcp-project-id my-project \ - --gcp-location us-central1 \ - --gcp-credentials-path /path/to/service-account.json \ - --api-model-name gemini-1.5-pro-vision \ - --model-tokenizer google/gemini \ - --task image-text-to-text \ - --dataset-path /path/to/images \ - --max-requests-per-run 100 \ - --max-time-per-run 10 -``` - -### Embedding Benchmark with Batch Sizes - -```bash -genai-bench benchmark \ - --api-backend openai \ - --api-base https://api.openai.com/v1 \ - --api-key $OPENAI_API_KEY \ - --api-model-name text-embedding-3-large \ - --model-tokenizer cl100k_base \ - --task text-to-embeddings \ - --batch-size 1 --batch-size 8 --batch-size 32 --batch-size 64 \ - --max-requests-per-run 2000 \ - --max-time-per-run 10 -``` - -## Traffic Scenarios - -GenAI Bench supports various traffic patterns: - -### Text Generation Scenarios - -- `D(100,100)` - Deterministic: 100 input tokens, 100 output tokens -- `N(480,240)/(300,150)` - Normal distribution -- `U(50,100)/(200,250)` - Uniform distribution - -### Embedding Scenarios - -- `E(64)` - 64 tokens per document -- `E(512)` - 512 tokens per document -- `E(1024)` - 1024 tokens per document - -### Vision Scenarios - -- `I(512,512)` - 512x512 pixel images -- `I(1024,512)` - 1024x512 pixel images -- `I(2048,2048)` - 2048x2048 pixel images - -## Contributing Examples - -Have a useful configuration or example? 
We welcome contributions! Please submit a pull request with your example following our [contribution guidelines](../development/contributing.md). \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index c0d6dfb9..6e66b6fb 100644 --- a/docs/index.md +++ b/docs/index.md @@ -68,17 +68,22 @@ GenAI Bench supports multiple benchmark types: ### πŸ“– User Guide - [Run Benchmark](user-guide/run-benchmark.md) - How to run benchmarks +- [Traffic Scenarios](user-guide/scenario-definition.md) - Understanding traffic scenario syntax - [Multi-Cloud Authentication & Storage](user-guide/multi-cloud-auth-storage.md) - Comprehensive guide for cloud provider authentication - [Multi-Cloud Quick Reference](user-guide/multi-cloud-quick-reference.md) - Quick examples for common scenarios - [Docker Deployment](user-guide/run-benchmark-using-docker.md) - Docker-based benchmarking -- [Generate Excel Sheet](user-guide/generate-excel-sheet.md) - Creating Excel reports -- [Generate Plot](user-guide/generate-plot.md) - Creating visualizations -- [Upload Benchmark Results](user-guide/upload-benchmark-result.md) - Uploading results +- [Excel Reports](user-guide/generate-excel-sheet.md) - Creating Excel reports +- [Visualizations](user-guide/generate-plot.md) - Creating visualizations +- [Upload Results](user-guide/upload-benchmark-result.md) - Uploading results ### πŸ”§ Development - [Contributing](development/contributing.md) - How to contribute to GenAI Bench +### πŸ“š API Reference + +- [API Documentation](api/index.md) - Complete API reference and code examples + ## Support If you encounter any issues or have questions, please: From b350f8fd106aa93a058295258ccc86e147cc1580 Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Thu, 23 Oct 2025 15:46:12 -0700 Subject: [PATCH 06/10] cleanup README --- README.md | 50 +++++++------------------------------------------- 1 file changed, 7 insertions(+), 43 deletions(-) diff --git a/README.md b/README.md index e9442f78..bafd75b1 100644 --- a/README.md +++ b/README.md @@ -40,51 +40,15 @@ It provides detailed insights into model serving performance, offering both a us - πŸ“ **Rich Logs**: Automatically flushed to both terminal and file upon experiment completion. - πŸ“ˆ **Experiment Analyzer**: Generates comprehensive Excel reports with pricing and raw metrics data, plus flexible plot configurations (default 2x4 grid) that visualize key performance metrics including throughput, latency (TTFT, E2E, TPOT), error rates, and RPS across different traffic scenarios and concurrency levels. Supports custom plot layouts and multi-line comparisons. -## How to Start +## Installation -Please check [User Guide](https://docs.sglang.ai/genai-bench/user-guide/) and [CONTRIBUTING.md](https://docs.sglang.ai/genai-bench/development/contributing/) for how to install and use genai-bench. +**Quick Start**: Install with `pip install genai-bench`. +Alternatively, check [Installation Guide](https://docs.sglang.ai/genai-bench/getting-started/installation) for other options. -## Benchmark Metrics Definition +## How to use -This section puts together the standard metrics required for LLM serving performance analysis. We classify metrics to two types: **single-request level metrics**, representing the metrics collected from one request. And **aggregated level metrics**, summarizing the single-request metrics from one run (with specific traffic scenario and num concurrency). +Please check [User Guide](https://docs.sglang.ai/genai-bench/user-guide/) for instructions on using genai-bench. 
-
-**NOTE**:
-
-- Each single-request metric includes standard statistics: **percentile**, **min**, **max**, **stddev**, and **mean**.
-- The following metrics cover **input**, **output**, and **end-to-end (e2e)** stages. For *chat* tasks, all stages are relevant for evaluation. For *embedding* tasks, where there is no output stage, output metrics will be set to 0. For details about output metrics collection, please check out `OUTPUT_METRICS_FIELDS` in [metrics.py](genai_bench/metrics/metrics.py).
-
-### Single Request Level Metrics
-
-The following metrics capture token-level performance for a single request, providing insights into server efficiency for each individual request.
-
-| Glossary | Meaning | Calculation Formula | Units |
-|----------|---------|---------------------|-------|
-| TTFT | Time to First Token. Initial response time when the first output token is generated. This is also known as the latency of the input (prefill) stage. | `TTFT = time_at_first_token - start_time` | seconds |
-| End-to-End Latency | End-to-End latency. This metric indicates how long it takes from submitting a query to receiving the full response, including network latencies. | `e2e_latency = end_time - start_time` | seconds |
-| TPOT | Time Per Output Token. The average time between two subsequent generated tokens. | `TPOT = (e2e_latency - TTFT) / (num_output_tokens - 1)` | seconds |
-| Output Latency | Output latency. This metric indicates how long it takes to receive the full response after the first token is generated. | `output_latency = e2e_latency - TTFT` | seconds |
-| Output Inference Speed | The rate of how many tokens the model can generate per second for a single request. | `inference_speed = 1 / TPOT` | tokens/second |
-| Num of Input Tokens | Number of prompt tokens. | `num_input_tokens = tokenizer.encode(prompt)` | tokens |
-| Num of Output Tokens | Number of output tokens. | `num_output_tokens = num_completion_tokens` | tokens |
-| Num of Request Tokens | Total number of tokens processed in one request. | `num_request_tokens = num_input_tokens + num_output_tokens` | tokens |
-| Input Throughput | The overall throughput of the input (prefill) stage. | `input_throughput = num_input_tokens / TTFT` | tokens/second |
-| Output Throughput | The throughput of output generation for a single request. | `output_throughput = (num_output_tokens - 1) / output_latency` | tokens/second |
-
-### Aggregated Metrics
-
-This collection summarizes the metrics for a specific traffic load pattern, defined by the traffic scenario and the number of concurrent requests. It provides insights into server capacity and performance under pressure.
-
-| Glossary | Meaning | Calculation Formula | Units |
-|----------|---------|---------------------|-------|
-| Mean Input Throughput | The average rate at which input tokens are processed by the model in one run with multiple concurrent requests. | `mean_input_throughput = sum(input_tokens_for_all_requests) / run_duration` | tokens/second |
-| Mean Output Throughput | The average rate at which output tokens are generated by the model in one run with multiple concurrent requests. | `mean_output_throughput = sum(output_tokens_for_all_requests) / run_duration` | tokens/second |
-| Total Tokens Throughput | The average rate at which tokens are processed by the model, including both input and output tokens. | `mean_total_tokens_throughput = all_requests["total_tokens"]["sum"] / run_duration` | tokens/second |
-| Total Chars Per Hour[^1] | The average number of characters the model can process per hour. | `total_chars_per_hour = total_tokens_throughput * dataset_chars_to_token_ratio * 3600` | Characters |
-| Requests Per Minute | The number of requests processed by the model per minute. | `num_completed_requests_per_min = num_completed_requests / (end_time - start_time) * 60` | Requests |
-| Error Codes to Frequency | A map from each returned error status code to its frequency. | | |
-| Error Rate | The rate of error requests over total requests. | `error_rate = num_error_requests / num_requests` | |
-| Num of Error Requests | The number of error requests in one load. | `if requests.status_code != '200': num_error_requests += 1` | |
-| Num of Completed Requests | The number of completed requests in one load. | `if requests.status_code == '200': num_completed_requests += 1` | |
-| Num of Requests | The total number of requests processed for one load. | `total_requests = num_completed_requests + num_error_requests` | |
-
-[^1]: *Total Chars Per Hour* is derived from a character-to-token ratio based on sonnet.txt and the model’s tokenizer. This metric aids in pricing decisions for an LLM serving solution. For tasks with multi-modal inputs, non-text tokens are converted to an equivalent character count using the same character-to-token ratio.
+If you are interested in contributing to GenAI-Bench, see the [Development Guide](https://docs.sglang.ai/genai-bench/development/).
\ No newline at end of file
From 61a4159331fe6e848ca285702a44e23aff51354d Mon Sep 17 00:00:00 2001
From: Tejesh Anand
Date: Thu, 23 Oct 2025 17:35:40 -0700
Subject: [PATCH 07/10] update Development

---
 docs/.config/mkdocs.yml                    |    4 +-
 docs/api/index.md                          | 2545 --------------------
 docs/development/adding-new-features.md    |  126 +
 docs/development/api-reference.md          |  203 ++
 docs/development/contributing.md           |  117 -
 docs/development/index.md                  |   24 +-
 docs/getting-started/command-guidelines.md |    2 +-
 docs/index.md                              |    2 +-
 8 files changed, 341 insertions(+), 2682 deletions(-)
 delete mode 100644 docs/api/index.md
 create mode 100644 docs/development/adding-new-features.md
 create mode 100644 docs/development/api-reference.md

diff --git a/docs/.config/mkdocs.yml b/docs/.config/mkdocs.yml
index 5a42c9b2..46aa6f69 100644
--- a/docs/.config/mkdocs.yml
+++ b/docs/.config/mkdocs.yml
@@ -131,5 +131,5 @@ nav:
   - Development:
     - development/index.md
     - Contributing: development/contributing.md
-  - API Reference:
-    - api/index.md
+    - Adding New Features: development/adding-new-features.md
+    - API Reference: development/api-reference.md
diff --git a/docs/api/index.md b/docs/api/index.md
deleted file mode 100644
index 27e688dd..00000000
--- a/docs/api/index.md
+++ /dev/null
@@ -1,2545 +0,0 @@
-# API Reference
-
-This section provides comprehensive API documentation for GenAI Bench components, including CLI commands, core classes, and usage examples.
-
-> **Quick Start**: New to GenAI Bench? Check out our [Getting Started Guide](../getting-started/index.md) for installation and basic concepts, or jump to the [User Guide](../user-guide/index.md) for practical examples.
-
-## Getting Started
-
-Before diving into the API reference, we recommend familiarizing yourself with these foundational concepts:
-
-- **[Installation Guide](../getting-started/installation.md)** - Set up GenAI Bench in your environment
-- **[Task Definition](../getting-started/task-definition.md)** - Understand supported task types and their requirements
-- **[Metrics Definition](../getting-started/metrics-definition.md)** - Learn about performance metrics and measurements
-- **[Command Guidelines](../getting-started/command-guidelines.md)** - Best practices for CLI usage
-
-## Table of Contents
-
-- [CLI Commands](#cli-commands)
-- [Core Protocol Classes](#core-protocol-classes)
-- [Scenario System](#scenario-system)
-- [Data Loading System](#data-loading-system)
-- [Authentication System](#authentication-system)
-- [Storage System](#storage-system)
-- [UI and Dashboard System](#ui-and-dashboard-system)
-- [Distributed System](#distributed-system)
-- [Metrics and Analysis](#metrics-and-analysis)
-- [Configuration Classes](#configuration-classes)
-- [Comprehensive Examples](#comprehensive-examples)
-
-## CLI Commands
-
-### `genai-bench benchmark`
-
-The main command for running benchmarks against LLM endpoints.
- -```bash -genai-bench benchmark [OPTIONS] -``` - -**Key Options:** - -- `--api-backend` - API backend (openai, aws-bedrock, azure-openai, gcp-vertex, oci-cohere, oci-genai) -- `--api-key` - API key for authentication -- `--model` - Model name to benchmark -- `--task` - Task type (text-to-text, text-to-embeddings, image-text-to-text, etc.) -- `--traffic-scenario` - Traffic scenario specification -- `--num-concurrency` - Number of concurrent requests -- `--max-time-per-run` - Maximum time per run in seconds -- `--upload-results` - Upload results to cloud storage - -> **πŸ“– Learn More**: For detailed usage examples and multi-cloud configurations, see the [Run Benchmark Guide](../user-guide/run-benchmark.md) and [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md). - -**Example:** -```bash -genai-bench benchmark \ - --api-backend openai \ - --api-key $OPENAI_KEY \ - --model gpt-4 \ - --task text-to-text \ - --traffic-scenario "N(100,50)" \ - --num-concurrency 1,2,4,8 \ - --max-time-per-run 300 -``` - -> **πŸ’‘ Tip**: For traffic scenario syntax and examples, see the [Traffic Scenarios Guide](../user-guide/scenario-definition.md). - -### `genai-bench excel` - -Generate Excel reports from benchmark results. - -```bash -genai-bench excel [OPTIONS] -``` - -**Options:** - -- `--experiment-folder` - Path to experiment folder -- `--excel-name` - Name of the Excel file -- `--metric-percentile` - Percentile for metrics (mean, p25, p50, p75, p90, p95, p99) -- `--metrics-time-unit` - Time unit (s, ms) - -> **πŸ“Š Learn More**: For detailed Excel report generation examples, see the [Excel Reports Guide](../user-guide/generate-excel-sheet.md). - -### `genai-bench plot` - -Generate plots from benchmark results with flexible configuration. - -```bash -genai-bench plot [OPTIONS] -``` - -**Options:** - -- `--experiments-folder` - Path to experiments folder -- `--group-key` - Key to group data by -- `--plot-config` - Path to JSON plot configuration -- `--preset` - Built-in plot preset -- `--filter-criteria` - Filter criteria for data - -> **πŸ“ˆ Learn More**: For comprehensive plotting examples and configuration options, see the [Visualizations Guide](../user-guide/generate-plot.md). - -## Core Protocol Classes - -### Request Classes - -#### `UserRequest` -Base class for all user requests. - -```python -class UserRequest(BaseModel): - model: str - additional_request_params: Dict[str, Any] = Field(default_factory=dict) -``` - -#### `UserChatRequest` -For text-to-text tasks. - -```python -class UserChatRequest(UserRequest): - prompt: str - num_prefill_tokens: int | None - max_tokens: int | None -``` - -#### `UserEmbeddingRequest` -For text-to-embeddings tasks. - -```python -class UserEmbeddingRequest(UserRequest): - documents: List[str] - num_prefill_tokens: Optional[int] -``` - -#### `UserImageChatRequest` -For image-text-to-text tasks. - -```python -class UserImageChatRequest(UserChatRequest): - image_content: List[str] - num_images: int -``` - -### Response Classes - -#### `UserResponse` -Base class for all user responses. - -```python -class UserResponse(BaseModel): - status_code: int - time_at_first_token: Optional[float] - start_time: Optional[float] - end_time: Optional[float] - error_message: Optional[str] - num_prefill_tokens: Optional[int] -``` - -#### `UserChatResponse` -For chat task responses. 
- -```python -class UserChatResponse(UserResponse): - generated_text: Optional[str] - tokens_received: Optional[int] -``` - -### Experiment Metadata - -#### `ExperimentMetadata` -Contains all metadata for an experiment. - -```python -class ExperimentMetadata(BaseModel): - cmd: str - benchmark_version: str - api_backend: str - model: str - task: str - num_concurrency: List[int] - traffic_scenario: List[str] - max_time_per_run_s: int - max_requests_per_run: int - # ... and more fields -``` - -## Scenario System - -> **πŸ“– Learn More**: For comprehensive scenario syntax and usage examples, see the [Traffic Scenarios Guide](../user-guide/scenario-definition.md). - -### `Scenario` -Abstract base class for traffic scenarios. - -```python -class Scenario(ABC): - scenario_type: TextDistribution | MultiModality | EmbeddingDistribution | ReRankDistribution | SpecialScenario - validation_pattern: str - - @abstractmethod - def sample(self) -> Any: ... - - @abstractmethod - def to_string(self) -> str: ... - - @classmethod - @abstractmethod - def parse(cls, params_str: str) -> "Scenario": ... -``` - -### Distribution Types - -#### `TextDistribution` -```python -class TextDistribution(Enum): - NORMAL = "N" - DETERMINISTIC = "D" - UNIFORM = "U" -``` - -#### `NormalDistribution` -Normal distribution scenario for text tasks. - -```python -class NormalDistribution(Scenario): - scenario_type = TextDistribution.NORMAL - validation_pattern = r"^N\(\d+,\d+\)$" - - def __init__(self, mean: int, std: int): ... -``` - -#### `EmbeddingScenario` -Scenario for embedding tasks. - -```python -class EmbeddingScenario(Scenario): - scenario_type = EmbeddingDistribution.EMBEDDING - validation_pattern = r"^E\(\d+\)$" - - def __init__(self, tokens_per_document: int): ... -``` - -## Sampler Classes - -### `Sampler` -Abstract base class for data samplers. - -```python -class Sampler(ABC): - modality_registry: Dict[str, Type["Sampler"]] = {} - input_modality: str - supported_tasks: Set[str] - - def __init__(self, tokenizer, model: str, output_modality: str, ...): ... - - @abstractmethod - def sample(self, scenario: Scenario) -> UserRequest: ... - - @classmethod - def create(cls, task: str, *args, **kwargs) -> "Sampler": ... -``` - -### `TextSampler` -For text-based tasks. - -```python -class TextSampler(Sampler): - input_modality = "text" - supported_tasks = {"text-to-text", "text-to-embeddings", "text-to-rerank"} - - def __init__(self, tokenizer, model: str, output_modality: str, data: List[str], ...): ... -``` - -### `ImageSampler` -For image-based tasks. - -```python -class ImageSampler(Sampler): - input_modality = "image" - supported_tasks = {"image-text-to-text", "image-to-embeddings"} - - def __init__(self, tokenizer, model: str, output_modality: str, data: Any, ...): ... -``` - -## Data Loading System - -> **πŸ“– Learn More**: For dataset configuration examples and advanced usage, see the [Run Benchmark Guide](../user-guide/run-benchmark.md#selecting-datasets). - -### `DatasetLoader` -Abstract base class for dataset loaders. - -```python -class DatasetLoader(ABC): - supported_formats: Set[DatasetFormat] = set() - media_type: str = "" - - def __init__(self, dataset_config: DatasetConfig): ... - - def load_request(self) -> Union[List[str], List[Tuple[str, Any]]]: ... -``` - -### `TextDatasetLoader` -For loading text datasets. 
- -```python -class TextDatasetLoader(DatasetLoader): - supported_formats = {DatasetFormat.TEXT, DatasetFormat.CSV, DatasetFormat.JSON, DatasetFormat.HUGGINGFACE_HUB} - media_type = "text" -``` - -### `ImageDatasetLoader` -For loading image datasets. - -```python -class ImageDatasetLoader(DatasetLoader): - supported_formats = {DatasetFormat.CSV, DatasetFormat.JSON, DatasetFormat.HUGGINGFACE_HUB} - media_type = "image" -``` - -## Authentication System - -> **πŸ” Learn More**: For comprehensive authentication setup and multi-cloud configurations, see the [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md). - -### Base Authentication Interfaces - -#### `AuthProvider` -Base class for all authentication providers. - -```python -class AuthProvider(ABC): - @abstractmethod - def get_config(self) -> Dict[str, Any]: ... - - @abstractmethod - def get_credentials(self) -> Any: ... -``` - -#### `ModelAuthProvider` -Base class for model endpoint authentication. - -```python -class ModelAuthProvider(ABC): - @abstractmethod - def get_headers(self) -> Dict[str, str]: ... - - @abstractmethod - def get_config(self) -> Dict[str, Any]: ... - - @abstractmethod - def get_auth_type(self) -> str: ... - - def get_credentials(self) -> Optional[Any]: ... -``` - -#### `StorageAuthProvider` -Base class for storage authentication. - -```python -class StorageAuthProvider(ABC): - @abstractmethod - def get_client_config(self) -> Dict[str, Any]: ... - - @abstractmethod - def get_credentials(self) -> Any: ... - - @abstractmethod - def get_storage_type(self) -> str: ... - - def get_region(self) -> Optional[str]: ... -``` - -### Authentication Factory - -#### `UnifiedAuthFactory` -Unified factory for creating model and storage authentication providers. - -```python -class UnifiedAuthFactory: - @staticmethod - def create_model_auth(provider: str, **kwargs) -> ModelAuthProvider: ... - - @staticmethod - def create_storage_auth(provider: str, **kwargs) -> StorageAuthProvider: ... -``` - -**Supported Model Providers:** -- `openai` - OpenAI API authentication -- `oci` - Oracle Cloud Infrastructure authentication -- `aws-bedrock` - AWS Bedrock authentication -- `azure-openai` - Azure OpenAI authentication -- `gcp-vertex` - Google Cloud Vertex AI authentication - -**Supported Storage Providers:** -- `aws` - AWS S3 authentication -- `azure` - Azure Blob Storage authentication -- `gcp` - Google Cloud Storage authentication -- `oci` - Oracle Cloud Infrastructure Object Storage authentication -- `github` - GitHub repository authentication - -### Provider-Specific Authentication - -#### OpenAI Authentication -```python -# OpenAI API key authentication -auth = UnifiedAuthFactory.create_model_auth( - "openai", - api_key="sk-..." 
-) -``` - -#### AWS Bedrock Authentication -```python -# AWS Bedrock authentication with multiple options -auth = UnifiedAuthFactory.create_model_auth( - "aws-bedrock", - access_key_id="AKIA...", - secret_access_key="...", - region="us-east-1" -) - -# Or using AWS profile -auth = UnifiedAuthFactory.create_model_auth( - "aws-bedrock", - profile="default", - region="us-west-2" -) -``` - -#### Azure OpenAI Authentication -```python -# Azure OpenAI authentication -auth = UnifiedAuthFactory.create_model_auth( - "azure-openai", - endpoint="https://your-resource.openai.azure.com/", - deployment="your-deployment", - api_version="2024-02-15-preview", - api_key="your-api-key" -) -``` - -#### GCP Vertex AI Authentication -```python -# GCP Vertex AI authentication -auth = UnifiedAuthFactory.create_model_auth( - "gcp-vertex", - project_id="your-project", - location="us-central1", - credentials_path="/path/to/credentials.json" -) -``` - -#### OCI Authentication -```python -# OCI authentication with multiple methods -# User Principal (default) -auth = UnifiedAuthFactory.create_model_auth( - "oci", - config_path="~/.oci/config", - profile="DEFAULT" -) - -# Instance Principal -auth = UnifiedAuthFactory.create_model_auth( - "oci", - auth_type="instance_principal" -) - -# OBO Token -auth = UnifiedAuthFactory.create_model_auth( - "oci", - auth_type="obo_token", - token="your-obo-token" -) -``` - -### Storage Authentication Examples - -#### AWS S3 Storage -```python -# AWS S3 authentication -storage_auth = UnifiedAuthFactory.create_storage_auth( - "aws", - access_key_id="AKIA...", - secret_access_key="...", - region="us-east-1" -) -``` - -#### Azure Blob Storage -```python -# Azure Blob Storage authentication -storage_auth = UnifiedAuthFactory.create_storage_auth( - "azure", - account_name="your-storage-account", - account_key="your-account-key" -) - -# Or using connection string -storage_auth = UnifiedAuthFactory.create_storage_auth( - "azure", - connection_string="DefaultEndpointsProtocol=https;AccountName=..." -) -``` - -#### Google Cloud Storage -```python -# GCP Cloud Storage authentication -storage_auth = UnifiedAuthFactory.create_storage_auth( - "gcp", - project_id="your-project", - credentials_path="/path/to/credentials.json" -) -``` - -#### OCI Object Storage -```python -# OCI Object Storage authentication -storage_auth = UnifiedAuthFactory.create_storage_auth( - "oci", - config_path="~/.oci/config", - profile="DEFAULT" -) -``` - -#### GitHub Storage -```python -# GitHub repository authentication -storage_auth = UnifiedAuthFactory.create_storage_auth( - "github", - token="ghp_...", - owner="your-username", - repo="your-repo" -) -``` - -## Storage System - -> **πŸ’Ύ Learn More**: For storage configuration and upload examples, see the [Upload Results Guide](../user-guide/upload-benchmark-result.md) and [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md). - -### Base Storage Interface - -#### `BaseStorage` -Abstract base class for all storage implementations. - -```python -class BaseStorage(ABC): - @abstractmethod - def upload_file( - self, local_path: Union[str, Path], remote_path: str, bucket: str, **kwargs - ) -> None: ... - - @abstractmethod - def upload_folder( - self, local_folder: Union[str, Path], bucket: str, prefix: str = "", **kwargs - ) -> None: ... - - @abstractmethod - def download_file( - self, remote_path: str, local_path: Union[str, Path], bucket: str, **kwargs - ) -> None: ... 
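-    # Note: list_objects (below) is declared as a generator so that callers
-    # can stream keys from very large buckets without holding them in memory.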
- - @abstractmethod - def list_objects( - self, bucket: str, prefix: Optional[str] = None, **kwargs - ) -> Generator[str, None, None]: ... - - @abstractmethod - def delete_object(self, remote_path: str, bucket: str, **kwargs) -> None: ... - - @abstractmethod - def get_storage_type(self) -> str: ... -``` - -### Storage Factory - -#### `StorageFactory` -Factory for creating storage provider instances. - -```python -class StorageFactory: - @staticmethod - def create_storage( - provider: str, auth: StorageAuthProvider, **kwargs - ) -> BaseStorage: ... -``` - -**Supported Storage Providers:** -- `aws` - AWS S3 storage -- `azure` - Azure Blob Storage -- `gcp` - Google Cloud Storage -- `oci` - Oracle Cloud Infrastructure Object Storage -- `github` - GitHub repository storage - -### Storage Provider Implementations - -#### AWS S3 Storage -```python -class AWSS3Storage(BaseStorage): - """AWS S3 storage implementation.""" - - def __init__(self, auth: StorageAuthProvider, **kwargs): ... - - def upload_file(self, local_path, remote_path, bucket, **kwargs): ... - def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... - def download_file(self, remote_path, local_path, bucket, **kwargs): ... - def list_objects(self, bucket, prefix=None, **kwargs): ... - def delete_object(self, remote_path, bucket, **kwargs): ... - def get_storage_type(self) -> str: ... -``` - -**Features:** -- Full S3 API support -- Automatic multipart uploads for large files -- Server-side encryption support -- Lifecycle policy management -- Cross-region replication support - -#### Azure Blob Storage -```python -class AzureBlobStorage(BaseStorage): - """Azure Blob Storage implementation.""" - - def __init__(self, auth: StorageAuthProvider, **kwargs): ... - - def upload_file(self, local_path, remote_path, bucket, **kwargs): ... - def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... - def download_file(self, remote_path, local_path, bucket, **kwargs): ... - def list_objects(self, bucket, prefix=None, **kwargs): ... - def delete_object(self, remote_path, bucket, **kwargs): ... - def get_storage_type(self) -> str: ... -``` - -**Features:** -- Blob storage with tier management -- Access control and SAS tokens -- Blob versioning support -- Soft delete capabilities -- Change feed support - -#### Google Cloud Storage -```python -class GCPCloudStorage(BaseStorage): - """Google Cloud Storage implementation.""" - - def __init__(self, auth: StorageAuthProvider, **kwargs): ... - - def upload_file(self, local_path, remote_path, bucket, **kwargs): ... - def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... - def download_file(self, remote_path, local_path, bucket, **kwargs): ... - def list_objects(self, bucket, prefix=None, **kwargs): ... - def delete_object(self, remote_path, bucket, **kwargs): ... - def get_storage_type(self) -> str: ... -``` - -**Features:** -- Multi-regional and regional storage classes -- Object lifecycle management -- Fine-grained access control -- Data encryption at rest and in transit -- Cloud CDN integration - -#### OCI Object Storage -```python -class OCIObjectStorage(BaseStorage): - """Oracle Cloud Infrastructure Object Storage implementation.""" - - def __init__(self, auth: StorageAuthProvider, **kwargs): ... - - def upload_file(self, local_path, remote_path, bucket, **kwargs): ... - def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... - def download_file(self, remote_path, local_path, bucket, **kwargs): ... 
- def list_objects(self, bucket, prefix=None, **kwargs): ... - def delete_object(self, remote_path, bucket, **kwargs): ... - def get_storage_type(self) -> str: ... -``` - -**Features:** -- High-performance object storage -- Automatic data replication -- Object versioning -- Cross-region backup -- Integration with OCI services - -#### GitHub Storage -```python -class GitHubStorage(BaseStorage): - """GitHub repository storage implementation.""" - - def __init__(self, auth: StorageAuthProvider, **kwargs): ... - - def upload_file(self, local_path, remote_path, bucket, **kwargs): ... - def upload_folder(self, local_folder, bucket, prefix="", **kwargs): ... - def download_file(self, remote_path, local_path, bucket, **kwargs): ... - def list_objects(self, bucket, prefix=None, **kwargs): ... - def delete_object(self, remote_path, bucket, **kwargs): ... - def get_storage_type(self) -> str: ... -``` - -**Features:** -- Git-based versioning -- Pull request integration -- Branch-based organization -- GitHub Actions integration -- Collaborative workflows - -### Storage Usage Examples - -#### Basic File Operations -```python -from genai_bench.storage.factory import StorageFactory -from genai_bench.auth.unified_factory import UnifiedAuthFactory - -# Create storage authentication -storage_auth = UnifiedAuthFactory.create_storage_auth( - "aws", - access_key_id="AKIA...", - secret_access_key="...", - region="us-east-1" -) - -# Create storage instance -storage = StorageFactory.create_storage("aws", storage_auth) - -# Upload a single file -storage.upload_file( - local_path="/path/to/file.txt", - remote_path="benchmarks/2024/file.txt", - bucket="my-bucket" -) - -# Upload entire folder -storage.upload_folder( - local_folder="/path/to/results", - bucket="my-bucket", - prefix="benchmarks/2024/" -) - -# Download file -storage.download_file( - remote_path="benchmarks/2024/file.txt", - local_path="/path/to/downloaded.txt", - bucket="my-bucket" -) - -# List objects -for obj in storage.list_objects("my-bucket", prefix="benchmarks/"): - print(f"Object: {obj}") - -# Delete object -storage.delete_object( - remote_path="benchmarks/2024/file.txt", - bucket="my-bucket" -) -``` - -#### Multi-Cloud Storage Setup -```python -# AWS S3 -aws_storage = StorageFactory.create_storage( - "aws", - UnifiedAuthFactory.create_storage_auth("aws", profile="default") -) - -# Azure Blob Storage -azure_storage = StorageFactory.create_storage( - "azure", - UnifiedAuthFactory.create_storage_auth("azure", account_name="mystorage") -) - -# Google Cloud Storage -gcp_storage = StorageFactory.create_storage( - "gcp", - UnifiedAuthFactory.create_storage_auth("gcp", project_id="my-project") -) - -# Upload to multiple providers -for storage in [aws_storage, azure_storage, gcp_storage]: - storage.upload_folder( - local_folder="/path/to/results", - bucket="my-bucket", - prefix="backup/" - ) -``` - -#### Advanced Storage Operations -```python -# Upload with metadata -storage.upload_file( - local_path="/path/to/file.txt", - remote_path="benchmarks/2024/file.txt", - bucket="my-bucket", - metadata={"experiment": "llm-benchmark", "version": "1.0"} -) - -# Upload with server-side encryption -storage.upload_file( - local_path="/path/to/file.txt", - remote_path="benchmarks/2024/file.txt", - bucket="my-bucket", - encryption="AES256" -) - -# List with filtering -for obj in storage.list_objects( - bucket="my-bucket", - prefix="benchmarks/2024/", - max_keys=100 -): - print(f"Found: {obj}") -``` - -## UI and Dashboard System - -> **πŸ“Š Learn More**: For dashboard 
usage and configuration examples, see the [Run Benchmark Guide](../user-guide/run-benchmark.md#distributed-benchmark). - -### Dashboard Components - -#### `Dashboard` -Union type for dashboard implementations. - -```python -Dashboard = Union[RichLiveDashboard, MinimalDashboard] -``` - -#### `RichLiveDashboard` -Real-time dashboard with rich UI components for live metrics visualization. - -```python -class RichLiveDashboard: - def __init__(self, metrics_time_unit: str = "s"): ... - - def update_metrics_panels( - self, live_metrics: LiveMetricsData, metrics_time_unit: str = "s" - ): ... - - def update_histogram_panel( - self, live_metrics: LiveMetricsData, metrics_time_unit: str = "s" - ): ... - - def update_scatter_plot_panel( - self, scatter_plot_metrics: Optional[List[float]], time_unit: str = "s" - ): ... - - def update_benchmark_progress_bars(self, progress_increment: float): ... - - def create_benchmark_progress_task(self, run_name: str): ... - - def update_total_progress_bars(self, total_runs: int): ... - - def start_run(self, run_time: int, start_time: float, max_requests_per_run: int): ... - - def calculate_time_based_progress(self) -> float: ... - - def handle_single_request( - self, live_metrics: LiveMetricsData, total_requests: int, error_code: int | None - ): ... - - def reset_panels(self): ... -``` - -**Features:** -- Real-time metrics visualization -- Interactive progress tracking -- Histogram and scatter plot displays -- Live updates with configurable refresh rates -- Rich console output with colors and formatting - -#### `MinimalDashboard` -Lightweight dashboard for headless or minimal UI scenarios. - -```python -class MinimalDashboard: - def __init__(self, metrics_time_unit: str = "s"): ... - - def update_metrics_panels(self, live_metrics: LiveMetricsData, metrics_time_unit: str = "s"): ... - def update_histogram_panel(self, live_metrics: LiveMetricsData, metrics_time_unit: str = "s"): ... - def update_scatter_plot_panel(self, scatter_plot_metrics: Optional[List[float]], time_unit: str = "s"): ... - def update_benchmark_progress_bars(self, progress_increment: float): ... - def create_benchmark_progress_task(self, run_name: str): ... - def update_total_progress_bars(self, total_runs: int): ... - def start_run(self, run_time: int, start_time: float, max_requests_per_run: int): ... - def calculate_time_based_progress(self) -> float: ... - def handle_single_request(self, live_metrics: LiveMetricsData, total_requests: int, error_code: int | None): ... - def reset_panels(self): ... -``` - -**Features:** -- No-op implementations for all dashboard methods -- Minimal resource usage -- Suitable for automated/CI environments -- Compatible with all dashboard interfaces - -### Dashboard Factory - -#### `create_dashboard` -Factory function for creating appropriate dashboard based on environment. - -```python -def create_dashboard(metrics_time_unit: str = "s") -> Dashboard: - """Factory function that returns either RichLiveDashboard or MinimalDashboard based on ENABLE_UI.""" -``` - -**Environment Variables:** -- `ENABLE_UI=true` - Creates `RichLiveDashboard` -- `ENABLE_UI=false` - Creates `MinimalDashboard` - -### Layout System - -#### `create_layout` -Creates the main dashboard layout structure. - -```python -def create_layout() -> Layout: ... 
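-
-# The returned Layout is organized as described below: progress bars on top,
-# metric panels in the middle, scatter plots and the log panel underneath.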
-``` - -**Layout Structure:** -- **Row 1**: Total Progress and Benchmark Progress -- **Row 2**: Input and Output metrics panels -- **Row 3**: Scatter plots (TTFT vs Input Throughput, Output Latency vs Output Throughput) -- **Logs**: Log output display - -#### `create_metric_panel` -Creates individual metric panels with latency and throughput data. - -```python -def create_metric_panel( - title, latency_data, throughput_data, metrics_time_unit: str = "s" -) -> Panel: ... -``` - -#### `create_progress_bars` -Creates progress tracking bars. - -```python -def create_progress_bars() -> Tuple[Progress, Progress, int]: ... -``` - -### Plot Components - -#### `create_horizontal_colored_bar_chart` -Creates horizontal bar charts for histogram visualization. - -```python -def create_horizontal_colored_bar_chart( - data: List[float], - title: str, - max_width: int = 50 -) -> str: ... -``` - -#### `create_scatter_plot` -Creates scatter plot visualizations for correlation analysis. - -```python -def create_scatter_plot( - x_data: List[float], - y_data: List[float], - title: str -) -> str: ... -``` - -### Live Metrics Data - -#### `LiveMetricsData` -Structure for real-time metrics data. - -```python -LiveMetricsData = { - "ttft": List[float], - "input_throughput": List[float], - "output_throughput": List[float], - "output_latency": List[float], - "stats": Dict[str, Any] -} -``` - -### Dashboard Usage Examples - -#### Basic Dashboard Setup -```python -from genai_bench.ui.dashboard import create_dashboard - -# Create dashboard (automatically selects based on ENABLE_UI) -dashboard = create_dashboard(metrics_time_unit="s") - -# Use with context manager for live updates -with dashboard.live: - # Update metrics - dashboard.update_metrics_panels(live_metrics) - - # Update progress - dashboard.update_benchmark_progress_bars(0.1) - - # Update plots - dashboard.update_scatter_plot_panel(scatter_data) -``` - -#### Custom Dashboard Configuration -```python -import os - -# Force minimal dashboard -os.environ["ENABLE_UI"] = "false" -dashboard = create_dashboard() - -# Force rich dashboard -os.environ["ENABLE_UI"] = "true" -dashboard = create_dashboard() -``` - -#### Real-time Metrics Update -```python -# Live metrics data structure -live_metrics = { - "ttft": [0.1, 0.2, 0.15, 0.3], - "input_throughput": [100, 120, 110, 90], - "output_throughput": [50, 60, 55, 45], - "output_latency": [0.5, 0.6, 0.55, 0.7], - "stats": { - "mean_ttft": 0.1875, - "mean_input_throughput": 105.0, - "mean_output_throughput": 52.5, - "mean_output_latency": 0.5875 - } -} - -# Update dashboard with live data -dashboard.update_metrics_panels(live_metrics, metrics_time_unit="s") -dashboard.update_histogram_panel(live_metrics, metrics_time_unit="s") -dashboard.update_scatter_plot_panel(live_metrics["ttft"], time_unit="s") -``` - -## User Classes - -### `BaseUser` -Abstract base class for user implementations. - -```python -class BaseUser(HttpUser): - supported_tasks: Dict[str, str] = {} - - @classmethod - def is_task_supported(cls, task: str) -> bool: ... - - def sample(self) -> UserRequest: ... - - def collect_metrics(self, user_response: UserResponse, endpoint: str): ... 
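-
-    # In concrete users, sample() returns the next UserRequest (typically
-    # drawn from the configured Sampler), and collect_metrics() forwards
-    # each UserResponse to the request-level metrics collectors.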
-``` - -### Provider-Specific User Classes - -- `OpenAIUser` - OpenAI API implementation -- `AWSBedrockUser` - AWS Bedrock implementation -- `AzureOpenAIUser` - Azure OpenAI implementation -- `GCPVertexUser` - GCP Vertex AI implementation -- `OCICohereUser` - OCI Cohere implementation -- `OCIGenAIUser` - OCI GenAI implementation - -## Distributed System - -> **⚑ Learn More**: For distributed benchmarking setup and best practices, see the [Run Benchmark Guide](../user-guide/run-benchmark.md#distributed-benchmark). - -### Distributed Configuration - -#### `DistributedConfig` -Configuration for distributed benchmark execution. - -```python -@dataclass -class DistributedConfig: - num_workers: int - master_host: str = "127.0.0.1" - master_port: int = 5557 - wait_time: int = 2 - pin_to_cores: bool = False - cpu_affinity_map: Optional[Dict[int, int]] = None -``` - -**Configuration Options:** -- `num_workers` - Number of worker processes (0 for local mode) -- `master_host` - Host for master process communication -- `master_port` - Port for master-worker communication -- `wait_time` - Wait time for worker startup -- `pin_to_cores` - Enable CPU core pinning (experimental) -- `cpu_affinity_map` - Custom worker-to-CPU mapping - -### Distributed Runner - -#### `DistributedRunner` -Manages distributed load test execution with master and worker processes. - -```python -class DistributedRunner: - def __init__( - self, - environment: Environment, - config: DistributedConfig, - dashboard: Optional[Dashboard] = None, - ): ... - - def setup(self) -> None: ... - - def update_scenario(self, scenario: str) -> None: ... - - def update_batch_size(self, batch_size: int) -> None: ... - - def cleanup(self) -> None: ... -``` - -**Architecture Overview:** - -1. **Process Model:** - - **Master Process**: Controls test execution and aggregates metrics - - **Worker Processes**: Execute actual API requests and send metrics to master - - **Local Mode**: Single process handles both execution and aggregation - -2. **Message Flow:** - - **Master β†’ Workers:** - - `"update_scenario"`: Updates test scenario configuration - - `"update_batch_size"`: Updates batch size for requests - - **Workers β†’ Master:** - - `"request_metrics"`: Sends metrics from each request for aggregation - - `"worker_log"`: Sends worker logs to master - -3. **Execution Flow:** - - **Master Process:** - - Sets up worker processes - - Controls test scenarios and batch sizes - - Aggregates metrics from workers - - Runs the main benchmark loop - - Updates dashboard with live metrics - - **Worker Processes:** - - Receive test configurations from master - - Execute API requests - - Send metrics back to master - - Do NOT execute the main benchmark loop - -4. **Message Registration:** - - **Master**: registers `"request_metrics"` handler - - **Workers**: register `"update_scenario"`, `"update_batch_size"` handlers - - **Local mode**: registers all handlers - -5. **Metrics Collection:** - - Only master/local maintains `AggregatedMetricsCollector` - - Workers collect individual request metrics and send to master - - Master aggregates metrics and updates dashboard - -### Message Handler Protocol - -#### `MessageHandler` -Protocol for message handling in distributed system. - -```python -class MessageHandler(Protocol): - def __call__(self, environment: Environment, msg: Any, **kwargs) -> None: ... 
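-
-# Handlers with this signature back the messages described above: workers
-# register "update_scenario" / "update_batch_size", and the master
-# registers "request_metrics".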
-```
-
-### Distributed System Usage Examples
-
-#### Basic Distributed Setup
-```python
-from locust.runners import WorkerRunner
-
-from genai_bench.distributed.runner import DistributedRunner, DistributedConfig
-from genai_bench.ui.dashboard import create_dashboard
-
-# Configure distributed execution
-config = DistributedConfig(
-    num_workers=4,
-    master_host="127.0.0.1",
-    master_port=5557,
-    wait_time=2
-)
-
-# Create dashboard
-dashboard = create_dashboard()
-
-# Create distributed runner (`environment` is the active locust Environment)
-runner = DistributedRunner(environment, config, dashboard)
-runner.setup()
-
-# If worker process, exit after setup
-if isinstance(environment.runner, WorkerRunner):
-    raise SystemExit
-
-# Master continues with test execution
-runner.update_scenario("N(100,50)")
-runner.update_batch_size(32)
-```
-
-#### Advanced Configuration
-```python
-# CPU-optimized distributed setup
-config = DistributedConfig(
-    num_workers=8,
-    master_host="0.0.0.0",  # Allow external connections
-    master_port=5557,
-    wait_time=5,
-    pin_to_cores=True,
-    cpu_affinity_map={
-        0: 0, 1: 1, 2: 2, 3: 3,  # Worker -> CPU mapping
-        4: 4, 5: 5, 6: 6, 7: 7
-    }
-)
-
-runner = DistributedRunner(environment, config, dashboard)
-runner.setup()
-```
-
-#### Local vs Distributed Mode
-```python
-# Local mode (single process)
-config = DistributedConfig(num_workers=0)
-runner = DistributedRunner(environment, config, dashboard)
-runner.setup()
-
-# Distributed mode (multiple processes)
-config = DistributedConfig(num_workers=4)
-runner = DistributedRunner(environment, config, dashboard)
-runner.setup()
-```
-
-#### Dynamic Scenario Updates
-```python
-# Update scenario during execution
-runner.update_scenario("D(200,100)")  # Deterministic scenario
-runner.update_scenario("N(150,75)")   # Normal distribution
-runner.update_scenario("U(50,250)")   # Uniform distribution
-
-# Update batch size for embedding tasks
-runner.update_batch_size(16)
-runner.update_batch_size(32)
-```
-
-#### Cleanup and Resource Management
-```python
-# Automatic cleanup on exit
-import atexit
-atexit.register(runner.cleanup)
-
-# Manual cleanup
-runner.cleanup()
-```
-
-### Performance Considerations
-
-#### Worker Process Optimization
-- **CPU Pinning**: Pin workers to specific CPU cores for better performance
-- **Process Count**: Balance between CPU cores and memory usage
-- **Memory Management**: Monitor memory usage with high worker counts
-
-#### Network Configuration
-- **Master Host**: Use `0.0.0.0` for external worker connections
-- **Port Selection**: Choose non-conflicting ports for multiple instances
-- **Wait Time**: Adjust based on worker startup time
-
-#### Resource Monitoring
-```python
-import multiprocessing
-
-import psutil
-
-# Monitor system resources
-cpu_count = multiprocessing.cpu_count()
-memory_gb = psutil.virtual_memory().total / (1024**3)
-
-# Recommended worker count
-recommended_workers = min(cpu_count * 2, 16)
-```
-
-## Metrics and Analysis
-
-> **πŸ“Š Learn More**: For metrics definitions and analysis examples, see the [Metrics Definition Guide](../getting-started/metrics-definition.md) and [Excel Reports Guide](../user-guide/generate-excel-sheet.md).
-
-### Metrics Collection Components
-
-#### `RequestLevelMetrics`
-Metrics for individual requests with comprehensive tracking.
- -```python -class RequestLevelMetrics(BaseModel): - ttft: Optional[float] = Field(None, description="Time to first token (TTFT)") - tpot: Optional[float] = Field(None, description="Time per output token (TPOT)") - e2e_latency: Optional[float] = Field(None, description="End-to-end latency") - output_latency: Optional[float] = Field(None, description="Output latency") - output_inference_speed: Optional[float] = Field( - None, description="Output inference speed in tokens/s" - ) - num_input_tokens: Optional[int] = Field(None, description="Number of input tokens") - num_output_tokens: Optional[int] = Field( - None, description="Number of output tokens" - ) - total_tokens: Optional[int] = Field(None, description="Total tokens processed") - input_throughput: Optional[float] = Field( - None, description="Input throughput in tokens/s" - ) - output_throughput: Optional[float] = Field( - None, description="Output throughput in tokens/s" - ) - error_code: Optional[int] = Field(None, description="Error code") - error_message: Optional[str] = Field(None, description="Error message") -``` - -#### `MetricStats` -Statistical analysis for individual metrics. - -```python -class MetricStats(BaseModel): - # Statistical measures for each metric - ttft: MetricStat = Field(default_factory=MetricStat) - tpot: MetricStat = Field(default_factory=MetricStat) - e2e_latency: MetricStat = Field(default_factory=MetricStat) - output_latency: MetricStat = Field(default_factory=MetricStat) - output_inference_speed: MetricStat = Field(default_factory=MetricStat) - num_input_tokens: MetricStat = Field(default_factory=MetricStat) - num_output_tokens: MetricStat = Field(default_factory=MetricStat) - total_tokens: MetricStat = Field(default_factory=MetricStat) - input_throughput: MetricStat = Field(default_factory=MetricStat) - output_throughput: MetricStat = Field(default_factory=MetricStat) -``` - -#### `AggregatedMetrics` -Comprehensive aggregated metrics across multiple requests. 
- -```python -class AggregatedMetrics(BaseModel): - # Run Metadata - scenario: Optional[str] = Field(None, description="The sample scenario") - num_concurrency: int = Field(1, description="Number of concurrency") - batch_size: int = Field(1, description="Batch size for embedding tasks") - iteration_type: str = Field( - "num_concurrency", - description="Type of iteration used (num_concurrency or batch_size)", - ) - - # Performance Metrics - run_duration: float = Field(0.0, description="Run duration in seconds.") - mean_output_throughput_tokens_per_s: float = Field(0.0, description="Mean output throughput") - mean_input_throughput_tokens_per_s: float = Field(0.0, description="Mean input throughput") - mean_total_tokens_throughput_tokens_per_s: float = Field(0.0, description="Mean total throughput") - mean_total_chars_per_hour: float = Field(0.0, description="Mean chars per hour") - requests_per_second: float = Field(0.0, description="Average requests per second") - - # Error Tracking - error_codes_frequency: Dict[int, int] = Field(default_factory=dict, description="Error code frequency") - error_rate: float = Field(0.0, description="Error rate across all requests") - num_error_requests: int = Field(0, description="Number of error requests") - num_completed_requests: int = Field(0, description="Number of completed requests") - num_requests: int = Field(0, description="Number of total requests") - - # Statistical Analysis - stats: MetricStats = Field(default_factory=MetricStats, description="Statistical analysis") -``` - -### Metrics Collectors - -#### `RequestMetricsCollector` -Collects and calculates metrics for individual requests. - -```python -class RequestMetricsCollector: - def __init__(self): ... - - def calculate_metrics(self, response: UserResponse): ... -``` - -**Features:** -- Automatic metric calculation from response data -- Error handling and validation -- Support for different response types -- Token counting and throughput calculation - -#### `AggregatedMetricsCollector` -Advanced metrics aggregation with statistical analysis. - -```python -class AggregatedMetricsCollector: - def __init__(self): ... - - def add_single_request_metrics(self, metrics: RequestLevelMetrics): ... - - def aggregate_metrics_data( - self, - start_time: float, - end_time: float, - dataset_character_to_token_ratio: float, - warmup_ratio: Optional[float], - cooldown_ratio: Optional[float], - ): ... - - def get_live_metrics_data(self) -> LiveMetricsData: ... -``` - -**Features:** -- Real-time metrics aggregation -- Statistical analysis (percentiles, means, std dev) -- Warmup and cooldown period filtering -- Live metrics data generation -- Error rate calculation - -### Time Unit Conversion - -#### `TimeUnitConverter` -Converts metrics between different time units. - -```python -class TimeUnitConverter: - @staticmethod - def convert_time_unit(value: float, from_unit: str, to_unit: str) -> float: ... - - @staticmethod - def convert_throughput_unit(value: float, from_unit: str, to_unit: str) -> float: ... -``` - -**Supported Units:** -- Time: `s` (seconds), `ms` (milliseconds), `ΞΌs` (microseconds) -- Throughput: `tokens/s`, `tokens/ms`, `tokens/ΞΌs` - -### Live Metrics System - -#### `LiveMetricsData` -Real-time metrics data structure. 
- -```python -LiveMetricsData = { - "ttft": List[float], - "input_throughput": List[float], - "output_throughput": List[float], - "output_latency": List[float], - "stats": Dict[str, Any] -} -``` - -### Metrics Usage Examples - -#### Basic Metrics Collection -```python -from genai_bench.metrics.request_metrics_collector import RequestMetricsCollector -from genai_bench.metrics.aggregated_metrics_collector import AggregatedMetricsCollector - -# Collect individual request metrics -request_collector = RequestMetricsCollector() -request_collector.calculate_metrics(user_response) - -# Aggregate metrics across multiple requests -aggregated_collector = AggregatedMetricsCollector() - -# Add individual request metrics -for request_metrics in request_metrics_list: - aggregated_collector.add_single_request_metrics(request_metrics) - -# Perform final aggregation -aggregated_collector.aggregate_metrics_data( - start_time=start_time, - end_time=end_time, - dataset_character_to_token_ratio=4.0, - warmup_ratio=0.1, - cooldown_ratio=0.1 -) -``` - -#### Time Unit Conversion -```python -from genai_bench.time_units import TimeUnitConverter - -# Convert latency from seconds to milliseconds -latency_ms = TimeUnitConverter.convert_time_unit( - latency_s, "s", "ms" -) - -# Convert throughput from tokens/s to tokens/ms -throughput_ms = TimeUnitConverter.convert_throughput_unit( - throughput_s, "tokens/s", "tokens/ms" -) -``` - -#### Live Metrics Monitoring -```python -# Get live metrics data -live_metrics = aggregated_collector.get_live_metrics_data() - -# Update dashboard with live data -dashboard.update_metrics_panels(live_metrics, metrics_time_unit="s") -dashboard.update_histogram_panel(live_metrics, metrics_time_unit="s") -``` - -#### Statistical Analysis -```python -# Access statistical data -stats = aggregated_metrics.stats - -# Get specific metric statistics -ttft_stats = stats.ttft -print(f"TTFT - Mean: {ttft_stats.mean}, P95: {ttft_stats.p95}") - -# Get error analysis -print(f"Error Rate: {aggregated_metrics.error_rate}") -print(f"Error Codes: {aggregated_metrics.error_codes_frequency}") -``` - -## Advanced Data Loading System - -### Dataset Configuration - -#### `DatasetConfig` -Complete dataset configuration with flexible source support. - -```python -class DatasetConfig(BaseModel): - source: DatasetSourceConfig - prompt_column: Optional[str] = None - image_column: Optional[str] = None - prompt_lambda: Optional[str] = None - unsafe_allow_large_images: bool = False - - @classmethod - def from_file(cls, config_path: str) -> "DatasetConfig": ... - - @classmethod - def from_cli_args( - cls, - dataset_path: Optional[str] = None, - prompt_column: Optional[str] = None, - image_column: Optional[str] = None, - **kwargs, - ) -> "DatasetConfig": ... -``` - -#### `DatasetSourceConfig` -Configuration for different dataset sources. 
- -```python -class DatasetSourceConfig(BaseModel): - type: str = Field(..., description="Dataset source type: 'file', 'huggingface', or 'custom'") - path: Optional[str] = Field(None, description="Path to dataset (file path or HuggingFace ID)") - file_format: Optional[str] = Field(None, description="File format: 'csv', 'txt', 'json'") - huggingface_kwargs: Optional[Dict[str, Any]] = Field( - None, description="Keyword arguments passed directly to HuggingFace load_dataset" - ) - loader_class: Optional[str] = Field(None, description="Python import path for custom dataset loader") - loader_kwargs: Optional[Dict[str, Any]] = Field(None, description="Keyword arguments for custom loader") -``` - -### Dataset Sources - -#### `DatasetSource` -Abstract base class for dataset sources. - -```python -class DatasetSource(ABC): - def __init__(self, config: DatasetSourceConfig): ... - - @abstractmethod - def load(self) -> Any: ... -``` - -#### `FileDatasetSource` -Load datasets from local files (txt, csv, json). - -```python -class FileDatasetSource(DatasetSource): - def load(self) -> Union[List[str], List[Tuple[str, Any]]]: ... - - def _load_text_file(self, file_path: Path) -> List[str]: ... - def _load_csv_file(self, file_path: Path) -> Any: ... - def _load_json_file(self, file_path: Path) -> List[Any]: ... -``` - -#### `HuggingFaceDatasetSource` -Load datasets from HuggingFace Hub. - -```python -class HuggingFaceDatasetSource(DatasetSource): - def load(self) -> Any: ... -``` - -### Data Loaders - -#### `DatasetLoader` -Abstract base class for dataset loaders. - -```python -class DatasetLoader(ABC): - supported_formats: Set[DatasetFormat] = set() - media_type: str = "" - - def __init__(self, dataset_config: DatasetConfig): ... - - def load_request(self) -> Union[List[str], List[Tuple[str, Any]]]: ... -``` - -#### `TextDatasetLoader` -For loading text datasets. - -```python -class TextDatasetLoader(DatasetLoader): - supported_formats = {DatasetFormat.TEXT, DatasetFormat.CSV, DatasetFormat.JSON, DatasetFormat.HUGGINGFACE_HUB} - media_type = "text" -``` - -#### `ImageDatasetLoader` -For loading image datasets. - -```python -class ImageDatasetLoader(DatasetLoader): - supported_formats = {DatasetFormat.CSV, DatasetFormat.JSON, DatasetFormat.HUGGINGFACE_HUB} - media_type = "image" -``` - -### Data Loading Factory - -#### `DataLoaderFactory` -Factory for creating data loaders and loading data. - -```python -class DataLoaderFactory: - @staticmethod - def load_data_for_task( - task: str, dataset_config: DatasetConfig - ) -> Union[List[str], List[Tuple[str, Any]]]: ... - - @staticmethod - def _load_text_data( - dataset_config: DatasetConfig, output_modality: str - ) -> List[str]: ... - - @staticmethod - def _load_image_data( - dataset_config: DatasetConfig, - ) -> List[Tuple[str, Any]]: ... 
-``` - -### Data Loading Usage Examples - -#### File-based Dataset Loading -```python -from genai_bench.data.config import DatasetConfig, DatasetSourceConfig -from genai_bench.data.loaders.factory import DataLoaderFactory - -# Load from CSV file -dataset_config = DatasetConfig( - source=DatasetSourceConfig( - type="file", - path="/path/to/dataset.csv", - file_format="csv" - ), - prompt_column="text" -) - -data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) -``` - -#### HuggingFace Dataset Loading -```python -# Load from HuggingFace Hub -dataset_config = DatasetConfig( - source=DatasetSourceConfig( - type="huggingface", - path="squad", - huggingface_kwargs={ - "split": "train", - "streaming": True - } - ), - prompt_column="question" -) - -data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) -``` - -#### Custom Dataset Loading -```python -# Load with custom loader -dataset_config = DatasetConfig( - source=DatasetSourceConfig( - type="custom", - loader_class="my_package.CustomLoader", - loader_kwargs={ - "api_key": "your-api-key", - "endpoint": "https://api.example.com/data" - } - ) -) - -data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) -``` - -#### Image Dataset Loading -```python -# Load image dataset -dataset_config = DatasetConfig( - source=DatasetSourceConfig( - type="file", - path="/path/to/images.csv", - file_format="csv" - ), - prompt_column="caption", - image_column="image_path" -) - -data = DataLoaderFactory.load_data_for_task("image-text-to-text", dataset_config) -``` - -#### Configuration from File -```python -# Load configuration from JSON file -dataset_config = DatasetConfig.from_file("/path/to/dataset_config.json") - -# Load data -data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) -``` - -#### CLI Integration -```python -# Create configuration from CLI arguments -dataset_config = DatasetConfig.from_cli_args( - dataset_path="/path/to/dataset.csv", - prompt_column="text", - image_column="image" -) - -data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) -``` - -## Analysis and Reporting Classes - -### `FlexiblePlotGenerator` -Generates plots using flexible configuration. - -```python -class FlexiblePlotGenerator: - def __init__(self, config: PlotConfig): ... - - def generate_plots( - self, - run_data_list: List[Tuple[ExperimentMetadata, ExperimentMetrics]], - group_key: str, - experiment_folder: str, - metrics_time_unit: str = "s" - ) -> None: ... -``` - -### `PlotConfig` -Configuration for plot generation. - -```python -class PlotConfig(BaseModel): - title: str - plots: List[PlotSpec] - figure_size: Tuple[int, int] = (12, 8) - dpi: int = 100 - # ... more configuration options -``` - -### `ExperimentLoader` -Loads experiment data from files. - -```python -def load_multiple_experiments( - folder_name: str, - filter_criteria=None -) -> List[Tuple[ExperimentMetadata, ExperimentMetrics]]: ... - -def load_one_experiment( - folder_name: str, - filter_criteria: Optional[Dict[str, Any]] = None -) -> Tuple[Optional[ExperimentMetadata], ExperimentMetrics]: ... -``` - -## Configuration Classes - -> **βš™οΈ Learn More**: For configuration examples and best practices, see the [Run Benchmark Guide](../user-guide/run-benchmark.md#selecting-datasets). - -### `DatasetConfig` -Configuration for dataset loading. 
- -```python -class DatasetConfig(BaseModel): - source: DatasetSourceConfig - prompt_column: Optional[str] = None - image_column: Optional[str] = None - unsafe_allow_large_images: bool = False -``` - -### `DatasetSourceConfig` -Configuration for dataset sources. - -```python -class DatasetSourceConfig(BaseModel): - type: Literal["file", "huggingface", "custom"] - path: Optional[str] = None - file_format: Optional[str] = None - huggingface_dataset: Optional[str] = None - huggingface_config: Optional[str] = None - huggingface_split: Optional[str] = None - loader_class: Optional[str] = None - loader_kwargs: Optional[Dict[str, Any]] = None -``` - -## Comprehensive Examples - -> **πŸš€ Learn More**: For step-by-step tutorials and practical examples, see the [User Guide](../user-guide/index.md). - -### Complete Multi-Cloud Benchmarking Setup - -#### End-to-End Benchmarking Pipeline -```python -import os -from genai_bench.auth.unified_factory import UnifiedAuthFactory -from genai_bench.storage.factory import StorageFactory -from genai_bench.distributed.runner import DistributedRunner, DistributedConfig -from genai_bench.ui.dashboard import create_dashboard -from genai_bench.data.config import DatasetConfig, DatasetSourceConfig -from genai_bench.data.loaders.factory import DataLoaderFactory - -# 1. Configure Authentication -model_auth = UnifiedAuthFactory.create_model_auth( - "openai", - api_key=os.getenv("OPENAI_API_KEY") -) - -storage_auth = UnifiedAuthFactory.create_storage_auth( - "aws", - access_key_id=os.getenv("AWS_ACCESS_KEY_ID"), - secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"), - region="us-east-1" -) - -# 2. Create Storage -storage = StorageFactory.create_storage("aws", storage_auth) - -# 3. Configure Dataset -dataset_config = DatasetConfig( - source=DatasetSourceConfig( - type="huggingface", - path="squad", - huggingface_kwargs={"split": "train", "streaming": True} - ), - prompt_column="question" -) - -# 4. Load Data -data = DataLoaderFactory.load_data_for_task("text-to-text", dataset_config) - -# 5. Configure Distributed Execution -config = DistributedConfig( - num_workers=4, - master_host="127.0.0.1", - master_port=5557 -) - -# 6. Create Dashboard -dashboard = create_dashboard(metrics_time_unit="s") - -# 7. 
Run Benchmark -runner = DistributedRunner(environment, config, dashboard) -runner.setup() - -# Upload results -storage.upload_folder( - "/path/to/results", - "my-bucket", - prefix="benchmarks/2024/" -) -``` - -#### Multi-Provider Authentication Setup -```python -# OpenAI + AWS S3 -openai_auth = UnifiedAuthFactory.create_model_auth("openai", api_key="sk-...") -aws_storage_auth = UnifiedAuthFactory.create_storage_auth("aws", profile="default") - -# Azure OpenAI + Azure Blob -azure_auth = UnifiedAuthFactory.create_model_auth( - "azure-openai", - endpoint="https://your-resource.openai.azure.com/", - deployment="your-deployment", - api_key="your-api-key" -) -azure_storage_auth = UnifiedAuthFactory.create_storage_auth( - "azure", - account_name="your-storage-account", - account_key="your-account-key" -) - -# GCP Vertex + GCP Storage -gcp_auth = UnifiedAuthFactory.create_model_auth( - "gcp-vertex", - project_id="your-project", - location="us-central1" -) -gcp_storage_auth = UnifiedAuthFactory.create_storage_auth( - "gcp", - project_id="your-project" -) - -# OCI GenAI + OCI Object Storage -oci_auth = UnifiedAuthFactory.create_model_auth( - "oci", - config_path="~/.oci/config", - profile="DEFAULT" -) -oci_storage_auth = UnifiedAuthFactory.create_storage_auth( - "oci", - config_path="~/.oci/config", - profile="DEFAULT" -) -``` - -#### Advanced Distributed Configuration -```python -# High-performance distributed setup -config = DistributedConfig( - num_workers=8, - master_host="0.0.0.0", # Allow external connections - master_port=5557, - wait_time=5, - pin_to_cores=True, - cpu_affinity_map={ - 0: 0, 1: 1, 2: 2, 3: 3, - 4: 4, 5: 5, 6: 6, 7: 7 - } -) - -# Create runner with custom dashboard -dashboard = create_dashboard(metrics_time_unit="ms") -runner = DistributedRunner(environment, config, dashboard) -runner.setup() - -# Dynamic scenario updates -scenarios = ["N(100,50)", "N(200,100)", "D(150,150)", "U(50,250)"] -for scenario in scenarios: - runner.update_scenario(scenario) - # Run benchmark with this scenario - # ... benchmark execution ... 
-``` - -#### Custom Dataset Loading Examples -```python -# Text dataset from CSV -text_config = DatasetConfig( - source=DatasetSourceConfig( - type="file", - path="/path/to/text_data.csv", - file_format="csv" - ), - prompt_column="text" -) - -# Image dataset from JSON -image_config = DatasetConfig( - source=DatasetSourceConfig( - type="file", - path="/path/to/images.json", - file_format="json" - ), - prompt_column="caption", - image_column="image_path" -) - -# HuggingFace dataset with custom parameters -hf_config = DatasetConfig( - source=DatasetSourceConfig( - type="huggingface", - path="squad", - huggingface_kwargs={ - "split": "train", - "streaming": True, - "cache_dir": "/tmp/hf_cache" - } - ), - prompt_column="question" -) - -# Custom dataset loader -custom_config = DatasetConfig( - source=DatasetSourceConfig( - type="custom", - loader_class="my_package.CustomDataLoader", - loader_kwargs={ - "api_endpoint": "https://api.example.com/data", - "api_key": "your-api-key", - "batch_size": 1000 - } - ) -) - -# Load data for different tasks -text_data = DataLoaderFactory.load_data_for_task("text-to-text", text_config) -image_data = DataLoaderFactory.load_data_for_task("image-text-to-text", image_config) -hf_data = DataLoaderFactory.load_data_for_task("text-to-embeddings", hf_config) -custom_data = DataLoaderFactory.load_data_for_task("text-to-text", custom_config) -``` - -#### Advanced Storage Operations -```python -# Multi-cloud backup -providers = ["aws", "azure", "gcp", "oci"] -storages = [] - -for provider in providers: - auth = UnifiedAuthFactory.create_storage_auth(provider, **provider_configs[provider]) - storage = StorageFactory.create_storage(provider, auth) - storages.append(storage) - -# Upload to all providers -for storage in storages: - storage.upload_folder( - local_folder="/path/to/results", - bucket="benchmark-results", - prefix="backup/2024/" - ) - -# Advanced upload with metadata -storage.upload_file( - local_path="/path/to/results.json", - remote_path="benchmarks/2024/results.json", - bucket="my-bucket", - metadata={ - "experiment": "llm-benchmark", - "model": "gpt-4", - "version": "1.0", - "timestamp": "2024-01-01T00:00:00Z" - }, - encryption="AES256" -) - -# List and filter objects -for obj in storage.list_objects( - bucket="my-bucket", - prefix="benchmarks/2024/", - max_keys=100 -): - if obj.endswith(".json"): - print(f"Found result file: {obj}") -``` - -#### Custom Plot Generation -```python -from genai_bench.analysis.flexible_plot_report import FlexiblePlotGenerator -from genai_bench.analysis.plot_config import PlotConfig, PlotSpec - -# Create comprehensive plot configuration -config = PlotConfig( - title="LLM Performance Analysis", - plots=[ - PlotSpec( - x_field="concurrency", - y_fields=["e2e_latency", "ttft"], - plot_type="line", - title="Latency vs Concurrency", - x_label="Concurrency Level", - y_label="Latency (ms)" - ), - PlotSpec( - x_field="concurrency", - y_fields=["input_throughput", "output_throughput"], - plot_type="bar", - title="Throughput vs Concurrency", - x_label="Concurrency Level", - y_label="Throughput (tokens/s)" - ), - PlotSpec( - x_field="input_throughput", - y_fields=["e2e_latency"], - plot_type="scatter", - title="Latency vs Input Throughput", - x_label="Input Throughput (tokens/s)", - y_label="Latency (ms)" - ) - ], - figure_size=(15, 10), - dpi=300 -) - -# Generate plots -generator = FlexiblePlotGenerator(config) -generator.generate_plots( - run_data_list, - group_key="traffic_scenario", - experiment_folder="/path/to/results", - 
metrics_time_unit="ms" -) -``` - -#### Metrics Analysis and Monitoring -```python -from genai_bench.metrics.aggregated_metrics_collector import AggregatedMetricsCollector -from genai_bench.analysis.experiment_loader import load_multiple_experiments - -# Load experiment data -experiments = load_multiple_experiments( - folder_name="/path/to/experiments", - filter_criteria={"model": "gpt-4", "task": "text-to-text"} -) - -# Analyze metrics -for metadata, metrics in experiments: - print(f"Experiment: {metadata.experiment_folder_name}") - print(f"Model: {metadata.model}") - print(f"Task: {metadata.task}") - print(f"Concurrency: {metadata.num_concurrency}") - - # Performance metrics - print(f"Mean TTFT: {metrics.stats.ttft.mean:.3f}ms") - print(f"P95 TTFT: {metrics.stats.ttft.p95:.3f}ms") - print(f"Mean Throughput: {metrics.mean_output_throughput_tokens_per_s:.2f} tokens/s") - print(f"Error Rate: {metrics.error_rate:.2%}") - - # Error analysis - if metrics.error_codes_frequency: - print("Error Codes:") - for code, count in metrics.error_codes_frequency.items(): - print(f" {code}: {count} occurrences") -``` - -#### Dashboard Customization -```python -import os - -# Force rich dashboard for development -os.environ["ENABLE_UI"] = "true" -dashboard = create_dashboard(metrics_time_unit="s") - -# Force minimal dashboard for production -os.environ["ENABLE_UI"] = "false" -dashboard = create_dashboard(metrics_time_unit="ms") - -# Custom dashboard usage -with dashboard.live: - # Update with live metrics - live_metrics = { - "ttft": [0.1, 0.2, 0.15, 0.3], - "input_throughput": [100, 120, 110, 90], - "output_throughput": [50, 60, 55, 45], - "output_latency": [0.5, 0.6, 0.55, 0.7], - "stats": { - "mean_ttft": 0.1875, - "mean_input_throughput": 105.0, - "mean_output_throughput": 52.5, - "mean_output_latency": 0.5875 - } - } - - dashboard.update_metrics_panels(live_metrics, metrics_time_unit="s") - dashboard.update_histogram_panel(live_metrics, metrics_time_unit="s") - dashboard.update_scatter_plot_panel(live_metrics["ttft"], time_unit="s") -``` - -#### Time Unit Conversion Examples -```python -from genai_bench.time_units import TimeUnitConverter - -# Convert latency metrics -latency_s = 0.5 -latency_ms = TimeUnitConverter.convert_time_unit(latency_s, "s", "ms") -latency_us = TimeUnitConverter.convert_time_unit(latency_s, "s", "ΞΌs") - -print(f"Latency: {latency_s}s = {latency_ms}ms = {latency_us}ΞΌs") - -# Convert throughput metrics -throughput_s = 100.0 # tokens/s -throughput_ms = TimeUnitConverter.convert_throughput_unit(throughput_s, "tokens/s", "tokens/ms") -throughput_us = TimeUnitConverter.convert_throughput_unit(throughput_s, "tokens/s", "tokens/ΞΌs") - -print(f"Throughput: {throughput_s} tokens/s = {throughput_ms} tokens/ms = {throughput_us} tokens/ΞΌs") -``` - -#### Error Handling and Recovery -```python -import logging -from genai_bench.logging import init_logger - -logger = init_logger(__name__) - -try: - # Create authentication - auth = UnifiedAuthFactory.create_model_auth("openai", api_key="invalid-key") -except ValueError as e: - logger.error(f"Authentication failed: {e}") - # Fallback to different provider - auth = UnifiedAuthFactory.create_model_auth("azure-openai", **azure_config) - -try: - # Create storage - storage = StorageFactory.create_storage("aws", storage_auth) -except Exception as e: - logger.error(f"Storage creation failed: {e}") - # Fallback to local storage - storage = None - -# Handle distributed runner errors -try: - runner = DistributedRunner(environment, config, dashboard) - 
runner.setup() -except Exception as e: - logger.error(f"Distributed setup failed: {e}") - # Fallback to local mode - config.num_workers = 0 - runner = DistributedRunner(environment, config, dashboard) - runner.setup() -``` - -### Production Deployment Examples - -#### Docker-based Deployment -```dockerfile -FROM python:3.9-slim - -# Install dependencies -RUN pip install genai-bench[all] - -# Set environment variables -ENV ENABLE_UI=false -ENV TOKENIZERS_PARALLELISM=false - -# Copy configuration -COPY config/ /app/config/ -COPY data/ /app/data/ - -# Run benchmark -CMD ["genai-bench", "benchmark", "--config", "/app/config/benchmark.yaml"] -``` - -#### Kubernetes Deployment -```yaml -apiVersion: apps/v1 -kind: Deployment -metadata: - name: genai-bench -spec: - replicas: 1 - selector: - matchLabels: - app: genai-bench - template: - metadata: - labels: - app: genai-bench - spec: - containers: - - name: genai-bench - image: genai-bench:latest - env: - - name: ENABLE_UI - value: "false" - - name: OPENAI_API_KEY - valueFrom: - secretKeyRef: - name: api-keys - key: openai-key - resources: - requests: - memory: "2Gi" - cpu: "1000m" - limits: - memory: "4Gi" - cpu: "2000m" -``` - -#### CI/CD Integration -```yaml -name: LLM Benchmarking -on: - schedule: - - cron: '0 2 * * *' # Daily at 2 AM - -jobs: - benchmark: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v3 - - - name: Setup Python - uses: actions/setup-python@v4 - with: - python-version: '3.9' - - - name: Install dependencies - run: | - pip install genai-bench[all] - - - name: Run benchmark - env: - OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - run: | - genai-bench benchmark \ - --api-backend openai \ - --model gpt-4 \ - --task text-to-text \ - --traffic-scenario "N(100,50)" \ - --num-concurrency 1,2,4,8 \ - --max-time-per-run 300 \ - --upload-results \ - --storage-provider aws \ - --storage-bucket benchmark-results -``` - -## Logging and Utilities - -### Logging System - -#### `LoggingManager` -Centralized logging management for the application. - -```python -class LoggingManager: - def __init__(self): ... - - def setup_logging(self, level: str = "INFO"): ... - - def get_logger(self, name: str) -> logging.Logger: ... -``` - -#### `WorkerLoggingManager` -Specialized logging for distributed worker processes. - -```python -class WorkerLoggingManager: - def __init__(self): ... - - def setup_worker_logging(self, worker_id: int): ... - - def send_log_to_master(self, message: str, level: str): ... -``` - -#### `init_logger` -Initialize logger for a specific module. - -```python -def init_logger(name: str) -> logging.Logger: ... -``` - -### Utility Functions - -#### `calculate_sonnet_char_token_ratio` -Calculate character-to-token ratio for Sonnet model. - -```python -def calculate_sonnet_char_token_ratio() -> float: ... -``` - -#### `sanitize_string` -Sanitize string for safe usage in file paths and identifiers. - -```python -def sanitize_string(text: str) -> str: ... 
-``` - -### Logging Usage Examples - -#### Basic Logging Setup -```python -from genai_bench.logging import init_logger - -# Initialize logger for your module -logger = init_logger(__name__) - -# Use logger -logger.info("Starting benchmark") -logger.warning("High memory usage detected") -logger.error("Authentication failed") -``` - -#### Distributed Logging -```python -from genai_bench.logging import WorkerLoggingManager - -# Setup worker logging -worker_logger = WorkerLoggingManager() -worker_logger.setup_worker_logging(worker_id=0) - -# Send logs to master -worker_logger.send_log_to_master("Worker started", "INFO") -worker_logger.send_log_to_master("Processing request", "DEBUG") -``` - -#### Utility Function Usage -```python -from genai_bench.utils import calculate_sonnet_char_token_ratio, sanitize_string - -# Calculate token ratio -ratio = calculate_sonnet_char_token_ratio() -print(f"Sonnet char/token ratio: {ratio}") - -# Sanitize strings -safe_name = sanitize_string("My Experiment (v1.0)") -print(f"Sanitized: {safe_name}") -``` - -## CLI System Enhancements - -### Option Groups - -#### API Options -```python -api_options = [ - click.option("--api-backend", required=True, help="API backend"), - click.option("--api-base", help="API base URL"), - click.option("--api-key", help="API key"), - click.option("--model", required=True, help="Model name"), - click.option("--task", required=True, help="Task type") -] -``` - -#### Authentication Options -```python -model_auth_options = [ - click.option("--model-auth-type", help="Model authentication type"), - click.option("--aws-access-key-id", help="AWS access key"), - click.option("--aws-secret-access-key", help="AWS secret key"), - click.option("--azure-endpoint", help="Azure endpoint"), - click.option("--gcp-project-id", help="GCP project ID") -] - -storage_auth_options = [ - click.option("--storage-provider", help="Storage provider"), - click.option("--storage-bucket", help="Storage bucket"), - click.option("--storage-prefix", help="Storage prefix") -] -``` - -#### Distributed Options -```python -distributed_locust_options = [ - click.option("--num-workers", default=0, help="Number of worker processes"), - click.option("--master-port", default=5557, help="Master port"), - click.option("--spawn-rate", default=1, help="Spawn rate") -] -``` - -### Validation Functions - -#### `validate_tokenizer` -Validate tokenizer configuration. - -```python -def validate_tokenizer(tokenizer_name: str, model: str) -> bool: ... 
-``` - -### CLI Usage Examples - -#### Basic CLI Usage -```bash -# Run benchmark with OpenAI -genai-bench benchmark \ - --api-backend openai \ - --api-key $OPENAI_KEY \ - --model gpt-4 \ - --task text-to-text \ - --traffic-scenario "N(100,50)" \ - --num-concurrency 1,2,4,8 - -# Run with Azure OpenAI -genai-bench benchmark \ - --api-backend azure-openai \ - --azure-endpoint https://your-resource.openai.azure.com/ \ - --azure-deployment your-deployment \ - --model gpt-4 \ - --task text-to-text - -# Run with distributed workers -genai-bench benchmark \ - --api-backend openai \ - --model gpt-4 \ - --task text-to-text \ - --num-workers 4 \ - --master-port 5557 -``` - -#### Advanced CLI Usage -```bash -# Multi-cloud setup -genai-bench benchmark \ - --api-backend openai \ - --model gpt-4 \ - --task text-to-text \ - --upload-results \ - --storage-provider aws \ - --storage-bucket my-bucket \ - --storage-prefix benchmarks/2024 - -# Custom dataset -genai-bench benchmark \ - --api-backend openai \ - --model gpt-4 \ - --task text-to-text \ - --dataset-path /path/to/dataset.csv \ - --dataset-prompt-column text - -# HuggingFace dataset -genai-bench benchmark \ - --api-backend openai \ - --model gpt-4 \ - --task text-to-text \ - --dataset-config /path/to/dataset_config.json -``` - -## Contributing to API Documentation - -We welcome contributions to improve our API documentation! If you'd like to help: - -1. **Add docstrings** to undocumented functions and classes -2. **Provide usage examples** for complex components -3. **Document edge cases** and common gotchas -4. **Update examples** with new features and best practices -5. **Add troubleshooting sections** for common issues -6. **Submit a pull request** with your improvements - -### Documentation Guidelines - -- **Code Examples**: Include complete, runnable examples -- **Error Handling**: Show how to handle common errors -- **Best Practices**: Highlight recommended usage patterns -- **Cross-References**: Link related components and concepts -- **Version Compatibility**: Note any version-specific features - -### Areas Needing Documentation - -- **Custom Authentication Providers**: How to implement custom auth -- **Custom Storage Providers**: How to add new storage backends -- **Custom Dataset Loaders**: How to create custom data sources -- **Performance Tuning**: Optimization strategies and tips -- **Troubleshooting**: Common issues and solutions - -See our [Contributing Guide](../development/contributing.md) for more details on how to contribute to the project. - -## Troubleshooting and Support - -### Common Issues - -- **Authentication Problems**: See the [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md) for detailed setup instructions -- **Performance Issues**: Check the [Distributed Benchmarking Guide](../user-guide/run-benchmark.md#distributed-benchmark) for optimization tips -- **Dataset Loading**: Refer to the [Dataset Configuration Examples](../user-guide/run-benchmark.md#selecting-datasets) for proper setup -- **Storage Upload**: See the [Upload Results Guide](../user-guide/upload-benchmark-result.md) for troubleshooting storage issues - -### Additional Resources - -- **[Development Guide](../development/index.md)** - Contributing and development setup -- **[Multi-Cloud Quick Reference](../user-guide/multi-cloud-quick-reference.md)** - Quick setup reference for all providers - -### Getting Help - -If you encounter issues not covered in the documentation: - -1. 
Check the [GitHub Issues](https://github.com/sgl-project/genai-bench/issues) for known problems
-2. Review the [Multi-Cloud Authentication Guide](../user-guide/multi-cloud-auth-storage.md) for provider-specific issues
-3. Consult the [Run Benchmark Guide](../user-guide/run-benchmark.md) for usage examples
-4. Open a new issue with detailed error information and configuration
\ No newline at end of file
diff --git a/docs/development/adding-new-features.md b/docs/development/adding-new-features.md
new file mode 100644
index 00000000..9efa8ae9
--- /dev/null
+++ b/docs/development/adding-new-features.md
@@ -0,0 +1,126 @@
+# Adding New Features
+
+This guide covers how to add new features to GenAI Bench, including model providers, storage providers, and tasks.
+
+## Adding a New Model Provider
+
+1. Create auth provider in `genai_bench/auth/`
+2. Create user class in `genai_bench/user/`
+3. Update `UnifiedAuthFactory`
+4. Add validation in `cli/validation.py`
+5. Write tests
+
+## Adding a New Storage Provider
+
+1. Create storage auth in `genai_bench/auth/`
+2. Create storage implementation in `genai_bench/storage/`
+3. Update `StorageFactory`
+4. Write tests
+
+## Adding a New Task
+
+This section explains how to add support for a new task in `genai-bench`. Follow the steps below to ensure consistency and compatibility with the existing codebase.
+
+### 1. Define the Request and Response in `protocol.py`
+
+#### Steps
+
+1. Add relevant fields to the appropriate request/response data classes in [`protocol.py`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/protocol.py).
+2. If the new task involves a new input-output modality, create a new request/response class.
+3. Use existing request/response classes (`UserChatRequest`, `UserEmbeddingRequest`, `UserImageChatRequest`, etc.) if they suffice.
+
+#### Example
+
+```python
+class UserTextToImageRequest(UserRequest):
+    """Represents a request for generating images from text."""
+    prompt: str
+    num_images: int = Field(..., description="Number of images to generate.")
+    image_resolution: Tuple[int, int] = Field(..., description="Resolution of the generated images.")
+```
+
+### 2. Update or Create a Sampler
+
+#### 2.1 If Input Modality Is Supported by an Existing Sampler
+
+1. Check if the current [`TextSampler`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/sampling/text_sampler.py) or [`ImageSampler`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/sampling/image_sampler.py) supports the input modality.
+2. Add request creation logic in the relevant `TextSampler` or `ImageSampler` class.
+3. Refactor the sampler's `_create_request` method to support the new task.
+4. **Tip:** Avoid adding long `if-else` chains for new tasks. Use helper methods or design a request creator pattern if needed.
+
+#### 2.2 If Input Modality Is Not Supported
+
+1. Create a new sampler class inheriting from [`BaseSampler`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/sampling/base_sampler.py).
+2. Define the `sample` method to generate requests for the new task.
+3. Refer to `TextSampler` and `ImageSampler` for implementation patterns.
+4. Add utility functions for data preprocessing or validation specific to the new modality if necessary.
+ +#### Example for a New Sampler + +```python +class AudioSampler(Sampler): + input_modality = "audio" + supported_tasks = {"audio-to-text", "audio-to-embeddings"} + + def sample(self, scenario: Scenario) -> UserRequest: + # Validate scenario + self._validate_scenario(scenario) + + if self.output_modality == "text": + return self._create_audio_to_text_request(scenario) + elif self.output_modality == "embeddings": + return self._create_audio_to_embeddings_request(scenario) + else: + raise ValueError(f"Unsupported output_modality: {self.output_modality}") +``` + +### 3. Add Task Support in the User Class + +Each `User` corresponds to one API backend, such as [`OpenAIUser`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/user/openai_user.py) for OpenAI. Users can have multiple tasks, each corresponding to an endpoint. + +#### Steps + +1. Add the new task to the `supported_tasks` dictionary in the relevant `User` class. +2. Map the new task to its corresponding function name in the dictionary. +3. Implement the new function in the `User` class for handling the task logic. +4. If the new task uses an existing endpoint, refactor the function to support both tasks without duplicating logic. +5. **Important:** Avoid creating multiple functions for tasks that use the same endpoint. + +#### Example + +```python +class OpenAIUser(BaseUser): + supported_tasks = { + "text-to-text": "chat", + "image-text-to-text": "chat", + "text-to-embeddings": "embeddings", + "audio-to-text": "audio_to_text", # New task added + } + + def audio_to_text(self): + # Implement the logic for audio-to-text task + endpoint = "/v1/audio/transcriptions" + user_request = self.sample() + + # Add payload and send request + payload = {"audio": user_request.audio_file} + self.send_request(False, endpoint, payload, self.parse_audio_response) +``` + +### 4. Add Unit Tests + +#### Steps + +1. Add tests for the new task in the appropriate test files. +2. Include tests for: + - Request creation in the sampler. + - Task validation in the `User` class. + - End-to-end workflow using the new task. + +### 5. Update Documentation + +#### Steps + +1. Add the new task to the list of supported tasks in the [Task Definition guide](../getting-started/task-definition.md). +2. Provide sample commands and explain any required configuration changes. +3. Mention the new task in this contributing guide for future developers. diff --git a/docs/development/api-reference.md b/docs/development/api-reference.md new file mode 100644 index 00000000..d63fcb31 --- /dev/null +++ b/docs/development/api-reference.md @@ -0,0 +1,203 @@ +# API Reference + +This section provides detailed API documentation for GenAI Bench components. + +!!! info "Coming Soon" + Comprehensive API documentation is being developed. In the meantime, please refer to the source code docstrings. 
+ +## Core Components + +### Authentication + +- **UnifiedAuthFactory** - Factory for creating authentication providers +- **ModelAuthProvider** - Base class for model authentication +- **StorageAuthProvider** - Base class for storage authentication + +### Storage + +- **BaseStorage** - Abstract base class for storage implementations +- **StorageFactory** - Factory for creating storage providers + +### CLI + +- **option_groups** - Modular CLI option definitions +- **validation** - Input validation functions + +### Metrics + +- **AggregatedMetricsCollector** - Collects and aggregates benchmark metrics +- **RequestMetricsCollector** - Collects per-request metrics + +### Data Loading + +- **DatasetConfig** - Configuration for dataset loading +- **DatasetSourceConfig** - Configuration for dataset sources +- **DataLoaderFactory** - Factory for loading datasets +- **TextDatasetLoader** - Text dataset loader +- **ImageDatasetLoader** - Image dataset loader + +### Benchmarking + +- **DistributedRunner** - Distributed benchmark execution with multiple workers +- **DistributedConfig** - Configuration for distributed runs +- **BaseUser** - Abstract base class for user implementations +- **OpenAIUser** - OpenAI API implementation +- **AWSBedrockUser** - AWS Bedrock implementation +- **AzureOpenAIUser** - Azure OpenAI implementation +- **GCPVertexUser** - GCP Vertex AI implementation +- **OCICohereUser** - OCI Cohere implementation + +### Analysis + +- **PlotConfig** - Configuration for visualizations +- **ExperimentLoader** - Loading experiment results +- **FlexiblePlotGenerator** - Generate plots with flexible configuration + +## Example Usage + +### Creating an Authentication Provider + +```python +from genai_bench.auth.unified_factory import UnifiedAuthFactory + +# Create OpenAI auth +auth = UnifiedAuthFactory.create_model_auth( + "openai", + api_key="sk-..." 
+) + +# Create AWS Bedrock auth +auth = UnifiedAuthFactory.create_model_auth( + "aws-bedrock", + access_key_id="AKIA...", + secret_access_key="...", + region="us-east-1" +) +``` + +### Creating a Storage Provider + +```python +from genai_bench.auth.unified_factory import UnifiedAuthFactory +from genai_bench.storage.factory import StorageFactory + +# Create storage auth +storage_auth = UnifiedAuthFactory.create_storage_auth( + "aws", + profile="default", + region="us-east-1" +) + +# Create storage instance +storage = StorageFactory.create_storage( + "aws", + storage_auth +) + +# Upload a folder +storage.upload_folder( + "/path/to/results", + "my-bucket", + prefix="benchmarks/2024" +) +``` + +### Loading Datasets + +```python +from genai_bench.data.config import DatasetConfig, DatasetSourceConfig +from genai_bench.data.loaders.factory import DataLoaderFactory + +# Load from HuggingFace Hub +config = DatasetConfig( + source=DatasetSourceConfig( + type="huggingface", + path="squad", + huggingface_kwargs={"split": "train"} + ), + prompt_column="question" +) +data = DataLoaderFactory.load_data_for_task("text-to-text", config) + +# Load from local CSV file +config = DatasetConfig( + source=DatasetSourceConfig( + type="file", + path="/path/to/dataset.csv", + file_format="csv" + ), + prompt_column="text" +) +data = DataLoaderFactory.load_data_for_task("text-to-text", config) +``` + +### Running Programmatic Benchmarks + +```python +from genai_bench.distributed.runner import DistributedRunner, DistributedConfig +from genai_bench.ui.dashboard import create_dashboard + +# Configure distributed execution +config = DistributedConfig( + num_workers=4, + master_host="127.0.0.1", + master_port=5557 +) + +# Create dashboard +dashboard = create_dashboard(metrics_time_unit="s") + +# Create and setup runner +runner = DistributedRunner(environment, config, dashboard) +runner.setup() + +# Update scenario and run benchmark +runner.update_scenario("N(100,50)") +runner.update_batch_size(32) +``` + +### Analyzing Results + +```python +from genai_bench.analysis.experiment_loader import load_multiple_experiments +from genai_bench.analysis.flexible_plot_report import FlexiblePlotGenerator +from genai_bench.analysis.plot_config import PlotConfig, PlotSpec + +# Load experiment data +experiments = load_multiple_experiments( + folder_name="/path/to/experiments", + filter_criteria={"model": "gpt-4"} +) + +# Create plot configuration +config = PlotConfig( + title="Performance Analysis", + plots=[ + PlotSpec( + x_field="concurrency", + y_fields=["e2e_latency", "ttft"], + plot_type="line", + title="Latency vs Concurrency" + ) + ] +) + +# Generate plots +generator = FlexiblePlotGenerator(config) +generator.generate_plots( + experiments, + group_key="traffic_scenario", + experiment_folder="/path/to/results" +) +``` + +## Contributing to API Documentation + +We welcome contributions to improve our API documentation! If you'd like to help: + +1. Add docstrings to undocumented functions +2. Provide usage examples +3. Document edge cases and gotchas +4. Submit a pull request + +See our [Contributing Guide](contributing.md) for more details. diff --git a/docs/development/contributing.md b/docs/development/contributing.md index 6dde24eb..066ffea4 100644 --- a/docs/development/contributing.md +++ b/docs/development/contributing.md @@ -68,120 +68,3 @@ On your remote machine, you can simply use the `pip` to install genai-bench. 
pip install /<.wheel> ``` -# Development Guide: Adding a New Task in `genai-bench` - -This guide explains how to add support for a new task in `genai-bench`. Follow the steps below to ensure consistency and compatibility with the existing codebase. - ---- - -## 1. Define the Request and Response in `protocol.py` - -### Steps - -1. Add relevant fields to the appropriate request/response data classes in [`protocol.py`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/protocol.py) -2. If the new task involves a new input-output modality, create a new request/response class. -3. Use existing request/response classes (`UserChatRequest`, `UserEmbeddingRequest`, `UserImageChatRequest`, etc.) if they suffice. - -### Example - -```python -class UserTextToImageRequest(UserRequest): - """Represents a request for generating images from text.""" - prompt: str - num_images: int = Field(..., description="Number of images to generate.") - image_resolution: Tuple[int, int] = Field(..., description="Resolution of the generated images.") -``` - ---- - -## 2. Update or Create a Sampler - -### 2.1 If Input Modality Is Supported by an Existing Sampler - -1. Check if the current [`TextSampler`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/sampling/text_sampler.py) or [`ImageSampler`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/sampling/image_sampler.py) supports the input-modality. -2. Add request creation logic in the relevant `TextSampler` or `ImageSampler` class. -3. Refactor the sampler's `_create_request` method to support the new task. -4. **Tip:** Avoid adding long `if-else` chains for new tasks. Utilize helper methods or design a request creator pattern if needed. - -### 2.2 If Input Modality Is Not Supported - -1. Create a new sampler class inheriting from [`BaseSampler`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/sampling/base_sampler.py). -2. Define the `sample` method to generate requests for the new task. -3. Refer to `TextSampler` and `ImageSampler` for implementation patterns. -4. Add utility functions for data preprocessing or validation specific to the new modality if necessary. - -### Example for a New Sampler - -```python -class AudioSampler(Sampler): - input_modality = "audio" - supported_tasks = {"audio-to-text", "audio-to-embeddings"} - - def sample(self, scenario: Scenario) -> UserRequest: - # Validate scenario - self._validate_scenario(scenario) - - if self.output_modality == "text": - return self._create_audio_to_text_request(scenario) - elif self.output_modality == "embeddings": - return self._create_audio_to_embeddings_request(scenario) - else: - raise ValueError(f"Unsupported output_modality: {self.output_modality}") -``` - ---- - -## 3. Add Task Support in the User Class - -Each `User` corresponds to one API backend, such as [`OpenAIUser`](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/user/openai_user.py) for OpenAI. Users can have multiple tasks, each corresponding to an endpoint. - -### Steps - -1. Add the new task to the `supported_tasks` dictionary in the relevant `User` class. -2. Map the new task to its corresponding function name in the dictionary. -3. Implement the new function in the `User` class for handling the task logic. -4. If the new task uses an existing endpoint, refactor the function to support both tasks without duplicating logic. -5. **Important:** Avoid creating multiple functions for tasks that use the same endpoint. 
- -### Example - -```python -class OpenAIUser(BaseUser): - supported_tasks = { - "text-to-text": "chat", - "image-text-to-text": "chat", - "text-to-embeddings": "embeddings", - "audio-to-text": "audio_to_text", # New task added - } - - def audio_to_text(self): - # Implement the logic for audio-to-text task - endpoint = "/v1/audio/transcriptions" - user_request = self.sample() - - # Add payload and send request - payload = {"audio": user_request.audio_file} - self.send_request(False, endpoint, payload, self.parse_audio_response) -``` - ---- - -## 4. Add Unit Tests - -### Steps - -1. Add tests for the new task in the appropriate test files. -2. Include tests for: - - Request creation in the sampler. - - Task validation in the `User` class. - - End-to-end workflow using the new task. - ---- - -## 5. Update Documentation - -### Steps - -1. Add the new task to the list of supported tasks in the [Task Definition guide](../getting-started/task-definition.md). -2. Provide sample commands and explain any required configuration changes. -3. Mention the new task in this contributing guide for future developers. diff --git a/docs/development/index.md b/docs/development/index.md index d7d0e920..69aa7b78 100644 --- a/docs/development/index.md +++ b/docs/development/index.md @@ -14,6 +14,14 @@ Welcome to the GenAI Bench development guide! This section covers everything you [:octicons-arrow-right-24: Contributing Guide](contributing.md) +- :material-cog:{ .lg .middle } **Adding New Features** + + --- + + Learn how to add new providers and tasks + + [:octicons-arrow-right-24: Adding New Features](adding-new-features.md) + ## Development Setup @@ -96,22 +104,6 @@ genai-bench/ - Modular option groups - Comprehensive validation -## Adding New Features - -### Adding a New Model Provider - -1. Create auth provider in `genai_bench/auth/` -2. Create user class in `genai_bench/user/` -3. Update `UnifiedAuthFactory` -4. Add validation in `cli/validation.py` -5. Write tests - -### Adding a New Storage Provider - -1. Create storage auth in `genai_bench/auth/` -2. Create storage implementation in `genai_bench/storage/` -3. Update `StorageFactory` -4. Write tests ## Testing diff --git a/docs/getting-started/command-guidelines.md b/docs/getting-started/command-guidelines.md index 692e5b1c..06de2475 100644 --- a/docs/getting-started/command-guidelines.md +++ b/docs/getting-started/command-guidelines.md @@ -145,4 +145,4 @@ genai-bench excel --help genai-bench plot --help ``` -For further information, refer to the [User Guide](../user-guide/index.md) and the [API Reference](../api/index.md). You can also look at [option_groups.py](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/cli/option_groups.py) directly. \ No newline at end of file +For further information, refer to the [User Guide](../user-guide/index.md) and the [API Reference](../development/api-reference.md). You can also look at [option_groups.py](https://github.com/sgl-project/genai-bench/blob/main/genai_bench/cli/option_groups.py) directly. 
\ No newline at end of file diff --git a/docs/index.md b/docs/index.md index 6e66b6fb..4d8fb785 100644 --- a/docs/index.md +++ b/docs/index.md @@ -82,7 +82,7 @@ GenAI Bench supports multiple benchmark types: ### πŸ“š API Reference -- [API Documentation](api/index.md) - Complete API reference and code examples +- [API Documentation](development/api-reference.md) - Complete API reference and code examples ## Support From 77b445374ec54535dbf55d981d283eb146bf5d9e Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Fri, 24 Oct 2025 10:51:25 -0700 Subject: [PATCH 08/10] clean up development section --- docs/.config/mkdocs.yml | 1 - docs/development/adding-new-features.md | 2 +- docs/development/api-reference.md | 2 +- docs/development/contributing.md | 70 -------------- docs/development/index.md | 122 +++++++++++++++--------- docs/index.md | 2 +- 6 files changed, 80 insertions(+), 119 deletions(-) delete mode 100644 docs/development/contributing.md diff --git a/docs/.config/mkdocs.yml b/docs/.config/mkdocs.yml index 46aa6f69..a1babf80 100644 --- a/docs/.config/mkdocs.yml +++ b/docs/.config/mkdocs.yml @@ -130,6 +130,5 @@ nav: - Upload Results: user-guide/upload-benchmark-result.md - Development: - development/index.md - - Contributing: development/contributing.md - Adding New Features: development/adding-new-features.md - API Reference: development/api-reference.md diff --git a/docs/development/adding-new-features.md b/docs/development/adding-new-features.md index 9efa8ae9..af2ba5ae 100644 --- a/docs/development/adding-new-features.md +++ b/docs/development/adding-new-features.md @@ -123,4 +123,4 @@ class OpenAIUser(BaseUser): 1. Add the new task to the list of supported tasks in the [Task Definition guide](../getting-started/task-definition.md). 2. Provide sample commands and explain any required configuration changes. -3. Mention the new task in this contributing guide for future developers. +3. Mention the new task in this development guide for future developers. diff --git a/docs/development/api-reference.md b/docs/development/api-reference.md index d63fcb31..f4f612a9 100644 --- a/docs/development/api-reference.md +++ b/docs/development/api-reference.md @@ -200,4 +200,4 @@ We welcome contributions to improve our API documentation! If you'd like to help 3. Document edge cases and gotchas 4. Submit a pull request -See our [Contributing Guide](contributing.md) for more details. +See our [Development Guide](index.md) for more details. diff --git a/docs/development/contributing.md b/docs/development/contributing.md deleted file mode 100644 index 066ffea4..00000000 --- a/docs/development/contributing.md +++ /dev/null @@ -1,70 +0,0 @@ -# Contribution Guideline - -Welcome and thank you for your interest in contributing to genai-bench. - -## Coding Style Guide - -genai-bench uses python 3.11, and we adhere to [Google Python style guide](https://google.github.io/styleguide/pyguide.html). - -We use `make format` to format our code using `isort` and `ruff`. The detailed configuration can be found in -[pyproject.toml](https://github.com/sgl-project/genai-bench/blob/main/pyproject.toml). - -## Pull Requests - -Please follow the PR template, which will be automatically populated when you open a new [Pull Request on GitHub](https://github.com/sgl-project/genai-bench/compare). - -### Code Reviews - -All submissions, including submissions by project members, require a code review. -To make the review process as smooth as possible, please: - -1. Keep your changes as concise as possible. 
- If your pull request involves multiple unrelated changes, consider splitting it into separate pull requests. -2. Respond to all comments within a reasonable time frame. - If a comment isn't clear, - or you disagree with a suggestion, feel free to ask for clarification or discuss the suggestion. -3. Provide constructive feedback and meaningful comments. Focus on specific improvements - and suggestions that can enhance the code quality or functionality. Remember to - acknowledge and respect the work the author has already put into the submission. - - -## Setup Development Environment - -### `make` - -genai-bench utilizes `make` for a lot of useful commands. - -If your laptop doesn't have `GNU make` installed, (check this by typing `make --version` in your terminal), -you can ask our GenerativeAI's chatbot about how to install it in your system. - -### `uv` - -Install uv with `make uv` or install it from the [official website](https://docs.astral.sh/uv/). -If installing from the website, create a project venv with `uv venv -p python3.11`. - -Once you have `make` and `uv` installed, you can follow the command below to build genai-bench wheel: - -```shell -# check out commands genai-bench supports -make help -#activate virtual env managed by uv -source .venv/bin/activate -# install dependencies -make install -``` - -You can utilize wheel to install genai-bench. - -```shell -# build a .whl under genai-bench/dist -make build -# send the wheel to your remote machine if applies -rsync --delete -avz ~/genai-bench/dist/<.wheel> @: -``` - -On your remote machine, you can simply use the `pip` to install genai-bench. - -```shell -pip install /<.wheel> -``` - diff --git a/docs/development/index.md b/docs/development/index.md index 69aa7b78..55ea8bbc 100644 --- a/docs/development/index.md +++ b/docs/development/index.md @@ -1,34 +1,68 @@ # Development -Welcome to the GenAI Bench development guide! This section covers everything you need to contribute to the project. +Welcome and thank you for your interest in contributing to genai-bench! This section is a development guide that covers everything you need to contribute to the project. ## Getting Started with Development
-- :material-source-pull:{ .lg .middle } **Contributing** +- :material-cog:{ .lg .middle } **Adding New Features** --- - Learn how to contribute to GenAI Bench + Learn how to add new providers and tasks - [:octicons-arrow-right-24: Contributing Guide](contributing.md) + [:octicons-arrow-right-24: Adding New Features](adding-new-features.md) -- :material-cog:{ .lg .middle } **Adding New Features** +- :material-book:{ .lg .middle } **API Reference** --- - Learn how to add new providers and tasks + Programmatic usage and integration - [:octicons-arrow-right-24: Adding New Features](adding-new-features.md) + [:octicons-arrow-right-24: API Reference](api-reference.md)
+
+## Coding Style Guide
+
+genai-bench uses Python 3.11, and we adhere to the [Google Python style guide](https://google.github.io/styleguide/pyguide.html).
+
+We use `make format` to format our code using `isort` and `ruff`. The detailed configuration can be found in
+[pyproject.toml](https://github.com/sgl-project/genai-bench/blob/main/pyproject.toml).
+
+### Guidelines
+
+- Follow PEP 8
+- Use type hints
+- Write docstrings for public APIs
+- Keep functions focused and small
+- Add tests for new features
+
+## Pull Requests
+
+Please follow the PR template, which will be automatically populated when you open a new [Pull Request on GitHub](https://github.com/sgl-project/genai-bench/compare).
+
+### Code Reviews
+
+All submissions, including submissions by project members, require a code review.
+To make the review process as smooth as possible, please:
+
+1. Keep your changes as concise as possible.
+   If your pull request involves multiple unrelated changes, consider splitting it into separate pull requests.
+2. Respond to all comments within a reasonable time frame.
+   If a comment isn't clear,
+   or you disagree with a suggestion, feel free to ask for clarification or discuss the suggestion.
+3. Provide constructive feedback and meaningful comments. Focus on specific improvements
+   and suggestions that can enhance the code quality or functionality. Remember to
+   acknowledge and respect the work the author has already put into the submission.
+
 ## Development Setup
 
 ### Prerequisites
 
-- Python 3.8+
+- Python 3.11
 - Git
 - Make (optional but recommended)
 
@@ -39,17 +73,44 @@ git clone https://github.com/sgl-project/genai-bench.git
 cd genai-bench
 ```
 
-### Create a Virtual Environment
+### Development Environment Setup
 
-```bash
-python -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
+#### `make`
+
+genai-bench uses `make` for many useful commands.
+
+If your laptop doesn't have `GNU make` installed (check this by typing `make --version` in your terminal),
+you can ask a generative AI chatbot how to install it on your system.
+
+#### `uv`
+
+Install uv with `make uv` or install it from the [official website](https://docs.astral.sh/uv/).
+If installing from the website, create a project venv with `uv venv -p python3.11`.
+
+Once you have `make` and `uv` installed, you can follow the commands below to build the genai-bench wheel:
+
+```shell
+# check out commands genai-bench supports
+make help
+# activate the virtual env managed by uv
+source .venv/bin/activate
+# install dependencies
+make install
 ```
 
-### Install in Development Mode
+You can then use the wheel to install genai-bench.
 
-```bash
-pip install -e ".[dev]"
+```shell
+# build a .whl under genai-bench/dist
+make build
+# send the wheel to your remote machine, if applicable
+rsync --delete -avz ~/genai-bench/dist/<.wheel> @:
+```
+
+On your remote machine, you can simply use `pip` to install genai-bench.
+
+```shell
+pip install /<.wheel>
 ```
 
 ### Run Tests
@@ -96,22 +104,6 @@ make lint
 ```
 
-## Key Components
-
-### Authentication System
-
-- Unified factory for creating auth providers
-- Support for multiple cloud providers
-- Extensible architecture for new providers
-
-### Storage System
-
-- Abstract base class for storage providers
-- Implementations for AWS S3, Azure Blob, GCP Cloud Storage, etc.
-- Consistent interface across providers - -### CLI Architecture - -- Click-based command structure -- Modular option groups -- Comprehensive validation - - ## Testing We use pytest for testing: @@ -138,16 +178,8 @@ make docs-serve make docs-build ``` -## Code Style - -- Follow PEP 8 -- Use type hints -- Write docstrings for public APIs -- Keep functions focused and small -- Add tests for new features - ## Questions? - +- Check out the [Adding New Features](./adding-new-features.md) and [API Reference](./api-reference.md) pages for more information on the project - Open an issue on GitHub - Join our community discussions - Check existing issues and PRs \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index 4d8fb785..1c18b764 100644 --- a/docs/index.md +++ b/docs/index.md @@ -78,7 +78,7 @@ GenAI Bench supports multiple benchmark types: ### πŸ”§ Development -- [Contributing](development/contributing.md) - How to contribute to GenAI Bench +- [Development](development/index.md) - How to contribute to GenAI Bench ### πŸ“š API Reference From 8e3e6facc7023d8138cb5e0f492efe99780d0d16 Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Fri, 24 Oct 2025 11:25:50 -0700 Subject: [PATCH 09/10] update api reference --- docs/development/api-reference.md | 363 +++++++++++++++++++++++++++--- docs/development/index.md | 22 +- 2 files changed, 340 insertions(+), 45 deletions(-) diff --git a/docs/development/api-reference.md b/docs/development/api-reference.md index f4f612a9..614725e5 100644 --- a/docs/development/api-reference.md +++ b/docs/development/api-reference.md @@ -1,57 +1,346 @@ # API Reference -This section provides detailed API documentation for GenAI Bench components. +This section provides comprehensive API documentation for all GenAI Bench components, organized by functional category. -!!! info "Coming Soon" - Comprehensive API documentation is being developed. In the meantime, please refer to the source code docstrings. +## Project Structure -## Core Components +``` +genai-bench/ +β”œβ”€β”€ genai_bench/ # Main package +β”‚ β”œβ”€β”€ analysis/ # Result analysis and reporting +β”‚ β”œβ”€β”€ auth/ # Authentication providers +β”‚ β”œβ”€β”€ cli/ # CLI implementation +β”‚ β”œβ”€β”€ data/ # Dataset loading and management +β”‚ β”œβ”€β”€ distributed/ # Distributed execution +β”‚ β”œβ”€β”€ metrics/ # Metrics collection +β”‚ β”œβ”€β”€ sampling/ # Data sampling +β”‚ β”œβ”€β”€ scenarios/ # Traffic generation scenarios +β”‚ β”œβ”€β”€ storage/ # Storage providers +β”‚ β”œβ”€β”€ ui/ # User interface components +β”‚ └── user/ # User implementations +β”œβ”€β”€ tests/ # Test suite +└── docs/ # Documentation +``` -### Authentication +## Analysis -- **UnifiedAuthFactory** - Factory for creating authentication providers -- **ModelAuthProvider** - Base class for model authentication -- **StorageAuthProvider** - Base class for storage authentication +Components for analyzing benchmark results and generating reports. 
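+
+As a quick orientation, the sketch below wires these pieces together. It is a minimal sketch rather than a prescribed workflow; the folder path and filter values are placeholders, and the full flow appears under Example Usage below.
+
+```python
+from genai_bench.analysis.experiment_loader import load_multiple_experiments
+from genai_bench.analysis.flexible_plot_report import FlexiblePlotGenerator
+from genai_bench.analysis.plot_config import PlotConfig, PlotSpec
+
+# Load all experiments under a folder, keeping only runs that match the filter
+experiments = load_multiple_experiments(
+    folder_name="/path/to/experiments",      # placeholder path
+    filter_criteria={"model": "gpt-4"},      # placeholder filter
+)
+
+# Describe a single line plot and render it, grouped by traffic scenario
+config = PlotConfig(
+    title="Latency Overview",
+    plots=[
+        PlotSpec(
+            x_field="concurrency",
+            y_fields=["e2e_latency"],
+            plot_type="line",
+            title="Latency vs Concurrency",
+        )
+    ],
+)
+generator = FlexiblePlotGenerator(config)
+generator.generate_plots(
+    experiments,
+    group_key="traffic_scenario",
+    experiment_folder="/path/to/experiments",
+)
+```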
-### Storage +### Data Loading -- **BaseStorage** - Abstract base class for storage implementations -- **StorageFactory** - Factory for creating storage providers +- **`ExperimentLoader`** - Loads experiment data from files +- **`load_multiple_experiments()`** - Loads multiple experiment results +- **`load_one_experiment()`** - Loads single experiment result -### CLI +### Plot Generation -- **option_groups** - Modular CLI option definitions -- **validation** - Input validation functions +- **`FlexiblePlotGenerator`** - Generates plots using flexible configuration +- **`plot_experiment_data_flexible()`** - Generates flexible plots -### Metrics +### Configuration -- **AggregatedMetricsCollector** - Collects and aggregates benchmark metrics -- **RequestMetricsCollector** - Collects per-request metrics +- **`PlotConfig`** - Configuration for plot generation +- **`PlotConfigManager`** - Manages plot configurations +- **`PlotSpec`** - Specification for individual plots -### Data Loading +### Report Generation + +- **`create_workbook()`** - Creates Excel workbooks from experiment data + +### Data Types + +- **`ExperimentMetrics`** - Metrics data structure for experiments +- **`MetricsData`** - Union type for aggregated or individual metrics + +## Authentication + +Components for handling authentication across different cloud providers and services. + +### Base Classes + +- **`AuthProvider`** - Base class for authentication providers + +### Factories + +- **`UnifiedAuthFactory`** - Unified factory for creating authentication providers +- **`AuthFactory`** - Legacy factory for authentication providers + +### Model Authentication Providers + +- **`ModelAuthProvider`** - Base class for model endpoint authentication +- **`OpenAIAuth`** - OpenAI API authentication +- **`AWSBedrockAuth`** - AWS Bedrock authentication +- **`AzureOpenAIAuth`** - Azure OpenAI authentication +- **`GCPVertexAuth`** - GCP Vertex AI authentication +- **`OCIModelAuthAdapter`** - OCI model authentication adapter + +### Storage Authentication Providers + +- **`StorageAuthProvider`** - Base class for storage authentication +- **`AWSS3Auth`** - AWS S3 authentication +- **`AzureBlobAuth`** - Azure Blob Storage authentication +- **`GCPStorageAuth`** - GCP Cloud Storage authentication +- **`GitHubAuth`** - GitHub authentication +- **`OCIStorageAuthAdapter`** - OCI storage authentication adapter + +### OCI Authentication Providers + +- **`OCIUserPrincipalAuth`** - OCI user principal authentication +- **`OCIInstancePrincipalAuth`** - OCI instance principal authentication +- **`OCISessionAuth`** - OCI session authentication +- **`OCIOBOTokenAuth`** - OCI on-behalf-of token authentication + +## Storage + +Components for multi-cloud storage operations. 
+ +### Base Classes + +- **`BaseStorage`** - Abstract base class for storage providers +- **`StorageFactory`** - Factory for creating storage providers + +### Storage Implementations + +- **`AWSS3Storage`** - AWS S3 storage implementation +- **`AzureBlobStorage`** - Azure Blob Storage implementation +- **`GCPCloudStorage`** - GCP Cloud Storage implementation +- **`OCIObjectStorage`** - OCI Object Storage implementation +- **`GitHubStorage`** - GitHub storage implementation + +### OCI Object Storage Components + +- **`DataStore`** - Interface for data store operations +- **`OSDataStore`** - OCI Object Storage data store +- **`ObjectURI`** - Object URI representation + +### Operations + +- **File Operations**: `upload_file`, `download_file`, `delete_object` +- **Folder Operations**: `upload_folder` +- **Listing**: `list_objects` +- **Multi-cloud Support**: AWS, Azure, GCP, OCI, GitHub + +## CLI + +Command-line interface components for user interaction. + +### Commands + +- **`cli`** - Main CLI entry point +- **`benchmark`** - Benchmark command +- **`excel`** - Excel report generation command +- **`plot`** - Plot generation command + +### Option Groups + +- **`api_options`** - API-related CLI options +- **`model_auth_options`** - Model authentication options +- **`storage_auth_options`** - Storage authentication options +- **`distributed_locust_options`** - Distributed execution options +- **`experiment_options`** - Experiment configuration options +- **`sampling_options`** - Data sampling options +- **`server_options`** - Server configuration options +- **`object_storage_options`** - Object storage options +- **`oci_auth_options`** - OCI-specific authentication options + +### Utilities + +- **`get_experiment_path()`** - Get experiment file paths +- **`get_run_params()`** - Extract run parameters +- **`manage_run_time()`** - Manage run time limits +- **`validate_tokenizer()`** - Validate tokenizer configuration + +### Validation + +- **`validate_api_backend()`** - Validate API backend selection +- **`validate_api_key()`** - Validate API keys +- **`validate_task()`** - Validate task selection +- **`validate_dataset_config()`** - Validate dataset configuration +- **`validate_additional_request_params()`** - Validate request parameters + +## Data + +Components for loading and managing datasets. + +### Configuration + +- **`DatasetConfig`** - Configuration for dataset loading +- **`DatasetSourceConfig`** - Configuration for dataset sources + +### Loaders + +- **`DatasetLoader`** - Abstract base class for dataset loaders +- **`TextDatasetLoader`** - Text dataset loader +- **`ImageDatasetLoader`** - Image dataset loader +- **`DataLoaderFactory`** - Factory for creating data loaders + +### Sources + +- **`DatasetSource`** - Abstract base class for dataset sources +- **`FileDatasetSource`** - Local file dataset source +- **`HuggingFaceDatasetSource`** - HuggingFace Hub dataset source +- **`CustomDatasetSource`** - Custom dataset source +- **`DatasetSourceFactory`** - Factory for creating dataset sources + +## Distributed + +Components for distributed benchmark execution. 
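+
+A minimal sketch of a distributed run is shown below. It assumes an already-constructed Locust-style `environment` object (provided by the benchmark entrypoint) and placeholder worker and port values; `update_scenario` pushes a new traffic scenario to all workers:
+
+```python
+from genai_bench.distributed.runner import DistributedConfig, DistributedRunner
+from genai_bench.ui.dashboard import create_dashboard
+
+config = DistributedConfig(
+    num_workers=4,           # placeholder worker count
+    master_host="127.0.0.1",
+    master_port=5557,
+)
+dashboard = create_dashboard(metrics_time_unit="s")
+
+runner = DistributedRunner(environment, config, dashboard)  # `environment` is assumed to exist
+runner.setup()
+runner.update_scenario("N(100,50)")
+```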
+ +### Core Components + +- **`DistributedRunner`** - Manages distributed load test execution +- **`DistributedConfig`** - Configuration for distributed runs +- **`MessageHandler`** - Protocol for message handling + +### Architecture Features + +- Master-worker architecture +- Message passing between processes +- Metrics aggregation +- Process management and cleanup + +## Metrics + +Components for collecting and analyzing performance metrics. + +### Data Structures + +- **`RequestLevelMetrics`** - Metrics for individual requests +- **`AggregatedMetrics`** - Aggregated metrics for entire runs +- **`MetricStats`** - Statistical metrics (mean, std, percentiles) + +### Collectors + +- **`AggregatedMetricsCollector`** - Collects and aggregates metrics +- **`RequestMetricsCollector`** - Collects per-request metrics + +### Metric Types + +- **Time Metrics**: TTFT (Time to First Token), TPOT (Time Per Output Token), E2E Latency +- **Throughput Metrics**: Input/Output throughput in tokens/second +- **Token Metrics**: Input/output token counts +- **Error Metrics**: Error rates and codes +- **Performance Metrics**: Requests per second, run duration + +## Sampling + +Components for sampling data and creating requests. + +### Base Classes + +- **`Sampler`** - Abstract base class for samplers + +### Sampler Implementations + +- **`TextSampler`** - Sampler for text-based tasks +- **`ImageSampler`** - Sampler for image-based tasks + +### Supported Tasks + +- **Text Tasks**: text-to-text, text-to-embeddings, text-to-rerank +- **Image Tasks**: image-text-to-text, image-to-embeddings + +### Features + +- Automatic task registry +- Modality-based sampling +- Dataset integration +- Request generation + +## Scenarios + +Components for defining traffic generation scenarios. + +### Base Classes + +- **`Scenario`** - Abstract base class for scenarios + +### Scenario Implementations + +- **`DatasetScenario`** - Dataset-based scenario +- **`NormalDistribution`** - Normal distribution scenario +- **`DeterministicDistribution`** - Deterministic scenario +- **`EmbeddingScenario`** - Embedding-specific scenario +- **`ReRankScenario`** - Re-ranking scenario +- **`ImageModality`** - Image modality scenario + +### Distribution Types + +- **`TextDistribution`** - NORMAL, DETERMINISTIC, UNIFORM +- **`EmbeddingDistribution`** - Embedding-specific distributions +- **`ReRankDistribution`** - Re-ranking distributions +- **`MultiModality`** - Multi-modal scenarios + +### Features + +- String-based scenario parsing +- Automatic scenario registry +- Parameter validation +- Distribution sampling + +## UI + +Components for user interface and visualization. + +### Dashboard Implementations + +- **`Dashboard`** - Union type for dashboard implementations +- **`RichLiveDashboard`** - Rich library-based dashboard +- **`MinimalDashboard`** - Minimal dashboard for non-UI scenarios + +### Layout Functions + +- **`create_layout()`** - Creates dashboard layout +- **`create_metric_panel()`** - Creates metric display panels +- **`create_progress_bars()`** - Creates progress tracking bars + +### Visualization Functions + +- **`create_horizontal_colored_bar_chart()`** - Creates histogram charts +- **`create_scatter_plot()`** - Creates scatter plots +- **`update_progress()`** - Updates progress displays + +### Features + +- Real-time metrics visualization +- Progress tracking +- Interactive charts and histograms +- Configurable UI components + +## User + +Components for interacting with different model APIs. 
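+
+Each user class maps task names to handler methods through a `supported_tasks` dictionary, as described in [Adding New Features](adding-new-features.md). An abbreviated, illustrative sketch (the task names mirror `OpenAIUser`; the handler bodies are elided):
+
+```python
+class OpenAIUser(BaseUser):
+    # Task name -> name of the handler method on this class
+    supported_tasks = {
+        "text-to-text": "chat",
+        "image-text-to-text": "chat",       # vision chat reuses the chat endpoint
+        "text-to-embeddings": "embeddings",
+    }
+
+    def chat(self): ...        # sample a request, send it, and record metrics
+    def embeddings(self): ...
+```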
+ +### Base Classes + +- **`BaseUser`** - Abstract base class for user implementations + +### User Implementations + +- **`OpenAIUser`** - OpenAI API user +- **`AWSBedrockUser`** - AWS Bedrock user +- **`AzureOpenAIUser`** - Azure OpenAI user +- **`GCPVertexUser`** - GCP Vertex AI user +- **`OCICohereUser`** - OCI Cohere user +- **`OCIGenAIUser`** - OCI Generative AI user +- **`CohereUser`** - Cohere API user -- **DatasetConfig** - Configuration for dataset loading -- **DatasetSourceConfig** - Configuration for dataset sources -- **DataLoaderFactory** - Factory for loading datasets -- **TextDatasetLoader** - Text dataset loader -- **ImageDatasetLoader** - Image dataset loader +### Supported Tasks -### Benchmarking +Each user implementation supports different combinations of: -- **DistributedRunner** - Distributed benchmark execution with multiple workers -- **DistributedConfig** - Configuration for distributed runs -- **BaseUser** - Abstract base class for user implementations -- **OpenAIUser** - OpenAI API implementation -- **AWSBedrockUser** - AWS Bedrock implementation -- **AzureOpenAIUser** - Azure OpenAI implementation -- **GCPVertexUser** - GCP Vertex AI implementation -- **OCICohereUser** - OCI Cohere implementation +- **text-to-text**: Chat and generation tasks +- **image-text-to-text**: Vision-based chat tasks +- **text-to-embeddings**: Text embedding generation +- **image-to-embeddings**: Image embedding generation +- **text-to-rerank**: Text re-ranking tasks -### Analysis +### Features -- **PlotConfig** - Configuration for visualizations -- **ExperimentLoader** - Loading experiment results -- **FlexiblePlotGenerator** - Generate plots with flexible configuration +- Task-based request handling +- Metrics collection +- Error handling +- Authentication integration ## Example Usage @@ -200,4 +489,4 @@ We welcome contributions to improve our API documentation! If you'd like to help 3. Document edge cases and gotchas 4. Submit a pull request -See our [Development Guide](index.md) for more details. +See our [Development Guide](index.md) for more details. 
\ No newline at end of file diff --git a/docs/development/index.md b/docs/development/index.md index 55ea8bbc..e8a82765 100644 --- a/docs/development/index.md +++ b/docs/development/index.md @@ -135,14 +135,20 @@ make lint ``` genai-bench/ -β”œβ”€β”€ genai_bench/ # Main package -β”‚ β”œβ”€β”€ auth/ # Authentication providers -β”‚ β”œβ”€β”€ cli/ # CLI implementation -β”‚ β”œβ”€β”€ metrics/ # Metrics collection -β”‚ β”œβ”€β”€ storage/ # Storage providers -β”‚ └── user/ # User implementations -β”œβ”€β”€ tests/ # Test suite -└── docs/ # Documentation +β”œβ”€β”€ genai_bench/ # Main package +β”‚ β”œβ”€β”€ analysis/ # Result analysis and reporting +β”‚ β”œβ”€β”€ auth/ # Authentication providers +β”‚ β”œβ”€β”€ cli/ # CLI implementation +β”‚ β”œβ”€β”€ data/ # Dataset loading and management +β”‚ β”œβ”€β”€ distributed/ # Distributed execution +β”‚ β”œβ”€β”€ metrics/ # Metrics collection +β”‚ β”œβ”€β”€ sampling/ # Data sampling +β”‚ β”œβ”€β”€ scenarios/ # Traffic generation scenarios +β”‚ β”œβ”€β”€ storage/ # Storage providers +β”‚ β”œβ”€β”€ ui/ # User interface components +β”‚ └── user/ # User implementations +β”œβ”€β”€ tests/ # Test suite +└── docs/ # Documentation ``` ## Testing From e9fe08c71febd44fb87e85bbefe90bae0a064c3b Mon Sep 17 00:00:00 2001 From: Tejesh Anand Date: Fri, 24 Oct 2025 13:15:32 -0700 Subject: [PATCH 10/10] update readme and move examples in command guidelines --- README.md | 29 +++++++- docs/getting-started/command-guidelines.md | 80 +++++++++++----------- 2 files changed, 68 insertions(+), 41 deletions(-) diff --git a/README.md b/README.md index bafd75b1..5014aa1e 100644 --- a/README.md +++ b/README.md @@ -47,7 +47,34 @@ Alternatively, check [Installation Guide](https://docs.sglang.ai/genai-bench/get ## How to use -Please check [User Guide](https://docs.sglang.ai/genai-bench/user-guide/) for instructions on using genai-bench. +### Quick Start + +1. **Run a benchmark** against your model: + ```bash + genai-bench benchmark --api-backend openai \ + --api-base "http://localhost:8080" \ + --api-key "your-api-key" \ + --api-model-name "your-model" \ + --task text-to-text \ + --max-time-per-run 5 \ + --max-requests-per-run 100 + ``` + +2. **Generate Excel reports** from your results: + ```bash + genai-bench excel --experiment-folder ./experiments/your_experiment \ + --excel-name results --metric-percentile mean + ``` + +3. **Create visualizations**: + ```bash + genai-bench plot --experiments-folder ./experiments \ + --group-key traffic_scenario --preset 2x4_default + ``` + +### Next Steps + +For detailed instructions, advanced configuration options, and comprehensive examples, check out the [User Guide](https://docs.sglang.ai/genai-bench/user-guide/). ## Development diff --git a/docs/getting-started/command-guidelines.md b/docs/getting-started/command-guidelines.md index 06de2475..4e76b8ba 100644 --- a/docs/getting-started/command-guidelines.md +++ b/docs/getting-started/command-guidelines.md @@ -15,6 +15,23 @@ Commands: The `benchmark` command runs performance tests against AI models. It's the core command for executing benchmarks. 
+### Example Usage +```bash +# Start a chat benchmark +genai-bench benchmark --api-backend openai \ + --api-base "http://localhost:8082" \ + --api-key "your-openai-api-key" \ + --api-model-name "meta-llama/Meta-Llama-3-70B-Instruct" \ + --model-tokenizer "/mnt/data/models/Meta-Llama-3.1-70B-Instruct" \ + --task text-to-text \ + --max-time-per-run 15 \ + --max-requests-per-run 300 \ + --server-engine "SGLang" \ + --server-gpu-type "H100" \ + --server-version "v0.6.0" \ + --server-gpu-count 4 +``` + ### Essential Options #### **API Configuration** @@ -48,35 +65,12 @@ The `benchmark` command runs performance tests against AI models. It's the core - `--server-gpu-type` - GPU type (H100, A100-80G, etc.) - `--server-gpu-count` - Number of GPUs -### Example Usage -```bash -# Start a chat benchmark -genai-bench benchmark --api-backend openai \ - --api-base "http://localhost:8082" \ - --api-key "your-openai-api-key" \ - --api-model-name "meta-llama/Meta-Llama-3-70B-Instruct" \ - --model-tokenizer "/mnt/data/models/Meta-Llama-3.1-70B-Instruct" \ - --task text-to-text \ - --max-time-per-run 15 \ - --max-requests-per-run 300 \ - --server-engine "SGLang" \ - --server-gpu-type "H100" \ - --server-version "v0.6.0" \ - --server-gpu-count 4 -``` For more information and examples, check out [Run Benchmark](../user-guide/run-benchmark.md). ## Excel The `excel` command exports experiment results to Excel spreadsheets for detailed analysis. -### Essential Options - -- `--experiment-folder` - Path to experiment results folder (required) -- `--excel-name` - Name for the output Excel file (required) -- `--metric-percentile` - Statistical percentile (mean, p25, p50, p75, p90, p95, p99) to select from -- `--metrics-time-unit [s|ms]` - Time unit to use when showing latency metrics in the spreadsheet. Defaults to seconds - ### Example Usage ```bash @@ -95,26 +89,16 @@ genai-bench excel \ --metrics-time-unit ms ``` -## Plot - -The `plot` command generates visualizations from experiment data with flexible configuration options. - ### Essential Options -- `--experiments-folder` - Path to experiments folder, can be more than one experiment (required) -- `--group-key` - Key to group data by (e.g., 'traffic_scenario', 'server_version', 'none') (required) -- `--filter-criteria` - Dictionary of filter criteria -- `--plot-config` - Path to JSON plot configuration file. For more information use [Advanced Plot Configuration](../user-guide/generate-plot.md/#advanced-plot-configuration) -- `--preset` - Built-in plot presets (2x4_default, simple_2x2, multi_line_latency, single_scenario_analysis). Overrides `--plot-config` if both given -- `--metrics-time-unit [s|ms]` - Time unit for latency display, defaults to seconds - -### Advanced Options +- `--experiment-folder` - Path to experiment results folder (required) +- `--excel-name` - Name for the output Excel file (required) +- `--metric-percentile` - Statistical percentile (mean, p25, p50, p75, p90, p95, p99) to select from +- `--metrics-time-unit [s|ms]` - Time unit to use when showing latency metrics in the spreadsheet. Defaults to seconds -- `--list-fields` - List available data fields and exit -- `--validate-only` - Validate configuration without generating plots -- `--verbose` - Enable detailed logging +## Plot -For more information and examples, check out [Generate Plot](../user-guide/generate-plot.md). +The `plot` command generates visualizations from experiment data with flexible configuration options. 

### Example Usage

```bash
# Generate plots grouped by traffic scenario
genai-bench plot \
    --experiments-folder ./experiments \
    --group-key traffic_scenario \
    --preset 2x4_default

# Generate plots with custom configuration
genai-bench plot \
    --experiments-folder ./experiments \
    --group-key server_version \
    --preset multi_line_latency \
    --metrics-time-unit ms
-
```

### Essential Options

- `--experiments-folder` - Path to the experiments folder; it may contain more than one experiment (required)
- `--group-key` - Key to group data by (e.g., 'traffic_scenario', 'server_version', 'none') (required)
- `--filter-criteria` - Dictionary of criteria for filtering the experiment data
- `--plot-config` - Path to a JSON plot configuration file. For more information, see [Advanced Plot Configuration](../user-guide/generate-plot.md#advanced-plot-configuration)
- `--preset` - Built-in plot presets (2x4_default, simple_2x2, multi_line_latency, single_scenario_analysis). Overrides `--plot-config` if both are given
- `--metrics-time-unit [s|ms]` - Time unit for latency display; defaults to seconds

### Advanced Options

- `--list-fields` - List available data fields and exit
- `--validate-only` - Validate configuration without generating plots
- `--verbose` - Enable detailed logging

For more information and examples, check out [Generate Plot](../user-guide/generate-plot.md).

## Getting Help

For detailed help on any command:
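
```bash
genai-bench --help
genai-bench benchmark --help
```

Each subcommand accepts `--help` and prints its full option list.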