vllm-project · kdelee · Jul 15, 2025 · Jul 15, 2025 · Jul 16, 2025 · Jul 16, 2025
diff --git a/docs/guides/cli.md b/docs/guides/cli.md
@@ -1 +1,36 @@
-# Coming Soon
+# CLI Reference
+
+This page provides a reference for the `guidellm` command-line interface. For more advanced configuration, including environment variables and `.env` files, see the [Configuration Guide](./configuration.md).
+
+## `guidellm benchmark run`
+
+This command is the primary entrypoint for running benchmarks. It has many options that can be specified on the command line or in a scenario file.
+
+### Scenario Configuration
+
+| Option | Description |
+| --- | --- |
+| `--scenario <PATH or NAME>` | The name of a builtin scenario or path to a scenario configuration file. Options specified on the command line will override the scenario file. |
+
+### Target and Backend Configuration
+
+These options configure how `guidellm` connects to the system under test.
+
+| Option | Description |
+| --- | --- |
+| `--target <URL>` | **Required.** The endpoint of the target system, e.g., `http://localhost:8080`. Can also be set with the `GUIDELLM__OPENAI__BASE_URL` environment variable. |
+| `--backend-type <TYPE>` | The type of backend to use. Defaults to `openai_http`. |
+| `--backend-args <JSON>` | A JSON string for backend-specific arguments. For example: `--backend-args '{"headers": {"Authorization": "Bearer my-token"}, "verify": false}'` to pass custom headers and disable certificate verification. |
+| `--model <NAME>` | The ID of the model to benchmark within the backend. |
+
+### Data and Request Configuration
+
+These options define the data to be used for benchmarking and how requests will be generated.
+
+| Option | Description |
+| --- | --- |
+| `--data <SOURCE>` | The data source. This can be a HuggingFace dataset ID, a path to a local data file, or a synthetic data configuration. See the [Data Formats Guide](./data_formats.md) for more details. |
+| `--rate-type <TYPE>` | The type of request generation strategy to use (e.g., `constant`, `poisson`, `sweep`). |
+| `--rate <NUMBER>` | The rate of requests per second for `constant` or `poisson` strategies, or the number of steps for a `sweep`. |
+| `--max-requests <NUMBER>` | The maximum number of requests to run for each benchmark. |
+| `--max-seconds <NUMBER>` | The maximum number of seconds to run each benchmark for. |
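The options documented above can be combined into one invocation. The following is a minimal sketch, not part of the change itself: `build_benchmark_argv` is a hypothetical helper, and the token, target, and data values are placeholders. It uses only flags listed in the reference and serializes `--backend-args` with `json.dumps` so that quoting stays correct.

```python
import json
import shlex


def build_benchmark_argv(target, token, verify=True):
    """Hypothetical helper: assemble an argv list for `guidellm benchmark run`."""
    # --backend-args expects a JSON string; json.dumps handles quoting and
    # converts Python's False into JSON's false.
    backend_args = {
        "headers": {"Authorization": f"Bearer {token}"},
        "verify": verify,
    }
    return [
        "guidellm", "benchmark", "run",
        "--target", target,
        "--backend-args", json.dumps(backend_args),
        "--data", "prompt_tokens=256,output_tokens=128",
        "--rate-type", "constant",
        "--rate", "1",
        "--max-requests", "10",
    ]


argv = build_benchmark_argv("http://localhost:8080", "my-token", verify=False)
# shlex.join produces a shell-safe string that can be pasted into a terminal.
print(shlex.join(argv))
```

Passing the list to `subprocess.run` instead of a shell string avoids a second layer of quoting around the JSON value.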
diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md
@@ -1 +1,58 @@
-# Coming Soon
+# Configuration
+
+The `guidellm` application can be configured using command-line arguments, environment variables, or a `.env` file. This page details the file-based and environment variable configuration options.
+
+## Configuration Methods
+
+Settings are loaded with the following priority (highest priority first):
+1.  Command-line arguments.
+2.  Environment variables.
+3.  Values in a `.env` file in the directory where the command is run.
+4.  Default values.
+
+## Environment Variable Format
+
+All settings can be configured using environment variables. The variables must be prefixed with `GUIDELLM__`, and nested settings are separated by a double underscore `__`.
+
+For example, to set the `api_key` for the `openai` backend, you would use the following environment variable:
+```bash
+export GUIDELLM__OPENAI__API_KEY="your-api-key"
+```
+
+### Target and Backend Configuration
+
+You can configure the connection to the target system using environment variables. This is an alternative to using the `--target` and `--backend-args` command-line options.
+
+| Environment Variable | Description | Example |
+| --- | --- | --- |
+| `GUIDELLM__OPENAI__BASE_URL` | The endpoint of the target system. Equivalent to the `--target` CLI option. | `export GUIDELLM__OPENAI__BASE_URL="http://localhost:8080"` |
+| `GUIDELLM__OPENAI__API_KEY` | The API key to use for bearer token authentication. | `export GUIDELLM__OPENAI__API_KEY="your-secret-api-key"` |
+| `GUIDELLM__OPENAI__BEARER_TOKEN` | The full bearer token to use for authentication. | `export GUIDELLM__OPENAI__BEARER_TOKEN="Bearer your-secret-token"` |
+| `GUIDELLM__OPENAI__HEADERS` | A JSON string representing a dictionary of headers to send to the target. These headers will override any default headers. | `export GUIDELLM__OPENAI__HEADERS='{"Authorization": "Bearer my-token"}'` |
+| `GUIDELLM__OPENAI__ORGANIZATION` | The OpenAI organization to use for requests. | `export GUIDELLM__OPENAI__ORGANIZATION="org-12345"` |
+| `GUIDELLM__OPENAI__PROJECT` | The OpenAI project to use for requests. | `export GUIDELLM__OPENAI__PROJECT="proj-67890"` |
+| `GUIDELLM__OPENAI__VERIFY` | Set to `false` or `0` to disable certificate verification. | `export GUIDELLM__OPENAI__VERIFY=false` |
+| `GUIDELLM__OPENAI__MAX_OUTPUT_TOKENS` | The default maximum number of tokens to request for completions. | `export GUIDELLM__OPENAI__MAX_OUTPUT_TOKENS=2048` |
+
+### General HTTP Settings
+
+These settings control the behavior of the underlying HTTP client.
+
+| Environment Variable | Description |
+| --- | --- |
+| `GUIDELLM__REQUEST_TIMEOUT` | The timeout in seconds for HTTP requests. Defaults to 300. |
+| `GUIDELLM__REQUEST_HTTP2` | Set to `true` or `1` to enable HTTP/2 support. Defaults to true. |
+| `GUIDELLM__REQUEST_FOLLOW_REDIRECTS` | Set to `true` or `1` to allow the client to follow redirects. Defaults to true. |
+
+
+### Using a `.env` file
+
+You can also place these variables in a `.env` file in the directory where you run the command:
+
+```dotenv
+# .env file
+GUIDELLM__OPENAI__BASE_URL="http://localhost:8080"
+GUIDELLM__OPENAI__API_KEY="your-api-key"
+GUIDELLM__OPENAI__HEADERS='{"Authorization": "Bearer my-token"}'
+GUIDELLM__OPENAI__VERIFY=false
+```
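The naming rule from the configuration guide (a `GUIDELLM__` prefix, with `__` separating nested settings) can be illustrated with a small sketch. `env_to_nested` is a hypothetical function written for this illustration and is not how the project loads its settings; it only shows how variable names map onto nested setting paths. Values stay as strings here, whereas the real settings class also converts types (the unit test in this change shows `"false"` becoming `False`).

```python
def env_to_nested(environ, prefix="GUIDELLM__", delimiter="__"):
    """Hypothetical illustration: map prefixed variable names to a nested dict."""
    nested = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables are ignored
        # GUIDELLM__OPENAI__BASE_URL -> ["openai", "base_url"]
        path = name[len(prefix):].lower().split(delimiter)
        node = nested
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return nested


example_env = {
    "GUIDELLM__OPENAI__BASE_URL": "http://localhost:8080",
    "GUIDELLM__OPENAI__VERIFY": "false",
    "GUIDELLM__REQUEST_TIMEOUT": "300",
    "PATH": "/usr/bin",
}
nested_settings = env_to_nested(example_env)
print(nested_settings)
```

Note that a single underscore, as in `REQUEST_TIMEOUT`, is part of the setting name; only the double underscore introduces a nesting level.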
diff --git a/docs/guides/data_formats.md b/docs/guides/data_formats.md
@@ -0,0 +1,62 @@
+# Data Formats
+
+The `--data` argument for the `guidellm benchmark run` command accepts several different formats for specifying the data to be used for benchmarking.
+
+## Local Data Files
+
+You can provide a path to a local data file in one of the following formats:
+
+- **CSV (.csv)**: A comma-separated values file. The loader will attempt to find a column with a common name for the prompt (e.g., `prompt`, `text`, `instruction`).
+- **JSON (.json)**: A JSON file. The structure should be a list of objects, where each object represents a row of data.
+- **JSON Lines (.jsonl)**: A file where each line is a valid JSON object.
+- **Text (.txt)**: A plain text file, where each line is treated as a separate prompt.
+
+If the prompt column cannot be automatically determined, you can specify it using the `--data-args` option:
+```bash
+--data-args '{"text_column": "my_custom_prompt_column"}'
+```
+
+## Synthetic Data
+
+You can generate synthetic data on the fly by providing a configuration string or file.
+
+### Configuration Options
+
+| Parameter | Description |
+| --- | --- |
+| `prompt_tokens` | **Required.** The average number of tokens for the generated prompts. |
+| `output_tokens` | **Required.** The average number of tokens for the generated outputs. |
+| `samples` | The total number of samples to generate. Defaults to 1000. |
+| `source` | The source text to use for generating the synthetic data. Defaults to a built-in copy of "Pride and Prejudice". |
+| `prompt_tokens_stdev` | The standard deviation of the tokens generated for prompts. |
+| `prompt_tokens_min` | The minimum number of text tokens generated for prompts. |
+| `prompt_tokens_max` | The maximum number of text tokens generated for prompts. |
+| `output_tokens_stdev` | The standard deviation of the tokens generated for outputs. |
+| `output_tokens_min` | The minimum number of text tokens generated for outputs. |
+| `output_tokens_max` | The maximum number of text tokens generated for outputs. |
+
+### Configuration Formats
+
+You can provide the synthetic data configuration in one of three ways:
+
+1.  **Key-Value String:**
+    ```bash
+    --data "prompt_tokens=256,output_tokens=128,samples=500"
+    ```
+
+2.  **JSON String:**
+    ```bash
+    --data '{"prompt_tokens": 256, "output_tokens": 128, "samples": 500}'
+    ```
+
+3.  **YAML or Config File:**
+    Create a file (e.g., `my_config.yaml`):
+    ```yaml
+    prompt_tokens: 256
+    output_tokens: 128
+    samples: 500
+    ```
+    And use it with the `--data` argument:
+    ```bash
+    --data my_config.yaml
+    ```
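The key-value and JSON forms of the synthetic data configuration describe the same settings. The sketch below shows one way such a key-value string could be turned into a dictionary; `parse_key_value_config` is a hypothetical function and is not claimed to match the loader in `guidellm`, which may accept more or fewer variations.

```python
import json


def parse_key_value_config(text):
    """Hypothetical parser for strings such as 'prompt_tokens=256,samples=500'."""
    config = {}
    for item in text.split(","):
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        raw = raw.strip()
        try:
            value = int(raw)  # token counts and sample counts are integers
        except ValueError:
            value = raw  # keep non-numeric values, such as a source path, as text
        config[key.strip()] = value
    return config


from_key_value = parse_key_value_config(
    "prompt_tokens=256,output_tokens=128,samples=500"
)
from_json = json.loads('{"prompt_tokens": 256, "output_tokens": 128, "samples": 500}')
print(from_key_value == from_json)
```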
diff --git a/src/guidellm/backend/openai.py b/src/guidellm/backend/openai.py
@@ -94,6 +94,7 @@ def __init__(
         extra_query: Optional[dict] = None,
         extra_body: Optional[dict] = None,
         remove_from_body: Optional[list[str]] = None,
+        **kwargs,
-        **kwargs,
+        headers: Optional[dict] = None,
+        verify: Optional[bool] = None,
     ):
         super().__init__(type_="openai_http")
         self._target = target or settings.openai.base_url
@@ -110,20 +111,36 @@ def __init__(
 
         self._model = model
 
+        # Start with default headers based on other params
+        default_headers: dict[str, str] = {}
         api_key = api_key or settings.openai.api_key
-        self.authorization = (
-            f"Bearer {api_key}" if api_key else settings.openai.bearer_token
-        )
+        bearer_token = settings.openai.bearer_token
+        if api_key:
+            default_headers["Authorization"] = f"Bearer {api_key}"
+        elif bearer_token:
+            default_headers["Authorization"] = bearer_token
 
         self.organization = organization or settings.openai.organization
+        if self.organization:
+            default_headers["OpenAI-Organization"] = self.organization
+
         self.project = project or settings.openai.project
+        if self.project:
+            default_headers["OpenAI-Project"] = self.project
+
+        # User-provided headers from kwargs or settings override defaults
+        user_headers = kwargs.pop("headers", settings.openai.headers or {})
+        default_headers.update(user_headers)
+        self.headers = default_headers
-        user_headers = kwargs.pop("headers", settings.openai.headers or {})
-        default_headers.update(user_headers)
-        self.headers = default_headers
+        default_headers.update(settings.openai.headers or {})
+        # `headers` defaults to None, so fall back to an empty dict
+        default_headers.update(headers or {})
+        # A header explicitly set to None is dropped from the final mapping
+        self.headers = {k: v for k, v in default_headers.items() if v is not None}
+
         self.timeout = timeout if timeout is not None else settings.request_timeout
         self.http2 = http2 if http2 is not None else settings.request_http2
         self.follow_redirects = (
             follow_redirects
             if follow_redirects is not None
             else settings.request_follow_redirects
         )
+        self.verify = kwargs.pop("verify", settings.openai.verify)
-        self.verify = kwargs.pop("verify", settings.openai.verify)
+        # Use an explicit None check so that verify=False is honored
+        self.verify = verify if verify is not None else settings.openai.verify
         self.max_output_tokens = (
             max_output_tokens
             if max_output_tokens is not None
@@ -160,9 +177,7 @@ def info(self) -> dict[str, Any]:
             "timeout": self.timeout,
             "http2": self.http2,
             "follow_redirects": self.follow_redirects,
-            "authorization": bool(self.authorization),
-            "organization": self.organization,
-            "project": self.project,
+            "headers": self.headers,
             "text_completions_path": TEXT_COMPLETIONS_PATH,
             "chat_completions_path": CHAT_COMPLETIONS_PATH,
         }
@@ -383,6 +398,7 @@ def _get_async_client(self) -> httpx.AsyncClient:
                 http2=self.http2,
                 timeout=self.timeout,
                 follow_redirects=self.follow_redirects,
+                verify=self.verify,
             )
             self._async_client = client
         else:
@@ -394,16 +410,7 @@ def _headers(self) -> dict[str, str]:
         headers = {
             "Content-Type": "application/json",
         }
-
-        if self.authorization:
-            headers["Authorization"] = self.authorization
-
-        if self.organization:
-            headers["OpenAI-Organization"] = self.organization
-
-        if self.project:
-            headers["OpenAI-Project"] = self.project
-
+        headers.update(self.headers)
         return headers
 
     def _params(self, endpoint_type: EndpointType) -> dict[str, str]:
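
The header precedence introduced in `__init__` (defaults derived from `api_key`, `bearer_token`, `organization`, and `project`, then headers from settings, then headers passed explicitly) can be checked in isolation. The following is a standalone sketch under that reading of the diff; `merge_headers` is a hypothetical free function, not a method of `OpenAIHTTPBackend`.

```python
def merge_headers(
    api_key=None,
    bearer_token=None,
    organization=None,
    project=None,
    settings_headers=None,
    headers=None,
):
    """Hypothetical standalone version of the header precedence in __init__."""
    merged = {}
    # Lowest priority: defaults derived from the individual parameters.
    if api_key:
        merged["Authorization"] = f"Bearer {api_key}"
    elif bearer_token:
        merged["Authorization"] = bearer_token
    if organization:
        merged["OpenAI-Organization"] = organization
    if project:
        merged["OpenAI-Project"] = project
    # Next: headers from settings (for example GUIDELLM__OPENAI__HEADERS).
    merged.update(settings_headers or {})
    # Highest priority: headers passed explicitly (for example via --backend-args).
    merged.update(headers or {})
    # A value of None removes the header instead of sending it.
    return {k: v for k, v in merged.items() if v is not None}


result = merge_headers(
    api_key="default-api-key",
    settings_headers={"Authorization": "Bearer settings-token"},
    headers={"Authorization": "Bearer kwargs-token", "Custom-Header": "Custom-Value"},
)
print(result)
```

Guarding both `update` calls with `or {}` matters because `dict.update(None)` raises a `TypeError`, which would otherwise break construction whenever no custom headers are supplied.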

diff --git a/src/guidellm/config.py b/src/guidellm/config.py
@@ -81,10 +81,12 @@ class OpenAISettings(BaseModel):
 
     api_key: Optional[str] = None
     bearer_token: Optional[str] = None
+    headers: Optional[dict[str, str]] = None
     organization: Optional[str] = None
     project: Optional[str] = None
     base_url: str = "http://localhost:8000"
     max_output_tokens: int = 16384
+    verify: bool = True
 
 
 class Settings(BaseSettings):
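
Because `verify` defaults to `True` in `OpenAISettings`, any code that falls back to this setting has to distinguish an explicit `False` from "not provided". The sketch below contrasts two fallback styles; both function names are hypothetical and exist only for this comparison.

```python
def resolve_with_or(verify, default=True):
    """Fallback using `or`: treats False the same as None."""
    return verify or default


def resolve_explicit(verify, default=True):
    """Fallback using an explicit None check: False is preserved."""
    return verify if verify is not None else default


# An explicit request to disable certificate verification:
print(resolve_with_or(False))   # the request is silently discarded
print(resolve_explicit(False))  # the request is honored
print(resolve_explicit(None))   # nothing passed, so the default applies
```

The same pattern is already used in `__init__` for `timeout`, `http2`, and `follow_redirects`, each of which checks `is not None` before falling back to settings.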

diff --git a/tests/unit/backend/test_openai_backend.py b/tests/unit/backend/test_openai_backend.py
@@ -11,7 +11,7 @@ def test_openai_http_backend_default_initialization():
     backend = OpenAIHTTPBackend()
     assert backend.target == settings.openai.base_url
     assert backend.model is None
-    assert backend.authorization == settings.openai.bearer_token
+    assert backend.headers.get("Authorization") == settings.openai.bearer_token
     assert backend.organization == settings.openai.organization
     assert backend.project == settings.openai.project
     assert backend.timeout == settings.request_timeout
@@ -37,7 +37,7 @@ def test_openai_http_backend_intialization():
     )
     assert backend.target == "http://test-target"
     assert backend.model == "test-model"
-    assert backend.authorization == "Bearer test-key"
+    assert backend.headers.get("Authorization") == "Bearer test-key"
     assert backend.organization == "test-org"
     assert backend.project == "test-proj"
     assert backend.timeout == 10

diff --git a/tests/unit/backend/test_openai_backend_custom_configs.py b/tests/unit/backend/test_openai_backend_custom_configs.py
@@ -0,0 +1,64 @@
+import pytest
+
+from guidellm.backend import OpenAIHTTPBackend
+from guidellm.config import settings
+
+
+@pytest.mark.smoke
+def test_openai_http_backend_default_initialization():
+    backend = OpenAIHTTPBackend()
+    assert backend.verify is True
+
+
+@pytest.mark.smoke
+def test_openai_http_backend_custom_ssl_verification():
+    backend = OpenAIHTTPBackend(verify=False)
+    assert backend.verify is False
+
+
+@pytest.mark.smoke
+def test_openai_http_backend_custom_headers_override():
+    # Set a default api_key, which would normally create an Authorization header
+    settings.openai.api_key = "default-api-key"
+
+    # Set custom headers that override the default Authorization and add a new header
+    openshift_token = "Bearer sha256~xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
+    override_headers = {
+        "Authorization": openshift_token,
+        "Custom-Header": "Custom-Value",
+    }
+
+    # Initialize the backend
+    backend = OpenAIHTTPBackend(headers=override_headers)
+
+    # Check that the override headers are used
+    assert backend.headers["Authorization"] == openshift_token
+    assert backend.headers["Custom-Header"] == "Custom-Value"
+    assert len(backend.headers) == 2
+
+    # Reset the settings
+    settings.openai.api_key = None
+    settings.openai.headers = None
+
+
+@pytest.mark.smoke
+def test_openai_http_backend_kwarg_headers_override_settings():
+    # Set headers via settings (simulating environment variables)
+    settings.openai.headers = {"Authorization": "Bearer settings-token"}
+
+    # Set different headers via kwargs (simulating --backend-args)
+    override_headers = {
+        "Authorization": "Bearer kwargs-token",
+        "Custom-Header": "Custom-Value",
+    }
+
+    # Initialize the backend with kwargs
+    backend = OpenAIHTTPBackend(headers=override_headers)
+
+    # Check that the kwargs headers took precedence
+    assert backend.headers["Authorization"] == "Bearer kwargs-token"
+    assert backend.headers["Custom-Header"] == "Custom-Value"
+    assert len(backend.headers) == 2
+
+    # Reset the settings
+    settings.openai.headers = None
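
The tests above mutate the shared `settings` object and reset it on the last lines, so a failing assertion would skip the reset and leak state into later tests. One possible way to make the cleanup unconditional is sketched below with a hypothetical `override_attrs` context manager, demonstrated on a stand-in object rather than the real `settings`. pytest's built-in `monkeypatch` fixture offers the same guarantee through `monkeypatch.setattr`.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_attrs(obj, **overrides):
    """Hypothetical helper: set attributes temporarily and always restore them."""
    saved = {name: getattr(obj, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(obj, name, value)
        yield obj
    finally:
        # Runs even when the body raises, so state never leaks between tests.
        for name, value in saved.items():
            setattr(obj, name, value)


# Stand-in for settings.openai; the real object is not imported here.
fake_openai_settings = SimpleNamespace(api_key=None, headers=None)

try:
    with override_attrs(fake_openai_settings, api_key="default-api-key"):
        inside_value = fake_openai_settings.api_key
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

print(inside_value, fake_openai_settings.api_key)
```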
diff --git a/tests/unit/test_config.py b/tests/unit/test_config.py
@@ -142,9 +142,13 @@ def test_settings_with_env_variables(mocker):
             "GUIDELLM__DATASET__PREFERRED_DATA_COLUMNS": '["custom_column"]',
             "GUIDELLM__OPENAI__API_KEY": "env_api_key",
             "GUIDELLM__TABLE_BORDER_CHAR": "*",
+            "GUIDELLM__OPENAI__HEADERS": '{"Authorization": "Bearer env-token"}',
+            "GUIDELLM__OPENAI__VERIFY": "false",
         },
     )
     settings = Settings()
     assert settings.dataset.preferred_data_columns == ["custom_column"]
     assert settings.openai.api_key == "env_api_key"
     assert settings.table_border_char == "*"
+    assert settings.openai.headers == {"Authorization": "Bearer env-token"}
+    assert settings.openai.verify is False
diff --git a/tests/unit/test_main.py b/tests/unit/test_main.py
@@ -0,0 +1,32 @@
+import pytest
+from click.testing import CliRunner
+
+from guidellm.__main__ import cli
+
+
+@pytest.mark.smoke
+def test_benchmark_run_with_backend_args():
+    runner = CliRunner()
+    result = runner.invoke(
+        cli,
+        [
+            "benchmark",
+            "run",
+            "--backend-args",
+            '{"headers": {"Authorization": "Bearer my-token"}, "verify": false}',
+            "--target",
+            "http://localhost:8000",
+            "--data",
+            "prompt_tokens=1,output_tokens=1",
+            "--rate-type",
+            "constant",
+            "--rate",
+            "1",
+            "--max-requests",
+            "1",
+        ],
+    )
+    # This will fail because it can't connect to the server,
+    # but it will pass the header parsing, which is what we want to test.
+    assert result.exit_code != 0
+    assert "Invalid header format" not in result.output