Commit dab80ce

[Testing / CI/CD] Ability to automate scale testing with a mock server and test different datasets, loadgen, etc., and run it as part of CI/CD (kubernetes-sigs#274)
## Summary

This introduces e2e testing using a mock server to enable automated scale testing as part of our CI/CD pipeline. Key changes include:

* Mock client: a mock client simulates the behavior of a real API client, with configurable latency to model different response times.
* E2e test case: a test case, `test_simple_mock_client_benchmark`, runs a benchmark with the mock client, generates reports, and asserts key metrics such as the achieved request rate and success count.
* E2e test utility: a `run_benchmark_minimal` function simplifies running benchmarks in tests; it handles configuration, execution, and result parsing.
* CI/CD integration: a new `pdm run test:e2e` command makes it easy to run the end-to-end tests in the CI/CD workflow or a dev environment.

Note: a few follow-up commits add the test to the CI workflow.
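For orientation, here is a condensed, illustrative sketch of how the pieces fit together. It is not a file in this commit; the full test and helper appear in the diff below.

```python
from utils.benchmark import run_benchmark_minimal

# Run inference-perf against the mock-server config and collect the JSON reports.
result = run_benchmark_minimal("e2e/configs/e2e_simple_mock_client.yaml", timeout_sec=None)

assert result.success   # process exited with code 0 and did not time out
assert result.reports   # report JSON files were found and parsed
assert result.reports["summary_lifecycle_metrics.json"]["successes"]["count"] == 10
```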
1 parent 16c972f commit dab80ce

14 files changed (+281, -10 lines)
Lines changed: 35 additions & 0 deletions
name: E2E Test on change

on:
  push:
    branches:
      - main
      - 'feature/**'
  pull_request:
    branches:
      - main
      - 'feature/**'

jobs:
  e2e-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.13']
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Set up PDM
        uses: pdm-project/setup-pdm@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pdm sync -d
      - name: Run e2e tests
        run: |
          pdm run test:e2e

.github/workflows/format.yml

Lines changed: 9 additions & 1 deletion
@@ -6,19 +6,27 @@ on:
       - main
       - 'feature/**'
   pull_request:
+    branches:
+      - main
+      - 'feature/**'
 
 jobs:
   format-check:
     runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.13']
     steps:
       - name: Checkout Code
         uses: actions/checkout@v4
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: '3.13'
+          python-version: ${{ matrix.python-version }}
       - name: Set up PDM
         uses: pdm-project/setup-pdm@v4
+        with:
+          python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: |
           pdm sync -d

.github/workflows/publish-on-release.yml

Lines changed: 4 additions & 1 deletion
@@ -60,13 +60,16 @@ jobs:
   python-package:
     needs: build-and-publish # Run after the release is created
     runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.13']
     steps:
       - uses: actions/checkout@v4
 
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.12'
+          python-version: ${{ matrix.python-version }}
 
       - name: Install build dependencies
         run: |

.github/workflows/test-release.yml

Lines changed: 4 additions & 1 deletion
@@ -6,13 +6,16 @@ on:
 jobs:
   test-python-package:
     runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.13']
     steps:
       - uses: actions/checkout@v4
 
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.12'
+          python-version: ${{ matrix.python-version }}
 
       - name: Install build dependencies
         run: |

.github/workflows/unit_test.yml

Lines changed: 6 additions & 1 deletion
@@ -10,15 +10,20 @@ on:
 jobs:
   format-check:
     runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.13']
     steps:
      - name: Checkout Code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
-         python-version: '3.13'
+         python-version: ${{ matrix.python-version }}
      - name: Set up PDM
        uses: pdm-project/setup-pdm@v4
+       with:
+         python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pdm sync -d
e2e/configs/e2e_simple_mock_client.yaml

Lines changed: 18 additions & 0 deletions

data:
  type: mock
load:
  type: constant
  stages:
    - rate: 1
      duration: 10
  num_workers: 2
api:
  type: chat
server:
  type: mock
  base_url: http://0.0.0.0:8000
report:
  request_lifecycle:
    summary: true
    per_stage: true
    per_request: true
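The `run_benchmark_minimal` helper added in e2e/utils/benchmark.py (further down in this diff) accepts either a path to a config like this one or an already-loaded dict, and it always rewrites the `storage` section to point at a temporary working directory. A small sketch of the dict form, with a hypothetical higher rate chosen only for illustration:

```python
import yaml

from utils.benchmark import run_benchmark_minimal

# Load the committed config and tweak it in memory; the rate/duration values here
# are hypothetical, chosen only to illustrate the dict-based form of the API.
with open("e2e/configs/e2e_simple_mock_client.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

cfg["load"]["stages"] = [{"rate": 5, "duration": 10}]

result = run_benchmark_minimal(cfg, timeout_sec=120)
print(result.success, result.work_dir)
```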

e2e/conftest.py

Whitespace-only changes.

e2e/tests/test_mock_client.py

Lines changed: 22 additions & 0 deletions
import pytest

from utils.benchmark import run_benchmark_minimal


def test_simple_mock_client_benchmark():
    result = run_benchmark_minimal("e2e/configs/e2e_simple_mock_client.yaml", timeout_sec=None)
    assert result.success, "Benchmark failed"
    assert result.reports, "No reports generated from benchmark"
    assert result.reports["per_request_lifecycle_metrics.json"], "Missing requests report"
    assert result.reports["stage_0_lifecycle_metrics.json"], "Missing stage report"
    assert result.reports["summary_lifecycle_metrics.json"], "Missing summary report"

    requests_report = result.reports["per_request_lifecycle_metrics.json"]
    stage_report = result.reports["stage_0_lifecycle_metrics.json"]
    summary_report = result.reports["summary_lifecycle_metrics.json"]

    assert len(requests_report) == 10, "the number of requests should be 10"
    assert stage_report["load_summary"]["achieved_rate"] > 1 or stage_report["load_summary"]["achieved_rate"] == pytest.approx(
        1, abs=0.2
    ), "the achieved rate should be close to 1.0"
    assert summary_report["successes"]["count"] == 10
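In CI this test runs via `pdm run test:e2e` (see the new workflow above). Assuming that script simply invokes pytest over the e2e suite (the pyproject.toml change is not shown in this excerpt), an equivalent direct invocation might look like:

```python
# Hypothetical direct invocation; assumes `pdm run test:e2e` maps to a pytest run
# over the e2e tests (the actual script definition is not shown in this excerpt).
import sys

import pytest

sys.exit(pytest.main(["e2e/tests/test_mock_client.py", "-v"]))
```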

e2e/utils/benchmark.py

Lines changed: 115 additions & 0 deletions
import json
import os
import shlex
import subprocess
import tempfile
import yaml
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional, List, Union

logger = logging.getLogger(__name__)


@dataclass
class BenchmarkResult:
    """Result of a minimal benchmark run."""

    success: bool  # True if process exit code == 0 and not timed out
    timed_out: bool  # True if we hit the timeout and killed the process
    returncode: int  # Raw process return code (or -9/-15 on kill)
    stdout: str  # Combined stdout/stderr text
    work_dir: Path  # Working directory used for the run
    reports: Optional[Dict[str, Any]]  # Parsed JSON reports if present


def _process_yaml_config(config: Union[str, Path, Dict[str, Any]], out_dir: Path) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    cfg_path = out_dir / "config_input.yaml"

    if isinstance(config, (str, Path)):
        src = Path(config)
        if not src.exists():
            raise FileNotFoundError(f"Config file not found: {src}")
        config = yaml.safe_load(src.read_text(encoding="utf-8"))

    # Overwrite the output path to point at the temporary folder
    config["storage"] = {"local_storage": {"path": out_dir.as_posix()}}

    cfg_path.write_text(
        yaml.safe_dump(config, sort_keys=False, default_flow_style=False),
        encoding="utf-8",
    )
    return cfg_path


def _find_report_files(path: Path) -> Optional[List[Path]]:
    """Return the JSON report files under path (if any)."""
    candidates = list(path.glob("**/*.json"))
    if not candidates:
        return None
    return candidates


def run_benchmark_minimal(
    config: Union[str, Path, Dict[str, Any]],
    *,
    work_dir: Optional[Union[str, Path]] = None,
    executable: str = "inference-perf",
    timeout_sec: Optional[int] = 300,
    extra_env: Optional[Dict[str, str]] = None,
) -> BenchmarkResult:
    """
    Minimal wrapper:
      - materializes the config to YAML in work_dir,
      - runs `inference-perf --config_file <config.yml>`,
      - returns success/failure, stdout text, and parsed JSON reports (if present).
    On timeout:
      - kills the spawned process,
      - marks `timed_out=True` and returns the stdout collected up to the kill.
    """
    wd = Path(work_dir) if work_dir else Path(tempfile.mkdtemp(prefix="inference-perf-e2e-"))
    cfg_path = _process_yaml_config(config, wd)

    env = os.environ.copy()
    if extra_env:
        env.update({k: str(v) for k, v in extra_env.items()})

    cmd = f"{shlex.quote(executable)} --config_file {shlex.quote(str(cfg_path))} --log-level DEBUG"

    timed_out = False
    try:
        proc = subprocess.run(
            cmd,
            cwd=str(wd),
            env=env,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_sec,
        )
        stdout = proc.stdout
        return_code = proc.returncode
    except subprocess.TimeoutExpired as e:
        timed_out = True
        stdout = e.stdout
        return_code = -9

    success = (return_code == 0) and (not timed_out)

    logger.info("Benchmark output:\n%s", stdout)

    # Attempt to read the JSON reports (optional)
    report_paths = _find_report_files(wd)
    reports = {report.name: json.loads(report.read_text(encoding="utf-8")) for report in report_paths} if report_paths else None

    return BenchmarkResult(
        success=success,
        timed_out=timed_out,
        returncode=return_code,
        stdout=stdout or "",
        work_dir=wd,
        reports=reports,
    )
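A brief usage sketch of this helper, showing the keyword arguments and how a timed-out run surfaces. The work_dir path and the environment variable are illustrative values, not part of this commit:

```python
from pathlib import Path

from utils.benchmark import run_benchmark_minimal

result = run_benchmark_minimal(
    "e2e/configs/e2e_simple_mock_client.yaml",
    work_dir=Path("/tmp/inference-perf-demo"),  # illustrative; defaults to a fresh mkdtemp() dir
    timeout_sec=300,                            # kill the run if it exceeds 5 minutes
    extra_env={"LOG_LEVEL": "DEBUG"},           # illustrative variable, merged into os.environ
)

if result.timed_out:
    print(f"benchmark killed after timeout; partial output:\n{result.stdout}")
elif not result.success:
    print(f"inference-perf exited with {result.returncode}")
else:
    # reports maps file names (e.g. summary_lifecycle_metrics.json) to parsed JSON.
    print(sorted(result.reports or {}))
```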

inference_perf/client/modelserver/mock_client.py

Lines changed: 51 additions & 4 deletions
@@ -16,7 +16,7 @@
 from typing import List, Optional
 from inference_perf.config import APIConfig, APIType
 from inference_perf.apis import InferenceAPIData, InferenceInfo, RequestLifecycleMetric, ErrorResponseInfo
-from .base import ModelServerClient
+from .base import ModelServerClient, ModelServerPrometheusMetric, PrometheusMetricMetadata
 import asyncio
 import time
 import logging
@@ -29,12 +29,13 @@ def __init__(
         self,
         metrics_collector: RequestDataCollector,
         api_config: APIConfig,
-        timeout: Optional[int] = None,
-        mock_latency: float = 3,
+        timeout: Optional[float] = None,
+        mock_latency: float = 1,
     ) -> None:
         super().__init__(api_config, timeout)
         self.metrics_collector = metrics_collector
         self.mock_latency = mock_latency
+        self.tokenizer = None
 
     async def process_request(self, data: InferenceAPIData, stage_id: int, scheduled_time: float) -> None:
         start = time.perf_counter()
@@ -44,7 +45,8 @@ async def process_request(self, data: InferenceAPIData, stage_id: int, scheduled
             await asyncio.sleep(self.timeout)
             raise asyncio.exceptions.TimeoutError()
         else:
-            await asyncio.sleep(self.mock_latency)
+            if self.mock_latency > 0:
+                await asyncio.sleep(self.mock_latency)
         self.metrics_collector.record_metric(
             RequestLifecycleMetric(
                 stage_id=stage_id,
@@ -81,3 +83,48 @@ async def process_request(self, data: InferenceAPIData, stage_id: int, scheduled
 
     def get_supported_apis(self) -> List[APIType]:
         return [APIType.Completion, APIType.Chat]
+
+    def get_prometheus_metric_metadata(self) -> PrometheusMetricMetadata:
+        mock_prometheus_metric = ModelServerPrometheusMetric(
+            name="mock_metric",
+            op="mean",
+            type="counter",
+            filters=[],
+        )
+        return PrometheusMetricMetadata(
+            # Throughput
+            prompt_tokens_per_second=mock_prometheus_metric,
+            output_tokens_per_second=mock_prometheus_metric,
+            requests_per_second=mock_prometheus_metric,
+            # Latency
+            avg_request_latency=mock_prometheus_metric,
+            median_request_latency=mock_prometheus_metric,
+            p90_request_latency=mock_prometheus_metric,
+            p99_request_latency=mock_prometheus_metric,
+            # Request
+            total_requests=mock_prometheus_metric,
+            avg_prompt_tokens=mock_prometheus_metric,
+            avg_output_tokens=mock_prometheus_metric,
+            avg_queue_length=mock_prometheus_metric,
+            # Others
+            avg_time_to_first_token=None,
+            median_time_to_first_token=None,
+            p90_time_to_first_token=None,
+            p99_time_to_first_token=None,
+            avg_time_per_output_token=None,
+            median_time_per_output_token=None,
+            p90_time_per_output_token=None,
+            p99_time_per_output_token=None,
+            avg_inter_token_latency=None,
+            median_inter_token_latency=None,
+            p90_inter_token_latency=None,
+            p99_inter_token_latency=None,
+            avg_kv_cache_usage=None,
+            median_kv_cache_usage=None,
+            p90_kv_cache_usage=None,
+            p99_kv_cache_usage=None,
+            num_preemptions_total=None,
+            num_requests_swapped=None,
+            prefix_cache_hits=None,
+            prefix_cache_queries=None,
+        )
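To make the timeout-versus-latency behavior in the diff above easier to follow, here is a simplified, self-contained restatement of that branch. The guard condition sits just above the visible hunk, so treating it as "a timeout is configured" is an assumption; the class plumbing (metrics recording, config objects) is omitted:

```python
import asyncio
from typing import Optional


async def simulate_mock_request(timeout: Optional[float], mock_latency: float) -> None:
    # Mirrors the visible branch in process_request: a configured timeout wins over
    # the mock latency and is surfaced as a TimeoutError; otherwise the client just
    # sleeps for the configured latency (skipped when mock_latency is 0).
    if timeout is not None:  # assumption about the guard above the hunk
        await asyncio.sleep(timeout)
        raise asyncio.exceptions.TimeoutError()
    if mock_latency > 0:
        await asyncio.sleep(mock_latency)


# With the new defaults (timeout=None, mock_latency=1) this sleeps about one second
# and returns; setting mock_latency=0 skips the sleep, which keeps e2e runs fast.
asyncio.run(simulate_mock_request(None, 1.0))
```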
