Per-worker concurrency is capped at 100

**Describe the bug**
Setting `GUIDELLM__MAX_WORKER_PROCESSES=1` caps concurrency at exactly 100, as measured server-side. The client reports full concurrency, so the bottleneck is likely somewhere after the request thread is launched.

**Expected behavior**
The actual maximum concurrency per worker should be much higher than 100.

**Environment**
Include all relevant environment information:
1. OS: Fedora Linux 42 (Container Image) on OCP 4.19
2. Python version: 3.13.7
3. GuideLLM version: v0.4.0

**To Reproduce**
Exact steps to reproduce the behavior:

```sh
export GUIDELLM__MAX_WORKER_PROCESSES=1
guidellm benchmark \
                --target http://localhost:8000 \
                --rate-type concurrent \
                --rate "128" \
                --max-seconds 120 \
                --data "prompt_tokens=256,output_tokens=128"
```

Observe from server-side that there are never more than 100 waiting + running requests.

**Additional context**
The per-worker cap stays the same with more workers, so the total scales linearly. E.g. `GUIDELLM__MAX_WORKER_PROCESSES=2` results in a maximum concurrency of 200. Our default is 10 workers, which puts the overall cap at 1,000; concurrency above 1,000 is pretty rare for single-node tests.
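To illustrate the symptom (not to diagnose it), here is a minimal simulation of how a fixed per-worker limit could produce these numbers. It assumes, without confirmation from GuideLLM's code, that each worker sends its requests through some bounded resource of size 100. The names `POOL_LIMIT` and `run_worker` are hypothetical, and an `asyncio.Semaphore` stands in for whatever the real bottleneck is.

```python
import asyncio

POOL_LIMIT = 100  # hypothetical per-worker cap, chosen to match the observed value


async def run_worker(requested: int, pool_limit: int = POOL_LIMIT) -> tuple[int, int]:
    """Return (launched, peak_in_flight) for one simulated worker."""
    pool = asyncio.Semaphore(pool_limit)  # stand-in for a bounded resource
    launched = 0
    in_flight = 0
    peak = 0

    async def request() -> None:
        nonlocal launched, in_flight, peak
        launched += 1  # what the client would count as concurrent
        async with pool:
            in_flight += 1  # what the server would actually see
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated server latency
            in_flight -= 1

    await asyncio.gather(*(request() for _ in range(requested)))
    return launched, peak


launched, peak = asyncio.run(run_worker(128))
print(f"launched={launched} peak_in_flight={peak}")
```

In this sketch the client launches all 128 requests, but at most 100 are in flight at any moment. That matches the client-side and server-side numbers above.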

Per-worker concurrency is capped at 100 #487

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Per-worker concurrency is capped at 100 #487

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions