@hhk7734 hhk7734 commented Nov 29, 2025

When benchmarking high-throughput inference servers with complex architectures involving multiple network hops (e.g., llm-d), we observed that inter-token latency (ITL) often converges to near-zero values at lower percentiles.

To analyze these scenarios effectively, we need the ability to define granular percentile breakpoints rather than relying on a fixed set of standard percentiles.
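
As a rough illustration of why configurable breakpoints help, the sketch below computes a caller-supplied list of percentiles over ITL samples. It is not the implementation in this PR: `percentile`, `summarize`, the sample values, and the chosen breakpoints are all hypothetical, and linear interpolation between ranks is an assumed method.

```python
import math


def percentile(samples, p):
    """Return the p-th percentile (0-100) using linear interpolation between ranks."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * p / 100
    lo = math.floor(rank)
    hi = math.ceil(rank)
    # Interpolate between the two nearest ranks.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(samples, percentiles):
    """Map each requested breakpoint to its value, keyed like 'p99.9'."""
    return {f"p{p:g}": percentile(samples, p) for p in percentiles}


# Hypothetical ITL samples in seconds: most are near zero, a few are slow.
itl = [0.0] * 90 + [0.001] * 5 + [0.02, 0.03, 0.05, 0.08, 0.2]

# Hypothetical custom breakpoints that add resolution in the tail.
custom_percentiles = [50, 90, 95, 99, 99.9]
summary = summarize(itl, custom_percentiles)

for name, value in summary.items():
    print(f"{name}: {value:.6f} s")
```

With samples like these, the median is zero and carries no signal, so the extra tail breakpoints are where the distribution becomes visible.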

The functionality was verified by building the image and running benchmarks with custom percentile configurations.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hhk7734
Once this PR has been reviewed and has the lgtm label, please assign achandrasekar for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added labels on Nov 29, 2025: cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and size/L (denotes a PR that changes 100-499 lines, ignoring generated files).