Bug: Token rate limiter uses inaccurate heuristic tokenizer for input token accounting #1161

/kind enhancement
## What happened

The token rate limiter currently uses:

```go
tokenizer: tokenizer.NewSimpleEstimateTokenizer()
```

which estimates token counts by dividing the character count by 4 (the `/4` heuristic) instead of using model-accurate tokenization.
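
For reference, a heuristic of this kind amounts to something like the sketch below. The name `estimateTokens` is hypothetical, and whether `SimpleEstimateTokenizer` counts bytes or runes, and how it rounds, are assumptions here rather than details taken from the codebase.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens is a hypothetical stand-in for the "/4" heuristic.
// Assumption: it counts runes and rounds up, so any non-empty input
// is charged at least one token.
func estimateTokens(text string) int {
	n := utf8.RuneCountInString(text)
	return (n + 3) / 4
}

func main() {
	prompt := "Summarize the following log output."
	fmt.Println(estimateTokens(prompt))
}
```

The cost is a single pass over the string, which is why it adds no admission latency, but it returns the same count regardless of the target model's vocabulary.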

This causes inaccurate input token rate limiting:

* code-heavy prompts are significantly over-counted
* some model families may be under-counted
* users can hit limits earlier or later than expected
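
To make the effect concrete: if the estimator reports `r` times the true token count, a limit of `L` input tokens admits only about `L / r` real tokens. The sketch below just does that arithmetic; the 10,000-token budget and the ratios are illustrative values, not measurements.

```go
package main

import "fmt"

// admittedActualTokens returns roughly how many real input tokens fit under
// limit when the estimator reports ratio times the true count.
// ratio > 1 means over-counting (throttled early); ratio < 1 means
// under-counting (throttled late). ratio must be positive.
func admittedActualTokens(limit, ratio float64) float64 {
	return limit / ratio
}

func main() {
	const limit = 10000.0 // illustrative input-token budget per window
	for _, ratio := range []float64{0.8, 1.0, 1.5} {
		fmt.Printf("estimate/actual = %.1f -> about %.0f real tokens admitted\n",
			ratio, admittedActualTokens(limit, ratio))
	}
}
```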

The issue only affects input token accounting. Output token accounting is already accurate because it uses actual `completion_tokens` returned by the model response.
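
Output accounting can rely on the model's own numbers because OpenAI-compatible servers such as vLLM return a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. A minimal sketch of reading it follows; the type and function names are hypothetical, and the gateway's actual response handling may differ. The same object also reports `prompt_tokens`, but only after inference has run, which is presumably why the limiter has to estimate input tokens up front for admission.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage mirrors the "usage" object of an OpenAI-compatible completion response.
type usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// completionResponse keeps only the field needed for token accounting.
type completionResponse struct {
	Usage *usage `json:"usage"`
}

// completionTokens extracts the model-reported output token count.
func completionTokens(body []byte) (int, error) {
	var resp completionResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	if resp.Usage == nil {
		return 0, fmt.Errorf("response has no usage field")
	}
	return resp.Usage.CompletionTokens, nil
}

func main() {
	body := []byte(`{"id":"cmpl-1","usage":{"prompt_tokens":42,"completion_tokens":17,"total_tokens":59}}`)
	n, err := completionTokens(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(n)
}
```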

A TikToken-based tokenizer already exists in the codebase, but making it the default introduces its own problems:

* TikToken is GPT/OpenAI specific
* token estimation becomes inaccurate for non-GPT models (Claude, LLaMA, Mistral, etc.)
* remote tokenizer calls can increase latency under concurrency
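
One way to limit the model-compatibility problem would be to use an exact tokenizer only for model families it is known to match and keep the heuristic as the fallback. The sketch below assumes a hypothetical `Tokenizer` interface with a `CountTokens` method (kthena's real interface may differ); `wordCountTokenizer` is a placeholder standing in for a TikToken-backed implementation, not a real tokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// Tokenizer is a hypothetical interface; the real one in kthena may differ.
type Tokenizer interface {
	CountTokens(text string) (int, error)
}

// heuristicTokenizer approximates one token per four runes, rounding up.
type heuristicTokenizer struct{}

func (heuristicTokenizer) CountTokens(text string) (int, error) {
	return (len([]rune(text)) + 3) / 4, nil
}

// wordCountTokenizer is a placeholder for an exact, model-specific tokenizer
// (e.g. a TikToken-backed one). It is NOT a real tokenizer.
type wordCountTokenizer struct{}

func (wordCountTokenizer) CountTokens(text string) (int, error) {
	return len(strings.Fields(text)), nil
}

// family maps a lowercase model-name prefix to the tokenizer that matches it.
type family struct {
	prefix    string
	tokenizer Tokenizer
}

// selectTokenizer returns the tokenizer of the first family whose prefix
// matches the model name (case-insensitive), else the fallback.
func selectTokenizer(model string, families []family, fallback Tokenizer) Tokenizer {
	m := strings.ToLower(model)
	for _, f := range families {
		if strings.HasPrefix(m, f.prefix) {
			return f.tokenizer
		}
	}
	return fallback
}

func main() {
	families := []family{{prefix: "gpt-", tokenizer: wordCountTokenizer{}}}
	for _, model := range []string{"gpt-4o", "llama-3-8b-instruct"} {
		n, _ := selectTokenizer(model, families, heuristicTokenizer{}).CountTokens("count these tokens please")
		fmt.Println(model, n)
	}
}
```

An ordered slice is used instead of a map so that overlapping prefixes resolve deterministically.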

Related discussion:
[#1100 comment](https://github.com/volcano-sh/kthena/issues/1100)

That discussion also highlights another issue:

* tokenizer requests are currently sent to randomly selected vLLM pods
* tokenization latency becomes dependent on inference pod load
* under high concurrency this can create unpredictable admission latency

So the current local estimator avoids the latency and model-compatibility problems, but its estimation error can still be large enough to undermine practical quota enforcement.
