Token Utilization and Optimization #1305

Trinanjan90 · 2026-04-07T12:57:14Z

Are there any guidelines or documents that can help estimate how many tokens might be consumed per agent?
Also, is the current version optimized with respect to token usage, or is there a plan to release a token-optimized version?

eugeneboms · 2026-07-01T18:09:25Z

Collaborator

I am not aware of any such documentation. I am thinking we'll need to build it. Here is the proposal: #2343

WilliamBerryiii · 2026-07-03T20:43:43Z

Maintainer

Thanks for raising this — token usage is something we're actively working on, and it's worth being candid about where things stand.

Where we are today

The honest answer is that the current version is not yet fully token-optimized. There are two distinct cost centers, and we're tackling them differently:

Conversation-start (input) tokens — context that loads before you even type a request.
In-workflow (output) tokens — what agents generate as they run, especially artifact-heavy planning flows.

Direction: the transition to Skills

The most impactful change underway is the shift from .instructions.md files toward Skills (SKILL.md).

The key difference is when content loads:

Instruction files with an applyTo glob are eagerly pulled into context at conversation start whenever the pattern matches. The more instruction surface a workspace accumulates, the heavier every conversation gets — whether or not that guidance is relevant to what you're doing.
Skills are lazily loaded. Only a short description sits in context; the full SKILL.md body is read on demand, and only when the task actually calls for that domain knowledge.

So moving domain content out of always-on instruction files and into on-demand skills directly shrinks the baseline conversation-start cost. This is the single biggest lever we have on input tokens.
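To make the difference concrete, here is a minimal Python sketch comparing the conversation-start cost under the two loading models. The file names, sizes, and the four-characters-per-token ratio are all assumptions chosen for illustration, not measurements from this repo or output from a real tokenizer.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # assumption: crude ratio, not a real tokenizer


def approx_tokens(chars: int) -> int:
    """Very rough token estimate from a character count."""
    return max(1, chars // CHARS_PER_TOKEN)


@dataclass
class Guidance:
    name: str
    description_chars: int  # short summary that is always in context
    body_chars: int         # full content


def eager_baseline(items: list[Guidance]) -> int:
    """Instruction-style loading: every matching body is in context up front."""
    return sum(approx_tokens(g.body_chars) for g in items)


def lazy_baseline(items: list[Guidance]) -> int:
    """Skill-style loading: only the descriptions are in context up front."""
    return sum(approx_tokens(g.description_chars) for g in items)


def lazy_total(items: list[Guidance], used: set[str]) -> int:
    """Lazy baseline plus the bodies that the task actually reads."""
    read = sum(approx_tokens(g.body_chars) for g in items if g.name in used)
    return lazy_baseline(items) + read


# Hypothetical workspace with three pieces of domain guidance.
items = [
    Guidance("python-style", 200, 8000),
    Guidance("terraform", 160, 12000),
    Guidance("docs-writing", 240, 6000),
]

print(eager_baseline(items))             # 6500
print(lazy_baseline(items))              # 150
print(lazy_total(items, {"terraform"}))  # 3150
```

In this toy setup the lazy model only pays for a full body when a task actually reads it, so the gap widens as more guidance accumulates.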

Honest assessment of the remaining work

We're partway through this transition, not finished. Today the repo has roughly 75 instruction files and 50 skills, and the two overlap. Fully realizing the savings requires:

Migrating remaining instruction content into skills where it's domain-specific rather than universally applicable, and tightening applyTo scopes on what stays so files load only when genuinely relevant.
Auditing shared/overlay instruction files that currently attach broadly, to confirm each one earns its place in every matching conversation.
Reducing redundancy in artifact generation. The RPI and planner workflows produce a lot of durable tracking artifacts (research, plans, details, changes, review logs). Some of that content is regenerated or restated across artifacts, which inflates output tokens. Consolidating overlapping sections, referencing rather than re-quoting prior artifacts, and skipping artifact scaffolding for simpler tasks are all on the table.
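For the applyTo items above, one way to approach the audit is to measure how often each instruction file attaches across a sample of workspace paths. The sketch below is only an illustration: the inventory, globs, token counts, and the 0.9 threshold are hypothetical, and Python's fnmatch is used as a stand-in whose matching rules may differ from the glob matcher the editor actually applies to applyTo.

```python
from fnmatch import fnmatchcase

# Hypothetical inventory: instruction file -> (applyTo glob, approximate tokens).
INSTRUCTIONS = {
    "shared-overlay": ("**", 1800),
    "python": ("**/*.py", 2500),
    "docs": ("docs/**", 900),
}


def attached(path: str) -> list[str]:
    """Names of instruction files whose glob matches the given path."""
    return [name for name, (glob, _) in INSTRUCTIONS.items()
            if fnmatchcase(path, glob)]


def baseline_tokens(path: str) -> int:
    """Approximate tokens loaded up front when working on this path."""
    return sum(INSTRUCTIONS[name][1] for name in attached(path))


def coverage(sample_paths: list[str]) -> dict[str, float]:
    """Share of the sample paths that each instruction file attaches to."""
    return {
        name: sum(fnmatchcase(p, glob) for p in sample_paths) / len(sample_paths)
        for name, (glob, _) in INSTRUCTIONS.items()
    }


def broad_files(sample_paths: list[str], threshold: float = 0.9) -> list[str]:
    """Instruction files that attach to nearly everything in the sample."""
    return [name for name, share in coverage(sample_paths).items()
            if share >= threshold]


sample = ["src/app.py", "docs/guide.md", "docs/build.py", "README.md"]
print(attached("docs/build.py"))         # ['shared-overlay', 'python', 'docs']
print(baseline_tokens("docs/build.py"))  # 5200
print(broad_files(sample))               # ['shared-overlay']
```

Files flagged as broad are not necessarily wrong; they are simply the first candidates to review for a tighter scope or a move into a skill.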

None of this is a single flip-the-switch fix — it's incremental, and each migration needs validation so behavior doesn't regress. But the trajectory is deliberate: less eager context, more lazy loading, and leaner artifacts.
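As an illustration of the "leaner artifacts" part, the following Python sketch finds paragraphs that are restated verbatim across artifacts, which are natural candidates for a reference instead of a re-quote. The artifact names and contents are made up, and exact-match comparison after whitespace and case normalization is an assumed simplification; real overlap is often paraphrased and would need fuzzier matching.

```python
from collections import defaultdict


def paragraphs(text: str) -> list[str]:
    """Split on blank lines, then normalize whitespace and case."""
    return [" ".join(p.split()).lower() for p in text.split("\n\n") if p.strip()]


def restated(artifacts: dict[str, str]) -> dict[str, list[str]]:
    """Map each paragraph found in more than one artifact to those artifacts."""
    seen: dict[str, list[str]] = defaultdict(list)
    for name, text in artifacts.items():
        for para in set(paragraphs(text)):  # count each artifact once
            seen[para].append(name)
    return {para: names for para, names in seen.items() if len(names) > 1}


def redundant_chars(artifacts: dict[str, str]) -> int:
    """Characters spent on repeats, counting every copy after the first."""
    return sum(len(p) * (len(names) - 1)
               for p, names in restated(artifacts).items())


# Hypothetical artifacts from a planning flow.
artifacts = {
    "research.md": "Goal: add retry logic to the uploader.\n\n"
                   "Finding: the client has no backoff.",
    "plan.md": "Goal: add retry logic to the uploader.\n\n"
               "Step 1: wrap upload() in a retry helper.",
}

for para, names in restated(artifacts).items():
    print(names, "->", para)
print(redundant_chars(artifacts))
```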

On estimating cost up front

For the specific "how many tokens will this burn before I run it?" question, @eugeneboms captured the shape of the problem well in #2343. Because many flows are interactive and open-ended, total usage is effectively unbounded and can't be predicted precisely, so any estimate will be a heuristic: a rough order-of-magnitude or ±2x warning rather than an exact figure. That effort is complementary to the optimization work above: one predicts cost, the other reduces it.
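To show what a ±2x heuristic could look like in practice, here is a small Python sketch. It is not the approach proposed in #2343; the linear model (baseline plus turns times tokens per turn), the parameter names, and the example numbers are all assumptions for illustration.

```python
def estimate_band(baseline_tokens: int, expected_turns: int,
                  tokens_per_turn: int, factor: float = 2.0) -> tuple[int, int, int]:
    """Return (low, point, high) for a rough up-front token estimate.

    The point estimate is a simple linear guess; the band divides and
    multiplies it by `factor`, so factor=2.0 gives a +/-2x range.
    """
    point = baseline_tokens + expected_turns * tokens_per_turn
    return int(point / factor), point, int(point * factor)


def should_warn(high: int, budget: int) -> bool:
    """Warn when even the pessimistic end of the band exceeds the budget."""
    return high > budget


# Hypothetical flow: 6,000 tokens of baseline context, about 10 turns,
# roughly 3,000 tokens per turn.
low, point, high = estimate_band(6000, 10, 3000)
print(low, point, high)                   # 18000 36000 72000
print(should_warn(high, budget=50_000))   # True
print(should_warn(high, budget=100_000))  # False
```

Warning on the high end of the band rather than the point estimate keeps the message honest about how uncertain the prediction is.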

Happy to go deeper on any of these threads if it's useful.
