Token Utilization and Optimization #1305
Replies: 2 comments
-
|
I am not aware of any such documentation. I am thinking we'll need to build it. Here is the proposal: #2343 |
-
|
Thanks for raising this. Token usage is something we're actively working on, and it's worth being candid about where things stand.
Where we are today
The honest answer is that the current version is not yet fully token-optimized. There are two distinct cost centers, and we're tackling them differently:
Direction: the transition to Skills
The most impactful change underway is the shift from instruction files to skills. The key difference is when content loads:
So moving domain content out of always-on instruction files and into on-demand skills directly shrinks the baseline conversation-start cost. This is the single biggest lever we have on input tokens.
Honest assessment of the remaining work
We're partway through this transition, not finished. Today the repo has roughly 75 instruction files and roughly 50 skills, and the two overlap. Fully realizing the savings requires:
None of this is a single flip-the-switch fix. It's incremental, and each migration needs validation so behavior doesn't regress. But the trajectory is deliberate: less eager context, more lazy loading, and leaner artifacts.
On estimating cost up front
For the specific "how many tokens will this burn before I run it?" question, @eugeneboms captured the shape of the problem well in #2343: because many flows are interactive and open-ended, token consumption is effectively unbounded, so precise prediction isn't feasible and any estimate will be a heuristic (target a rough order-of-magnitude or ±2x warning rather than an exact figure). That effort is complementary to the optimization work above: one predicts cost, the other reduces it. Happy to go deeper on any of these threads if it's useful. |
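To make the eager-versus-lazy distinction and the ±2x idea concrete, here is a minimal sketch. It is not how the repo measures anything: the `Artifact` type, the `always_on` flag, the four-characters-per-token rule of thumb, and the sample sizes are all assumptions made up for illustration, and it ignores any small per-skill metadata that may still be loaded up front.

```python
from dataclasses import dataclass

# Assumption: a crude rule of thumb, not a real tokenizer.
CHARS_PER_TOKEN = 4


def rough_tokens(text: str) -> int:
    """Crude token estimate from character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


@dataclass
class Artifact:
    """Hypothetical stand-in for an instruction file or a skill."""
    name: str
    body: str
    always_on: bool  # True: loaded at conversation start; False: loaded on demand


def baseline_tokens(artifacts: list[Artifact]) -> int:
    """Tokens paid at conversation start: only always-on artifacts count."""
    return sum(rough_tokens(a.body) for a in artifacts if a.always_on)


def estimate_range(point_estimate: int, factor: float = 2.0) -> tuple[int, int]:
    """Widen a point estimate into a (low, high) band, e.g. a 2x band."""
    return int(point_estimate / factor), int(point_estimate * factor)


# Two always-on files with made-up sizes.
artifacts = [
    Artifact("style-guide", "x" * 4000, always_on=True),
    Artifact("deploy-howto", "x" * 8000, always_on=True),
]
before = baseline_tokens(artifacts)

# "Migrate" the second file to an on-demand skill and re-measure.
artifacts[1].always_on = False
after = baseline_tokens(artifacts)

print(before, after, estimate_range(after))
```

The point of the band is that a warning such as "expect somewhere between low and high" stays honest for open-ended flows, whereas a single number would imply precision the estimate does not have.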
-
Are there any guidelines or documents that can help estimate how many tokens might be consumed per agent?
Also, is the current version optimized with respect to token usage, or is there a plan to release a token-optimized version?