Reduce LLM max tokens from 128K to 8K for reasoning #2902

@rysweet

Description

Change DEFAULT_LLM_MAX_TOKENS from 128000 to 8192 for reasoning calls (the structured JSON response is always under 500 tokens, so 128K is far more than needed). Add a separate TRANSCRIPT_MAX_TOKENS = 128000 constant for transcript analysis, in case that larger budget is needed later.

Files to modify:

  • src/amplihack/fleet/_constants.py
  • src/amplihack/fleet/_backends.py

Requirements:

  1. Change DEFAULT_LLM_MAX_TOKENS = 8192 in _constants.py
  2. Add TRANSCRIPT_MAX_TOKENS = 128000 in _constants.py
  3. Update _backends.py to use DEFAULT_LLM_MAX_TOKENS for reasoning calls
  4. Update _backends.py to use TRANSCRIPT_MAX_TOKENS for transcript analysis calls
  5. Add docstrings explaining the distinction
  6. All 918+ fleet tests must pass
  7. Commit changes on this branch
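A minimal sketch of what the change could look like. The constant names and values come from the issue; the `max_tokens_for` helper and the `"transcript"`/`"reasoning"` call kinds are hypothetical, shown only to illustrate how `_backends.py` might select between the two budgets:

```python
# Sketch of the proposed _constants.py values (names and values from
# the issue; the helper below is illustrative, not actual fleet code).

# Token budget for reasoning calls. The structured JSON response is
# always under 500 tokens, so 8192 leaves ample headroom without
# over-reserving context.
DEFAULT_LLM_MAX_TOKENS = 8192

# Token budget for transcript analysis, which may need to process and
# summarize long conversation transcripts.
TRANSCRIPT_MAX_TOKENS = 128000


def max_tokens_for(call_kind: str) -> int:
    """Select the max-token budget for a backend call.

    Hypothetical helper sketching how _backends.py could choose
    between the two constants per call type.
    """
    if call_kind == "transcript":
        return TRANSCRIPT_MAX_TOKENS
    return DEFAULT_LLM_MAX_TOKENS


print(max_tokens_for("reasoning"))   # 8192
print(max_tokens_for("transcript"))  # 128000
```

Keeping the two budgets as separate named constants (rather than one shared default) makes the intent of each call site explicit and lets the reasoning budget shrink without touching transcript analysis.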

Metadata

Labels: enhancement (New feature or request)
