Reduce LLM max tokens from 128K to 8K for reasoning #2902

@rysweet

Description

Change DEFAULT_LLM_MAX_TOKENS from 128000 to 8192 for reasoning calls (the structured JSON response is always under 500 tokens, so 128K is far more than needed). Add a separate TRANSCRIPT_MAX_TOKENS = 128000 constant for transcript analysis, in case that larger budget is needed later.

Files to modify:

  • src/amplihack/fleet/_constants.py
  • src/amplihack/fleet/_backends.py

Requirements:

  1. Change DEFAULT_LLM_MAX_TOKENS = 8192 in _constants.py
  2. Add TRANSCRIPT_MAX_TOKENS = 128000 in _constants.py
  3. Update _backends.py to use DEFAULT_LLM_MAX_TOKENS for reasoning calls
  4. Update _backends.py to use TRANSCRIPT_MAX_TOKENS for transcript analysis calls
  5. Add docstrings explaining the distinction
  6. All 918+ fleet tests must pass
  7. Commit changes on this branch
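A minimal sketch of what the change could look like. The constant names and values come from the issue; the `max_tokens_for` helper and the `"transcript"`/`"reasoning"` call kinds are hypothetical, shown only to illustrate how `_backends.py` might select between the two budgets:

```python
# Sketch of the proposed _constants.py values (names and values from
# the issue; the helper below is illustrative, not actual fleet code).

# Token budget for reasoning calls. The structured JSON response is
# always under 500 tokens, so 8192 leaves ample headroom without
# over-reserving context.
DEFAULT_LLM_MAX_TOKENS = 8192

# Token budget for transcript analysis, which may need to process and
# summarize long conversation transcripts.
TRANSCRIPT_MAX_TOKENS = 128000


def max_tokens_for(call_kind: str) -> int:
    """Select the max-token budget for a backend call.

    Hypothetical helper sketching how _backends.py could choose
    between the two constants per call type.
    """
    if call_kind == "transcript":
        return TRANSCRIPT_MAX_TOKENS
    return DEFAULT_LLM_MAX_TOKENS


print(max_tokens_for("reasoning"))   # 8192
print(max_tokens_for("transcript"))  # 128000
```

Keeping the two budgets as separate named constants (rather than one shared default) makes the intent of each call site explicit and lets the reasoning budget shrink without touching transcript analysis.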

Metadata

Labels: enhancement (New feature or request)
