The Nexus stage's fallback token pruning uses an English stopword list. For multilingual content (common in international agent deployments), this means:
- Non-English text gets almost zero compression from Nexus
- Mixed-language content has inconsistent compression
Could the stopword list be extended to cover the top 10 languages, or could Nexus use a language-detection heuristic from Cortex to load the right list?
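
To make the second option concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: the `STOPWORDS` table, `guess_language`, and `prune` are illustrative names, not existing Nexus or Cortex APIs, and the stopword-overlap detection is a stand-in for whatever heuristic Cortex actually exposes. The lists are also deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny stopword lists for illustration only.
# A real implementation would load fuller per-language lists.
STOPWORDS = {
    "en": {"the", "a", "an", "of", "and", "is", "to", "in", "that", "it"},
    "es": {"el", "la", "los", "las", "de", "y", "es", "en", "que", "un", "una"},
    "de": {"der", "die", "das", "und", "ist", "zu", "ein", "eine", "nicht", "von"},
}

TOKEN_RE = re.compile(r"\w+")


def guess_language(text, default="en"):
    """Pick the language whose stopword list matches the most tokens.

    Stand-in for a Cortex language-detection call (assumed, not real).
    """
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    scores = {
        lang: sum(1 for t in tokens if t in words)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default when no stopword matched at all.
    return best if scores[best] > 0 else default


def prune(text, lang=None):
    """Drop stopwords using the list for the detected (or given) language."""
    lang = lang or guess_language(text)
    words = STOPWORDS.get(lang, STOPWORDS["en"])
    kept = [
        tok for tok in text.split()
        if tok.lower().strip(".,;:!?") not in words
    ]
    return " ".join(kept)


if __name__ == "__main__":
    print(prune("the cat is in the house"))
    print(prune("el gato es de la casa"))
```

For mixed-language content, the same detection could run per sentence or per chunk rather than once per document, so each segment gets pruned with its own list.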