feat: inline language inference in outline generation #390
…guageNote types

Remove the `language` field from UserRequirements and all zh-CN/en-US hardcoded conditionals in prompt construction. Language will be inferred by the LLM during outline generation (handled in subsequent tasks).

- Remove language toggle from GenerationToolbar and homepage form
- Remove normalizeLanguage helper and language-based prompt branching
- Standardize formatImageDescription/formatImagePlaceholder to English only
- Add `languageDirective` to Stage, `languageNote` to SceneOutline
- Fix generation-preview references to the removed requirements.language

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change LLM output format from flat JSON array to wrapper object
{ languageDirective, outlines }. Add language inference instructions
to system prompt with signal priority and examples. Replace hardcoded
Course Language section in user prompt with Language Context for
inference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update generateSceneOutlinesFromRequirements to return a wrapper
object { languageDirective, outlines } instead of a flat SceneOutline[].
Parse the new LLM response format with backward compatibility for old
flat-array responses. Add pdfLanguageSample template variable for
language inference in the prompt.
Note: downstream callers (classroom-generation.ts, pipeline-runner.ts)
have expected type errors that will be fixed in subsequent tasks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
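The wrapper parse with legacy fallback can be sketched as follows. The type fields beyond `languageDirective`/`outlines`/`languageNote` and the fallback directive text are illustrative assumptions, not the actual codebase definitions:

```typescript
// Illustrative shapes — only languageDirective/outlines/languageNote appear
// in this PR; other fields and the fallback text are assumptions.
interface SceneOutline {
  title: string;
  languageNote?: string;
}

interface OutlineResult {
  languageDirective: string;
  outlines: SceneOutline[];
}

// Hypothetical fallback used when an old-format (flat array) response
// carries no directive.
const FALLBACK_DIRECTIVE = "Teach in English.";

function parseOutlineResponse(raw: string): OutlineResult {
  const parsed = JSON.parse(raw);
  if (Array.isArray(parsed)) {
    // Legacy flat SceneOutline[] response: no directive was emitted.
    return { languageDirective: FALLBACK_DIRECTIVE, outlines: parsed };
  }
  return {
    languageDirective: parsed.languageDirective ?? FALLBACK_DIRECTIVE,
    outlines: parsed.outlines ?? [],
  };
}
```

Accepting both shapes in one parser is what lets the downstream callers be migrated task by task instead of in a single breaking change.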
Update the scene-outlines-stream API to support the new wrapper
response format { languageDirective, outlines: [...] }:
- Add pdfLanguageSample template variable to the prompt
- Add extractLanguageDirective() to parse directive from partial JSON
- Update extractNewOutlines() to handle nested "outlines" array key
- Emit languageDirective SSE event as soon as it's parsed
- Include languageDirective in the done event payload
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
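A minimal sketch of what extractLanguageDirective might look like — the regex and the limited escape handling are assumptions, not the route's actual implementation:

```typescript
// Pull languageDirective out of a partial (still-streaming) JSON buffer so it
// can be emitted as an SSE event before the outlines finish. The regex only
// matches once the closing quote of the value has arrived.
function extractLanguageDirective(partialJson: string): string | null {
  const match = partialJson.match(/"languageDirective"\s*:\s*"((?:[^"\\]|\\.)*)"/);
  if (!match) return null;
  // Unescape the sequences the directive is likely to contain; a full JSON
  // string unescape would be more robust.
  return match[1]
    .replace(/\\n/g, "\n")
    .replace(/\\t/g, "\t")
    .replace(/\\"/g, '"');
}
```

Returning `null` until the value is complete means the route can poll this on every chunk and emit the SSE event exactly once, as soon as the directive closes.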
…ents

Reorders the server-side generation pipeline so outline generation (which now infers languageDirective) happens before agent profile generation. This lets agent names/personas follow the inferred language.

Pipeline order: web search → outlines → agents → scenes

Also threads languageDirective through to generateSceneContent and generateSceneActions (those functions don't accept the param yet — that's Tasks 8/9).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
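The dependency order can be sketched as sequential steps — the function names and stub bodies below are placeholders standing in for the real generators, just to show why outlines must run first:

```typescript
// Stubs standing in for the real generation steps.
const webSearch = async (req: string) => `context for ${req}`;
const generateOutlines = async (_req: string, _ctx: string) => ({
  languageDirective: "Teach in English.",
  outlines: [{ title: "Scene 1" }],
});
const generateAgentProfiles = async (
  outlines: { title: string }[],
  directive: string,
) => outlines.map((o) => ({ persona: `Tutor for ${o.title}`, directive }));
const generateScenes = async (
  outlines: { title: string }[],
  _agents: unknown[],
  directive: string,
) => outlines.map((o) => ({ title: o.title, directive }));

// Reordered pipeline: outlines run before agents so the inferred
// languageDirective can shape agent personas and scene content.
async function runPipeline(requirements: string) {
  const searchContext = await webSearch(requirements);
  const { languageDirective, outlines } = await generateOutlines(requirements, searchContext);
  const agents = await generateAgentProfiles(outlines, languageDirective);
  const scenes = await generateScenes(outlines, agents, languageDirective);
  return { languageDirective, outlines, agents, scenes };
}
```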
… and agent profiles API
- Agent profiles API accepts languageDirective instead of language
- Scene content generation accepts and passes languageDirective to prompts
- Scene actions generation accepts and passes languageDirective to prompts
- All prompt templates updated with {{languageDirective}} variable
- Fix pipeline-runner.ts for new outline return type
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reorder the generation-preview page so outlines are generated before agent profiles, enabling the languageDirective inferred from outlines to flow into agent generation, scene content, and scene actions.

- Swap outline and agent-generation step order in ALL_STEPS
- Add languageDirective to GenerationSessionState
- Capture languageDirective from outline SSE stream events
- Pass languageDirective + outlines to agent-profiles API
- Pass languageDirective to scene-content and scene-actions APIs
- Store languageDirective in sessionStorage for classroom page
- Update use-scene-generator GenerationParams with languageDirective
- Update classroom page to pass languageDirective to generateRemaining

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- 20 curated test cases from production covering:
  - Pure Chinese/English requirements
  - Chinese with English tech terms
  - Foreign language learning (EN→CN, EN→DE, ZH→EN, AR→EN)
  - Cross-language locale mismatch
  - Non-Chinese/English languages (Spanish, German, Arabic)
  - Short/ambiguous requirements
- Uses actual outline system prompt for inference
- LLM-as-judge evaluates against human-verified ground truth
- Results written to outline-language.eval.result.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows running eval tests with different models for inference vs judging:

EVAL_INFERENCE_MODEL=google/gemini-3-flash-preview
EVAL_JUDGE_MODEL=openai/gpt-4o-mini

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
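One way such a split could be resolved in the test setup, sketched with hypothetical defaults (the fallback model ids are assumptions):

```typescript
// Resolve separate models for inference vs judging from environment
// variables, falling back to defaults. The default ids are assumptions,
// not the repo's actual configuration.
function resolveEvalModels(env: Record<string, string | undefined>) {
  return {
    inference: env.EVAL_INFERENCE_MODEL ?? "google/gemini-3-flash-preview",
    judge: env.EVAL_JUDGE_MODEL ?? "openai/gpt-4o-mini",
  };
}
```

Passing `process.env` into a pure function like this keeps the resolution testable without mutating global state.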
…guage inference

- Add special case rules for advanced learners (TEM-8, DALF C1, JLPT N1, etc.) who should be taught in the target language, not their native language
- Add example for advanced English learner case
- Remove ambiguous LLC test case (no context to disambiguate)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Call generateSceneOutlinesFromRequirements directly instead of using a shortened prompt, so the test exercises the exact same code path as production. Each case now generates full outlines + languageDirective.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… prompt examples

- Add 3 PDF test cases from production (EN paper + ZH requirement, ESL teacher + EN article, ZH C++ syllabus)
- Run tests concurrently with maxConcurrency: 10 (3.5x faster)
- Balance system prompt examples: 3 Chinese + 3 English + 1 Spanish (was 3 Chinese + 2 English, causing Chinese bias)
- Add "I want to learn German A1" example to clarify that foreign language learning by an English-speaking user should use English instruction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the 7-row examples table to:

- Reduce token consumption (~500 tokens saved per call)
- Eliminate language distribution bias in the examples
- Eval results: 21/21 (100%) without examples, same as or better than with them

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
*.eval.test.ts files require real LLM API keys and should only be run locally via explicit file path, not in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
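A dedicated vitest config along these lines would keep the eval suite out of the default run and out of CI. The glob and timeout are assumptions, not the repo's actual config:

```typescript
// vitest.eval.config.ts — only picks up *.eval.test.ts files, so the default
// `pnpm vitest` run (and CI) never touches them.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Assumed location of the eval tests.
    include: ["tests/**/*.eval.test.ts"],
    // Eval cases hit real LLM APIs; keep the per-test timeout generous.
    testTimeout: 120_000,
  },
});
```

Run it explicitly with `pnpm vitest run -c vitest.eval.config.ts` (or by passing the test file path directly, as the commit above describes).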
Force-pushed from 689b862 to e6d637e
The scene-content route was defaulting outline.language to 'zh-CN', which contradicted the new languageDirective for English courses. Remove the legacy language parameter from generateInteractiveContent and use languageDirective as the single source of truth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Pass userProfile template variable in the SSE streaming route so the LLM has the student profile signal for language inference (matching the non-streaming outline-generator.ts behavior)
- Fix extractLanguageDirective to handle \n and \t escape sequences

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e-runner

- Client SSE handler uses the same fallback message as the server when languageDirective is missing from the stream
- pipeline-runner.ts extracts and passes languageDirective to generateFullScenes → generateSingleScene → content/actions generators

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace long positional parameter lists with named options objects (SceneContentOptions, SceneActionsOptions) to eliminate cascading undefined arguments at call sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
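The shape of that refactor can be sketched as follows — the fields shown are illustrative, not the actual SceneContentOptions definition:

```typescript
// Before: generateSceneContent(outline, model, undefined, undefined, directive)
// After: one named options object — call sites only mention what they set,
// and new optional fields never shift positional arguments.
// Fields are illustrative assumptions, not the real interface.
interface SceneContentOptions {
  outline: { title: string };
  languageDirective: string;
  webSearchContext?: string;
  pdfContent?: string;
}

function generateSceneContent(opts: SceneContentOptions): string {
  const context = opts.webSearchContext ?? "";
  return `[${opts.languageDirective}] ${opts.outline.title} ${context}`.trim();
}
```

With this shape, a caller that only has an outline and a directive passes exactly those two fields and nothing else.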
- Expand test cases from 22 to 50 covering language learning, immersive, explicit instruction, code-switching, minimal input, user profiles (teacher/parent/tutor/heritage/professional), and cross-language PDF
- Rewrite language inference prompt with clear decision rules for foreign language learning, cross-language PDF, proxy requests, and terminology handling
- Remove redundant pdfLanguageSample (duplicated first 200 chars of pdfContent already in Reference Materials section)
- Add vitest.eval.config.ts for running eval tests separately
- 50/50 pass rate with gemini-3-flash-preview + gpt-4o judge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eval test cases expanded: 22 → 50

New test dimensions added (28 cases):

- Language learning (5) — Japanese/Korean learners, non-zh/en axis (ja→zh), multi-target language learning
- Immersive learning (2) — Advanced learners requesting full target-language immersion (Japanese→English, Chinese→French)
- Explicit language instruction (2) — User explicitly overrides default language ("请用英文教我" ["teach me in English"], "explain in Chinese please")
- Code-switching & bilingual (2) — Mixed zh/en input, explicit bilingual teaching request
- Minimal / ambiguous input (2) — Single-word requirement ("微积分" ["calculus"]), pinyin romanized input
- User profiles (8) — Teacher designing foreign language lesson, parent proxy for IB student, bilingual heritage speaker, professional business English, immigrant integration, tutor with bilingual student, teacher of Chinese-as-foreign-language
- Cross-language PDF (7) — English req + Chinese PDF, Chinese req + Japanese/French PDF, Japanese req + English PDF, bilingual PDF, teacher using foreign-language material

Also simplified the language inference prompt (system.md ~80→30 lines, removed redundant |
- Remove duplicate early agent resolution block (keep post-outline version that uses languageDirective instead of lang)
- Adapt eval test to async resolveModel signature from main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Embed language inference directly into outline generation (replacing the standalone LLM call approach), move agent profile generation after outlines, and remove the manual language selector along with all hardcoded zh-CN/en-US conditionals.

Related Issues
Supersedes feat/language-inference branch (#381)
Changes
- Change the outline generation return type from SceneOutline[] to a { languageDirective, outlines } wrapper object
- languageDirective propagates through outline → agent profiles → scene content → scene actions → chat
- Remove the UserRequirements.language field, the toolbar language toggle, and the normalizeLanguage() function
- Remove zh-CN/en-US hardcodes: prompt construction code uses English only; the LLM infers the teaching language from user requirement text
- Eval: gemini-3-flash-preview inference + gpt-4o-mini judge, 21/21 pass

Type of Change
Verification
Steps to reproduce / test
EVAL_INFERENCE_MODEL=google:gemini-3-flash-preview EVAL_JUDGE_MODEL=openai:gpt-4o-mini pnpm vitest run tests/generation/outline-language.eval.test.ts

What you personally verified
Evidence
- Eval result: 21/21 (100%) with gemini-3-flash-preview + gpt-4o-mini judge
- TypeScript: npx tsc --noEmit, zero errors
- CI passes (pnpm check && pnpm lint && npx tsc --noEmit)
- Manually tested locally
Screenshots / recordings attached (if UI changes)
Checklist