feat: inline language inference in outline generation #390

Closed
cosarah wants to merge 25 commits into main from feat/language-inline

cosarah (Collaborator) commented Apr 9, 2026

Summary

Embed language inference directly into outline generation (replacing the standalone LLM call approach), move agent profile generation after outlines, and remove the manual language selector along with all hardcoded zh-CN/en-US conditionals.

Related Issues

Supersedes feat/language-inference branch (#381)

Changes

  • Pipeline reorder: outline generation now runs before agent profile generation (both server-side and client-side flows)
  • Inline language inference: outline LLM output changes from SceneOutline[] to { languageDirective, outlines } wrapper object
  • Full-pipeline language directive: languageDirective propagates through outline → agent profiles → scene content → scene actions → chat
  • Remove manual language selection: UserRequirements.language field, toolbar language toggle, normalizeLanguage() function
  • Remove all zh-CN/en-US hardcodes: prompt construction code uses English only; LLM infers teaching language from user requirement text
  • Advanced learner recognition: system prompt identifies advanced foreign language learners (TEM-8, DALF C1, etc.) who should be taught in the target language
  • Eval tests: 22 production-sourced test cases (including 3 cross-language PDF cases), gemini-3-flash-preview inference + gpt-4o-mini judge; all 21 cases that reached the judge pass (21/21)

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring (no functional changes)

Verification

Steps to reproduce / test

  1. Create a course with a Chinese requirement (e.g. "从零学 Python"), verify content is generated in Chinese
  2. Create a course with an English requirement (e.g. "Explain photosynthesis"), verify content is in English
  3. Verify generation flow order: PDF analysis → Web search → Outlines → Agent generation → Content
  4. Verify the language toggle button is no longer in the toolbar
  5. Run eval tests: EVAL_INFERENCE_MODEL=google:gemini-3-flash-preview EVAL_JUDGE_MODEL=openai:gpt-4o-mini pnpm vitest run tests/generation/outline-language.eval.test.ts

What you personally verified

  • Tested Chinese and English requirements in the frontend — language inference correct
  • Verified advanced English learner case (TEM-8 oral fluency improvement)
  • Eval tests: 22 cases run concurrently; all 21 that reached the judge pass (21/21)

Evidence

  • Eval result: 21/21 (100%) with gemini-3-flash-preview + gpt-4o-mini judge

  • TypeScript: npx tsc --noEmit zero errors

  • CI passes (pnpm check && pnpm lint && npx tsc --noEmit)

  • Manually tested locally

  • Screenshots / recordings attached (if UI changes)

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have added/updated documentation as needed
  • My changes do not introduce new warnings

cosarah and others added 17 commits April 9, 2026 23:47
…guageNote types

Remove the `language` field from UserRequirements and all zh-CN/en-US
hardcoded conditionals in prompt construction. Language will be inferred
by the LLM during outline generation (handled in subsequent tasks).

- Remove language toggle from GenerationToolbar and homepage form
- Remove normalizeLanguage helper and language-based prompt branching
- Standardize formatImageDescription/formatImagePlaceholder to English only
- Add `languageDirective` to Stage, `languageNote` to SceneOutline
- Fix generation-preview references to the removed requirements.language

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change LLM output format from flat JSON array to wrapper object
{ languageDirective, outlines }. Add language inference instructions
to system prompt with signal priority and examples. Replace hardcoded
Course Language section in user prompt with Language Context for
inference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update generateSceneOutlinesFromRequirements to return a wrapper
object { languageDirective, outlines } instead of a flat SceneOutline[].
Parse the new LLM response format with backward compatibility for old
flat-array responses. Add pdfLanguageSample template variable for
language inference in the prompt.

Note: downstream callers (classroom-generation.ts, pipeline-runner.ts)
have expected type errors that will be fixed in subsequent tasks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
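
The backward-compatible parsing described in the commit above could look roughly like the sketch below. Only `languageDirective`, `outlines`, `SceneOutline`, and `languageNote` are named in the PR. The `title` field, the `OutlineResult` name, `parseOutlineResponse`, and the fallback directive text are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical shapes: only `languageDirective`, `outlines`, and `languageNote`
// are named in the PR text; `title` is an illustrative placeholder field.
interface SceneOutline {
  title: string;
  languageNote?: string;
}

interface OutlineResult {
  languageDirective: string;
  outlines: SceneOutline[];
}

// Assumed fallback; the PR does not state the actual default directive.
const FALLBACK_DIRECTIVE = "Teach in the language of the user's requirement.";

function parseOutlineResponse(raw: string): OutlineResult {
  const parsed: unknown = JSON.parse(raw);

  // Legacy format: a flat array of outlines with no directive.
  if (Array.isArray(parsed)) {
    return { languageDirective: FALLBACK_DIRECTIVE, outlines: parsed as SceneOutline[] };
  }

  // New format: wrapper object { languageDirective, outlines }.
  if (typeof parsed === "object" && parsed !== null) {
    const obj = parsed as { languageDirective?: unknown; outlines?: unknown };
    if (Array.isArray(obj.outlines)) {
      const directive =
        typeof obj.languageDirective === "string" && obj.languageDirective.trim() !== ""
          ? obj.languageDirective
          : FALLBACK_DIRECTIVE;
      return { languageDirective: directive, outlines: obj.outlines as SceneOutline[] };
    }
  }

  throw new Error("Unrecognized outline response format");
}
```

Accepting both shapes in one place means downstream callers only ever see the wrapper object, whichever format the model returned.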
Update the scene-outlines-stream API to support the new wrapper
response format { languageDirective, outlines: [...] }:
- Add pdfLanguageSample template variable to the prompt
- Add extractLanguageDirective() to parse directive from partial JSON
- Update extractNewOutlines() to handle nested "outlines" array key
- Emit languageDirective SSE event as soon as it's parsed
- Include languageDirective in the done event payload

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
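
As a rough illustration of reading the directive out of a partially streamed JSON buffer, here is a minimal sketch. It reuses the name `extractLanguageDirective` from the commit message, but the body is a hypothetical re-implementation, since the PR text does not show the real one. It returns `null` until the string value has been fully received and leaves escape decoding to `JSON.parse`.

```typescript
// Hypothetical re-implementation for illustration only.
function extractLanguageDirective(partialJson: string): string | null {
  const match = /"languageDirective"\s*:\s*"/.exec(partialJson);
  if (!match) return null;

  let i = match.index + match[0].length;
  let raw = "";
  while (i < partialJson.length) {
    const ch = partialJson.charAt(i);
    if (ch === "\\") {
      // Keep the escape pair intact; bail out if the stream ends mid-escape.
      if (i + 1 >= partialJson.length) return null;
      raw += ch + partialJson.charAt(i + 1);
      i += 2;
    } else if (ch === '"') {
      // Closing quote reached: JSON.parse decodes \n, \t, \", \uXXXX, and so on.
      try {
        return JSON.parse(`"${raw}"`) as string;
      } catch (e) {
        return null;
      }
    } else {
      raw += ch;
      i += 1;
    }
  }
  return null; // The value has not been fully streamed yet.
}
```

This sketch matches the first occurrence of the key, so it assumes `languageDirective` is emitted before any outline text that might contain the same literal.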
…ents

Reorders the server-side generation pipeline so outline generation
(which now infers languageDirective) happens before agent profile
generation. This lets agent names/personas follow the inferred language.

Pipeline order: web search → outlines → agents → scenes

Also threads languageDirective through to generateSceneContent and
generateSceneActions (those functions don't accept the param yet —
that's Tasks 8/9).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
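
The reordered pipeline (web search → outlines → agents → scenes) can be sketched as follows. Every function name and return shape here is a hypothetical stand-in with stubbed bodies. The point is only the data dependency: `languageDirective` exists once outlines are generated, so the agent and scene steps have to run after that step.

```typescript
// All names and shapes below are hypothetical stand-ins with stubbed bodies.
interface OutlineStepResult {
  languageDirective: string;
  outlines: string[];
}

async function webSearch(requirement: string): Promise<string[]> {
  return [`search results for: ${requirement}`];
}

async function generateOutlines(requirement: string, context: string[]): Promise<OutlineStepResult> {
  // In the real pipeline an LLM infers the directive; here it is hard-coded.
  return {
    languageDirective: "Teach in English.",
    outlines: [`Intro to ${requirement}`, `Practice (${context.length} sources)`],
  };
}

async function generateAgents(outlines: string[], languageDirective: string): Promise<string[]> {
  return [`teacher for ${outlines.length} scenes (${languageDirective})`];
}

async function generateScenes(
  outlines: string[],
  agents: string[],
  languageDirective: string,
): Promise<string[]> {
  return outlines.map((o) => `${o} [${languageDirective}] with ${agents.length} agent(s)`);
}

async function runPipeline(requirement: string) {
  const context = await webSearch(requirement);
  // Outlines come first because they produce the directive the later steps need.
  const { languageDirective, outlines } = await generateOutlines(requirement, context);
  const agents = await generateAgents(outlines, languageDirective);
  const scenes = await generateScenes(outlines, agents, languageDirective);
  return { languageDirective, agents, scenes };
}
```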
… and agent profiles API

- Agent profiles API accepts languageDirective instead of language
- Scene content generation accepts and passes languageDirective to prompts
- Scene actions generation accepts and passes languageDirective to prompts
- All prompt templates updated with {{languageDirective}} variable
- Fix pipeline-runner.ts for new outline return type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
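
For readers unfamiliar with `{{variable}}` prompt templates, substitution of `{{languageDirective}}` might work along these lines. `renderTemplate` is a hypothetical helper written for this note. The project's real template engine is not shown in the PR and may behave differently, for example with unknown placeholders.

```typescript
// Hypothetical helper; not the project's actual template engine.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (placeholder: string, name: string) => {
    // Only own properties count, so names like "constructor" are not substituted.
    const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
    // Unknown placeholders are left untouched so missing variables stay visible.
    return value ?? placeholder;
  });
}

const scenePrompt = renderTemplate("## Language\n{{languageDirective}}", {
  languageDirective: "Teach in Chinese; keep Python keywords in English.",
});
```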
Reorder the generation-preview page so outlines are generated before
agent profiles, enabling the languageDirective inferred from outlines
to flow into agent generation, scene content, and scene actions.

- Swap outline and agent-generation step order in ALL_STEPS
- Add languageDirective to GenerationSessionState
- Capture languageDirective from outline SSE stream events
- Pass languageDirective + outlines to agent-profiles API
- Pass languageDirective to scene-content and scene-actions APIs
- Store languageDirective in sessionStorage for classroom page
- Update use-scene-generator GenerationParams with languageDirective
- Update classroom page to pass languageDirective to generateRemaining

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- 20 curated test cases from production covering:
  - Pure Chinese/English requirements
  - Chinese with English tech terms
  - Foreign language learning (EN→CN, EN→DE, ZH→EN, AR→EN)
  - Cross-language locale mismatch
  - Non-Chinese/English languages (Spanish, German, Arabic)
  - Short/ambiguous requirements
- Uses actual outline system prompt for inference
- LLM-as-judge evaluates against human-verified ground truth
- Results written to outline-language.eval.result.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows running eval tests with different models for inference vs judging:
  EVAL_INFERENCE_MODEL=google/gemini-3-flash-preview
  EVAL_JUDGE_MODEL=openai/gpt-4o-mini

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…guage inference

- Add special case rules for advanced learners (TEM-8, DALF C1, JLPT N1, etc.)
  who should be taught in the target language, not their native language
- Add example for advanced English learner case
- Remove ambiguous LLC test case (no context to disambiguate)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Call generateSceneOutlinesFromRequirements directly instead of a
shortened prompt, so the test exercises the exact same code path
as production. Each case now generates full outlines + languageDirective.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… prompt examples

- Add 3 PDF test cases from production (EN paper + ZH requirement,
  ESL teacher + EN article, ZH C++ syllabus)
- Run tests concurrently with maxConcurrency: 10 (3.5x faster)
- Balance system prompt examples: 3 Chinese + 3 English + 1 Spanish
  (was 3 Chinese + 2 English, causing Chinese bias)
- Add "I want to learn German A1" example to clarify English-user
  foreign language learning should use English instruction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the 7-row examples table to:
- Reduce token consumption (~500 tokens saved per call)
- Eliminate language distribution bias in examples
- Eval results: 21/21 (100%) without examples, the same as or better than with them

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
*.eval.test.ts files require real LLM API keys and should only be
run locally via explicit file path, not in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cosarah cosarah force-pushed the feat/language-inline branch from 689b862 to e6d637e on April 9, 2026 15:48
cosarah and others added 6 commits April 10, 2026 00:13
The scene-content route was defaulting outline.language to 'zh-CN',
which contradicted the new languageDirective for English courses.
Remove the legacy language parameter from generateInteractiveContent
and use languageDirective as the single source of truth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Pass userProfile template variable in the SSE streaming route so the
  LLM has the student profile signal for language inference (matching
  the non-streaming outline-generator.ts behavior)
- Fix extractLanguageDirective to handle \n and \t escape sequences

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e-runner

- Client SSE handler uses same fallback message as server when
  languageDirective is missing from stream
- pipeline-runner.ts extracts and passes languageDirective to
  generateFullScenes → generateSingleScene → content/actions generators

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace long positional parameter lists with named options objects
(SceneContentOptions, SceneActionsOptions) to eliminate cascading
undefined arguments at call sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
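
The refactor from positional parameters to an options object follows a common TypeScript pattern, sketched below. `SceneContentOptions`, `languageDirective`, and `pdfContent` appear in the PR text. The `agentNames` field, the simplified `generateSceneContent` signature, and the default directive string are assumptions for illustration.

```typescript
// `agentNames` and the default directive text are illustrative assumptions.
interface SceneContentOptions {
  languageDirective?: string;
  pdfContent?: string;
  agentNames?: string[];
}

// Before (positional): generateSceneContent(title, undefined, undefined, directive)
// After (named): callers pass only the options they actually need.
function generateSceneContent(outlineTitle: string, options: SceneContentOptions = {}): string {
  const {
    languageDirective = "Infer the language from the outline.",
    agentNames = [],
  } = options;
  return `${outlineTitle} | ${languageDirective} | agents: ${agentNames.length}`;
}

const scenePreview = generateSceneContent("Intro", { languageDirective: "Teach in English." });
```

Adding a new optional field later does not change any existing call site, which is what removes the chains of `undefined` arguments.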
- Expand test cases from 22 to 50 covering language learning, immersive,
  explicit instruction, code-switching, minimal input, user profiles
  (teacher/parent/tutor/heritage/professional), and cross-language PDF
- Rewrite language inference prompt with clear decision rules for
  foreign language learning, cross-language PDF, proxy requests,
  and terminology handling
- Remove redundant pdfLanguageSample (duplicated first 200 chars of
  pdfContent already in Reference Materials section)
- Add vitest.eval.config.ts for running eval tests separately
- 50/50 pass rate with gemini-3-flash-preview + gpt-4o judge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cosarah (Collaborator, Author) commented Apr 12, 2026

Eval test cases expanded: 22 → 50

New test dimensions added (28 cases):

Language learning (5) — Japanese/Korean learners, non-zh/en axis (ja→zh), multi-target language learning

Immersive learning (2) — Advanced learners requesting full target-language immersion (Japanese→English, Chinese→French)

Explicit language instruction (2) — User explicitly overrides default language ("请用英文教我", "explain in Chinese please")

Code-switching & bilingual (2) — Mixed zh/en input, explicit bilingual teaching request

Minimal / ambiguous input (2) — Single-word requirement ("微积分"), pinyin romanized input

User profiles (8) — Teacher designing foreign language lesson, parent proxy for IB student, bilingual heritage speaker, professional business English, immigrant integration, tutor with bilingual student, teacher of Chinese-as-foreign-language

Cross-language PDF (7) — English req + Chinese PDF, Chinese req + Japanese/French PDF, Japanese req + English PDF, bilingual PDF, teacher using foreign-language material

Also simplified the language inference prompt (system.md ~80→30 lines, removed redundant pdfLanguageSample). 50/50 with gemini-3-flash-preview + gpt-4o judge.

cosarah and others added 2 commits April 12, 2026 21:55
- Remove duplicate early agent resolution block (keep post-outline
  version that uses languageDirective instead of lang)
- Adapt eval test to async resolveModel signature from main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wyuc wyuc closed this Apr 12, 2026
@cosarah cosarah deleted the feat/language-inline branch April 12, 2026 14:28
2 participants