-
Translation Flow
- Uses Paraglide for UI string management
- Git-based caching of translations
- Two-pass translation (initial + review)
- Heavy preprocessing/postprocessing of markdown
- Git operations serialized through a queue
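The last point, running Git operations one at a time through a queue, can be sketched roughly as below. This is a minimal illustration, not the existing implementation: `SerialQueue` and `enqueue` are hypothetical names, and the tasks are assumed to be any async functions (for example, wrappers around Git commands).

```typescript
// Hypothetical helper: runs async tasks strictly one at a time, in submission order.
class SerialQueue {
  // Resolves when the most recently queued task has settled.
  private tail: Promise<unknown> = Promise.resolve()

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after everything queued before it has finished.
    const result = this.tail.then(() => task())
    // Swallow errors on the internal chain so one failure does not block later tasks.
    // The caller still sees the rejection through the returned promise.
    this.tail = result.catch(() => undefined)
    return result
  }
}
```

Callers would wrap each Git operation in a function and pass it to `enqueue`, so that overlapping translation jobs never run Git commands concurrently against the same working tree.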
-
Code Structure
    // Current translation flow
    async function translate(...) {
      const processed = preprocessMarkdown(content)
      const firstPass = await postChatCompletion([...])
      const reviewed = await postChatCompletion([..., reviewPrompt])
      return postprocessMarkdown(processed, reviewed)
    }

    // Current caching approach
    const cacheLatestCommitDate = cacheLatestCommitDates.get(path)
    if (cacheLatestCommitDate > sourceLatestCommitDate) {
      useCachedTranslation = true
    }
-
Proposed Changes
    // New translation request caching
    interface TranslationRequest {
      source: string
      prompt: string
      model: string
      timestamp: string
    }

    // New translation flow with comparison
    async function translate(...) {
      const request = {
        source: content,
        prompt: promptGenerator(languageName, content, promptAdditions),
        model: LLM_MODEL,
        timestamp: new Date().toISOString()
      }
      const oldResult = await checkOldCache(...)
      const newResult = await getOrCreateNewTranslation(request)
      if (process.env.NODE_ENV === 'development') {
        // Surface comparison in preview
      }
      return oldResult // Use old system during transition
    }

    // Preview comparison component
    <script lang="ts">
      import { diff_match_patch } from 'diff-match-patch'

      export let translations: Record<string, { old?: string, new: string }>

      function renderDiff(old: string, newText: string) {
        const dmp = new diff_match_patch()
        const diff = dmp.diff_main(old, newText)
        dmp.diff_cleanupSemantic(diff) // merge character-level noise into readable chunks
        return dmp.diff_prettyHtml(diff) // HTML with <ins>/<del> diff highlighting
      }
    </script>
-
Cache Structure
- Moving from translation caching to LLM request caching
- Store full context (source, prompt, model, result)
- Simpler validation approach vs complex preprocessing
- Git-based storage with clean data model
- Focus on auditability and reproducibility
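One way the request-level cache could look is sketched below. It is an illustration under assumptions, not the planned design: `RequestInputs`, `CachedTranslation`, `requestKey`, and `getOrCreateTranslation` are hypothetical names, the key is assumed to be a SHA-256 hash over model, prompt, and source, and an in-memory `Map` stands in for the Git-backed store. The timestamp is stored for auditability but left out of the key, so an identical request made later still hits the cache.

```typescript
import { createHash } from 'node:crypto'

// The inputs that determine the model output (the timestamp is metadata, not an input).
interface RequestInputs {
  source: string
  prompt: string
  model: string
}

// Full stored record: inputs, result, and when the result was produced.
interface CachedTranslation extends RequestInputs {
  result: string
  timestamp: string
}

// Hypothetical cache key: SHA-256 over the three inputs in a fixed order.
function requestKey(req: RequestInputs): string {
  const canonical = JSON.stringify([req.model, req.prompt, req.source])
  return createHash('sha256').update(canonical).digest('hex')
}

// In-memory stand-in for the Git-backed store.
const requestCache = new Map<string, CachedTranslation>()

async function getOrCreateTranslation(
  req: RequestInputs,
  callModel: (req: RequestInputs) => Promise<string>
): Promise<CachedTranslation> {
  const key = requestKey(req)
  const hit = requestCache.get(key)
  if (hit) return hit // same source, prompt, and model: reuse the stored result
  const entry: CachedTranslation = {
    source: req.source,
    prompt: req.prompt,
    model: req.model,
    result: await callModel(req),
    timestamp: new Date().toISOString()
  }
  requestCache.set(key, entry)
  return entry
}
```

Because the key covers the prompt and model as well as the source, changing either one invalidates the entry naturally, with no commit-date comparison required.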
-
Translation Strategy
- Trust capable models (GPT-4/Claude) more
- Reduce preprocessing/postprocessing complexity
- Opt-in validation for special cases
- Whole-page translation approach preferred
- Prompt engineering over code complexity
-
Project Organization
- Independent repository (vs monorepo)
- TypeScript-based implementation
- Build-time integration with pauseai-website
- Cache as version-controlled translation history
-
Transition Approach
- Deploy prototype first for immediate progress
- Build new cache format alongside old system
- Surface translation comparisons in preview site
- One-time switch once validated by native speakers
- Maintain existing review process via Discord
-
Initial Setup Complete
- Basic project structure
- TypeScript configuration
- Documentation framework
- GitHub repository under PauseAI organization
-
Next Development Priorities
- Implement LLM request caching
- Add translation comparison UI
- Simplify markdown handling
- Streamline validation approach
-
Translation Approach
- Trust capable models more
- Reduce preprocessing complexity
- Handle special cases via prompts
- Keep validation lightweight and optional
- Surface comparisons for review
-
Technical Insights
- Cache LLM requests, not just results
- Maintain full context for auditability
- Simple validation over complex preservation
- Preview-based comparison workflow
- Focus on native speaker review process
-
Model Selection
- Cost vs quality tradeoffs
- Single vs two-pass approach
- Specific model recommendations pending testing
- Token cost analysis needed
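For the pending token cost analysis, a rough comparison of single-pass and two-pass cost might start from a helper like the one below. Everything here is an assumption for illustration: `ModelPricing` and `estimateCost` are hypothetical names, prices are passed in by the caller rather than taken from any provider, and the review pass is assumed to re-send the original input plus the first draft and to produce output of similar length.

```typescript
// Prices are supplied by the caller; no real provider pricing is assumed here.
interface ModelPricing {
  inputPerMillionTokens: number
  outputPerMillionTokens: number
}

function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing,
  passes: 1 | 2
): number {
  let totalInput = inputTokens
  let totalOutput = outputTokens
  if (passes === 2) {
    // Assumption: the review pass re-sends the original messages plus the first draft
    // and produces a revised translation of similar length.
    totalInput += inputTokens + outputTokens
    totalOutput += outputTokens
  }
  return (
    (totalInput * pricing.inputPerMillionTokens +
      totalOutput * pricing.outputPerMillionTokens) / 1e6
  )
}
```

Under these assumptions the second pass more than doubles the input tokens, since it carries the first draft along with the original content, which is worth weighing against any quality gain from the review step.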
-
Integration Details
- Preview system comparison UI
- Translation metadata handling
- Diff visualization approach
- Monitoring and cost tracking
-
Validation Strategy
- Balance between trust and verification
- Opt-in validation criteria
- Special case handling via prompts
- Native speaker review process
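A lightweight, opt-in validation step could be as small as the sketch below. It is only an illustration: the `Check` names and `validateTranslation` are hypothetical, and the two checks shown (link targets unchanged, heading count unchanged) are assumed examples of criteria a page might opt into, not an agreed list.

```typescript
// Opt-in structural checks; these names are hypothetical, not from the existing codebase.
type Check = 'links' | 'headings'

// Collect markdown link targets, e.g. "/learn" from [Learn](/learn).
function linkTargets(md: string): string[] {
  const pattern = /\]\(([^)\s]+)/g
  const targets: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(md)) !== null) targets.push(match[1])
  return targets.sort()
}

// Count ATX-style headings ("# ", "## ", ...) at the start of a line.
function headingCount(md: string): number {
  return (md.match(/^#{1,6}\s/gm) ?? []).length
}

// Returns a list of problems; an empty list means every requested check passed.
function validateTranslation(source: string, translated: string, checks: Check[]): string[] {
  const problems: string[] = []
  for (const check of checks) {
    if (check === 'links' && linkTargets(source).join('\n') !== linkTargets(translated).join('\n')) {
      problems.push('link targets differ from the source')
    }
    if (check === 'headings' && headingCount(source) !== headingCount(translated)) {
      problems.push('heading count differs from the source')
    }
  }
  return problems
}
```

With an empty `checks` list nothing is verified, which matches the idea of trusting the model by default and enabling checks only for pages that need them.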
-
Short Term (on paraglide branch)
- Add LLM request caching
- Implement comparison UI
- Test with native speakers
- Once verified, remove the comparison UI and rely on cached requests going forward
- Merge the paraglide branch into mainline at that point, subject to any "keep production stable" constraints
-
Medium Term (on pauseai-l10n)
- Complete transition to new cache
- Simplify markdown handling
- Streamline validation
- Document new architecture
-
Long Term
- Optimize model selection
- Enhance prompt engineering
- Scale to more languages
- Consider community feedback system