TTS API Decision

Disqualified immediately

| API | Why it's out |
|---|---|
| iSpeech | No free tier for production use. Terms ban static distribution. Probably dead. |
| ElevenLabs | Free tier is 10,000 chars/month. One 2,500-word PDF is ~15,000 chars, so a single document exceeds the entire monthly quota. |
| Google Cloud TTS | CORS not reliably supported for direct browser calls. The API key is fully exposed client-side, and HTTP-referrer restrictions can be spoofed outside a browser, so there is no safe way to lock it down. |
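A quick check of the quota arithmetic behind the ElevenLabs verdict. This is a minimal sketch, assuming ~6 characters per word (spaces and punctuation included), which is the ratio implied by "2,500 words ≈ 15,000 chars". `estimateChars` and `docsPerMonth` are hypothetical helpers, and the quotas are the figures quoted in this document.

```typescript
// Assumption: ~6 chars per word including spaces/punctuation,
// the ratio implied by "2,500 words ≈ 15,000 chars".
const CHARS_PER_WORD = 6;

/** Rough character count for a document of `words` words. */
function estimateChars(words: number, charsPerWord: number = CHARS_PER_WORD): number {
  return Math.round(words * charsPerWord);
}

/** Number of whole documents of `docChars` characters that fit in a monthly quota. */
function docsPerMonth(monthlyQuota: number, docChars: number): number {
  return Math.floor(monthlyQuota / docChars);
}

const pdfChars = estimateChars(2500);                   // ~15,000 chars
const elevenLabsDocs = docsPerMonth(10_000, pdfChars);  // free quota quoted above
const pollyDocs = docsPerMonth(5_000_000, pdfChars);    // Polly Standard quota quoted below

console.log({ pdfChars, elevenLabsDocs, pollyDocs });
```

Under these assumptions ElevenLabs fits zero full documents per month, while the Polly Standard quota fits a few hundred.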

The real choice: 2 viable options

Option A — Web Speech API (browser-native)

  • Zero cost, zero keys, zero infrastructure
  • Built into Chrome, Firefox, Safari, Edge
  • Unlimited characters
  • Voice choices come from the user's OS (Chrome surfaces 50+ on macOS/Windows)
  • Known bugs to engineer around:
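    • Chrome silently stops long utterances partway through (widely reported at ~15 seconds with Chrome's Google voices) → need a keepalive timer that periodically calls pause()/resume()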
    • Chrome caps utterances at ~200-300 chars → must chunk text into sentences and queue them
    • getVoices() returns empty on first call → must wait for voiceschanged event
  • Mobile support is patchy: iOS Safari only starts speech from a user gesture, and Android voice availability varies by device
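The chunk-and-queue workaround for the utterance cap can be sketched as a pure function. This is a minimal sketch, assuming a 200-character cap (the low end of the ~200-300 range above) and a naive sentence split on `.`, `!` and `?`. `chunkText` is a hypothetical name; sentences longer than the cap fall back to word-level packing, and any single word longer than the cap is hard-sliced.

```typescript
// Assumption: 200 chars is a safe per-utterance cap (low end of ~200-300).
const MAX_CHUNK = 200;

/**
 * Split text into chunks of at most `max` characters, preferring sentence
 * boundaries. Short sentences are packed together into one chunk.
 */
function chunkText(text: string, max: number = MAX_CHUNK): string[] {
  const chunks: string[] = [];
  let current = "";

  // Append a piece to the current chunk, flushing first if it would overflow.
  const add = (piece: string): void => {
    const candidate = current ? `${current} ${piece}` : piece;
    if (candidate.length <= max) {
      current = candidate;
      return;
    }
    if (current) chunks.push(current);
    current = piece;
  };

  // Naive sentence split that keeps terminal punctuation with its sentence.
  const normalized = text.replace(/\s+/g, " ").trim();
  const sentences = normalized.match(/[^.!?]+[.!?]*/g) ?? [];

  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence.length === 0) continue;
    if (sentence.length <= max) {
      add(sentence);
      continue;
    }
    // Overlong sentence: pack word by word; hard-slice any word longer than max.
    for (const word of sentence.split(" ")) {
      for (let i = 0; i < word.length; i += max) add(word.slice(i, i + max));
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

In the browser, each chunk would become its own `SpeechSynthesisUtterance` passed to `speechSynthesis.speak()`. The engine appends each utterance to its queue and speaks them in order, so the "queue" step is just calling `speak()` once per chunk.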

Option B — AWS Polly (via Cognito)

  • 5 million Standard-voice chars/month free, but only for the first 12 months after first use (not permanent)
  • Consistent, controlled voice quality across all devices
  • The safe pattern: Cognito identity pool with unauthenticated access → temporary credentials for an IAM role scoped to polly:SynthesizeSpeech → no permanent secret in client JS
  • Voices like Matthew, Joanna — decent quality, many languages
  • Cost: Half-day of AWS setup (IAM, Cognito, SDK config). Requires credit card on AWS account.
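For the Polly path, the request shape decides whether usage counts against the Standard-voice free tier. This is a minimal sketch of parameters for Polly's `SynthesizeSpeech` request. `buildPollyParams` is a hypothetical helper, and the default voice, MP3 output and the narrowed voice type are assumptions. The AWS SDK wiring is kept in a comment so the block runs without any packages installed.

```typescript
// Field names follow Polly's SynthesizeSpeech request (Engine, OutputFormat,
// Text, TextType, VoiceId). The narrowed unions below are illustrative assumptions.
type PollyEngine = "standard" | "neural";
type PollyVoice = "Matthew" | "Joanna"; // the two voices named above; Polly offers many more

interface PollyParams {
  Engine: PollyEngine;
  OutputFormat: "mp3";
  Text: string;
  TextType: "text";
  VoiceId: PollyVoice;
}

/** Build params for one chunk; defaults to the Standard engine, which the free tier covers. */
function buildPollyParams(
  text: string,
  voice: PollyVoice = "Joanna",
  engine: PollyEngine = "standard",
): PollyParams {
  if (text.trim().length === 0) throw new Error("empty text");
  return { Engine: engine, OutputFormat: "mp3", Text: text, TextType: "text", VoiceId: voice };
}

/*
 * Hypothetical browser wiring with the AWS SDK for JavaScript v3 (not executed here;
 * REGION and IDENTITY_POOL_ID are placeholders):
 *
 *   import { PollyClient, SynthesizeSpeechCommand } from "@aws-sdk/client-polly";
 *   import { fromCognitoIdentityPool } from "@aws-sdk/credential-providers";
 *
 *   const client = new PollyClient({
 *     region: REGION,
 *     credentials: fromCognitoIdentityPool({
 *       clientConfig: { region: REGION },
 *       identityPoolId: IDENTITY_POOL_ID, // pool with unauthenticated access enabled
 *     }),
 *   });
 *   const { AudioStream } = await client.send(new SynthesizeSpeechCommand(buildPollyParams(chunk)));
 *   const bytes = await AudioStream?.transformToByteArray();
 */
```

Pinning `Engine: "standard"` explicitly, rather than relying on a default, keeps the free-tier assumption visible in one place.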

Cap's recommendation

Ship with Web Speech API. Here's why:

The chunking, keepalive, and voice-loading workarounds are well documented, and together they come to ~50 lines of JS. It fits the GitHub Pages constraint exactly: no backend, no keys, no billing account, no rate limits. Picking from the OS voices is actually a nice feature ("choose your narrator"). This is the right tool for this scope.
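The keepalive and voice-loading workarounds fit in a few dozen lines. This is a minimal sketch, assuming the commonly used pause()/resume() keepalive (a community workaround, not documented API behavior) and a hypothetical 10-second interval. The structural types `SynthLike` and `VoiceSource` stand in for `window.speechSynthesis` so the logic can run outside a browser, and the 2-second fallback timeout in `loadVoices` is also an assumption. All helper names are hypothetical.

```typescript
// Minimal structural stand-ins for window.speechSynthesis (assumption: only
// these members are needed; the real object should match both shapes).
interface SynthLike {
  readonly speaking: boolean;
  pause(): void;
  resume(): void;
}

interface VoiceSource<V> {
  getVoices(): V[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
}

/** One keepalive tick: nudge the engine only while it is speaking. Returns true if it was. */
function keepaliveTick(synth: SynthLike): boolean {
  if (!synth.speaking) return false;
  // Community workaround: a pause()/resume() pair keeps Chrome from
  // silently dropping long speech.
  synth.pause();
  synth.resume();
  return true;
}

/** Start the keepalive timer; the caller stops it when the last utterance ends. */
function startKeepalive(synth: SynthLike, intervalMs = 10_000): () => void {
  const id = setInterval(() => keepaliveTick(synth), intervalMs);
  return () => clearInterval(id);
}

/**
 * Resolve with the voice list, waiting for "voiceschanged" when the first
 * getVoices() call comes back empty. Falls back after timeoutMs in case the
 * event never fires, resolving with whatever is available then.
 */
function loadVoices<V>(source: VoiceSource<V>, timeoutMs = 2_000): Promise<V[]> {
  const initial = source.getVoices();
  if (initial.length > 0) return Promise.resolve(initial);

  return new Promise((resolve) => {
    const finish = (): void => {
      clearTimeout(timer);
      source.removeEventListener("voiceschanged", finish);
      resolve(source.getVoices());
    };
    const timer = setTimeout(finish, timeoutMs);
    source.addEventListener("voiceschanged", finish);
  });
}
```

In the page, a caller would await `loadVoices(speechSynthesis)` to fill a narrator picker, call `startKeepalive(speechSynthesis)` before queuing chunks, and call the returned stop function from the last utterance's `onend` handler. The timer is stopped explicitly rather than when `speaking` goes false, because `speaking` can briefly read false between queued utterances.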

Keep AWS Polly as a clearly labeled future upgrade path. If production-grade voice consistency is ever needed, it's documented and doable without changing the app architecture: only the synthesis layer gets swapped.
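Swapping only the synthesis layer works if the reader UI talks to one narrow interface instead of to Web Speech or Polly directly. This is a minimal sketch, assuming a hypothetical `Narrator` interface. `readAloud` and `RecordingNarrator` are illustrative names, and the fake backend exists only to show that UI logic depends on the interface, not on either speech API.

```typescript
/** Hypothetical seam between the reader UI and whichever speech backend is active. */
interface Narrator {
  /** Backend label, e.g. for a settings screen. */
  readonly name: string;
  /** Voices this backend can offer ("choose your narrator"). */
  listVoices(): Promise<string[]>;
  /** Speak the chunks in order with the chosen voice. */
  speak(chunks: string[], voice: string): Promise<void>;
  /** Stop speaking as soon as possible. */
  stop(): void;
}

/** UI-side logic written only against the interface: pick a voice, then read. */
async function readAloud(narrator: Narrator, chunks: string[], preferred?: string): Promise<string> {
  const voices = await narrator.listVoices();
  if (voices.length === 0) throw new Error(`${narrator.name}: no voices available`);
  const voice = preferred !== undefined && voices.includes(preferred) ? preferred : voices[0];
  await narrator.speak(chunks, voice);
  return voice;
}

/** Fake backend that records what it was asked to say (stands in for Web Speech or Polly). */
class RecordingNarrator implements Narrator {
  readonly name = "recording";
  readonly spoken: Array<{ voice: string; text: string }> = [];
  private readonly voices: string[];
  private stopped = false;

  constructor(voices: string[]) {
    this.voices = voices;
  }

  async listVoices(): Promise<string[]> {
    return [...this.voices];
  }

  async speak(chunks: string[], voice: string): Promise<void> {
    this.stopped = false;
    for (const text of chunks) {
      if (this.stopped) break;
      this.spoken.push({ voice, text });
    }
  }

  stop(): void {
    this.stopped = true;
  }
}
```

A Web Speech backend and a Polly backend would each implement `Narrator`. Moving to Polly would then mean constructing a different object at startup, while `readAloud` and the rest of the UI stay unchanged.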


Decision

Web Speech API selected. AWS Polly documented as future upgrade path.