Diátaxis: reference
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
```bash
# Pipeline
npm run fetch             # Fetch new videos via RSS + yt-dlp transcripts
npm run extract           # Extract tips from transcripts using Gemini
npm run backfill          # Retry fetching transcripts for videos missing them
npm run check-staleness   # Validate freshness / dist sync
npm run pipeline          # fetch -> extract -> check-staleness
npm run pipeline:deploy   # fetch -> backfill -> extract -> dedupe -> embed -> build -> check-staleness -> verify-live
npm run verify-live       # Check live site matches local data

# Frontend
npm run dev
npm run build
npm run preview

# Tests
npm test
```

- Live site: https://disney.bound.tips/
- Container: Express server (`Dockerfile`), port 3000, behind Traefik
- Volumes: `dist/` and `data/public/` are bind-mounted, so pipeline rebuilds go live immediately (no container restart needed)
- Health: `https://disney.bound.tips/api/health` returns tip count, embeddings status, and semantic search status
- Timer: systemd `disney-tips-pipeline.timer` (daily 10 AM ET / America/New_York)
- Deploy: `npm run pipeline:deploy` (includes the verify-live check at the end)
- Manual deploy: `npm run build && npm run verify-live`
- Troubleshooting: If verify-live fails with STALE, check that `dist/` is bind-mounted (not baked into the image). Run `docker inspect web-disney --format '{{json .Mounts}}'` to verify.
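One way to picture the STALE result is as a comparison between the local data and what the live site serves. The sketch below is hypothetical: it assumes both copies of `tips.json` expose a `lastUpdated` timestamp and a `tips` array, and `TipsSnapshot` and `compareLive` are illustrative names; the real `verify-live.ts` may compare different fields.

```typescript
// Hypothetical shape: assumes tips.json carries a lastUpdated timestamp and
// a tips array. The real file layout may differ.
interface TipsSnapshot {
  lastUpdated: string;
  tips: unknown[];
}

type LiveStatus = "OK" | "STALE";

// The live copy is STALE when it is older than the local copy or holds a
// different number of tips, i.e. the site is serving an older build.
function compareLive(local: TipsSnapshot, live: TipsSnapshot): LiveStatus {
  const localTime = Date.parse(local.lastUpdated);
  const liveTime = Date.parse(live.lastUpdated);
  if (Number.isNaN(localTime) || Number.isNaN(liveTime)) {
    throw new Error("lastUpdated must be a parseable date string");
  }
  if (liveTime < localTime || live.tips.length !== local.tips.length) {
    return "STALE";
  }
  return "OK";
}

// Usage with inline data; a real check would fetch the live copy over HTTP.
const local: TipsSnapshot = { lastUpdated: "2025-01-02T15:00:00Z", tips: [1, 2, 3] };
const live: TipsSnapshot = { lastUpdated: "2025-01-01T15:00:00Z", tips: [1, 2] };
console.log(compareLive(local, live)); // prints STALE
```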
This is a batch-first Disney tips aggregator with an Express server:

- Pipeline (`scripts/`)
  - Runs daily via systemd timer (10 AM ET)
  - `ensure-warp.sh` runs as `ExecStartPre`: verifies the WARP proxy end-to-end and auto-restarts it if broken
  - Fetches videos from 13 Disney YouTube channels via RSS feeds
  - Extracts transcripts via yt-dlp with WARP proxy (SOCKS5 at 127.0.0.1:1080)
  - Backfills previously failed transcripts (max 3 retries per video, then skipped permanently)
  - Uses Gemini 2.5 Flash Lite to extract structured tips with priority/season metadata
  - Generates OpenAI embeddings (256-dim, `text-embedding-3-small`) for client-side vector search
  - Filters out non-Disney content (Universal, generic travel)
  - Saves results to `data/`, builds static site to `dist/`
- Server (`server/index.ts`)
  - Express app (port 3000)
  - Serves static files from `dist/` and `data/public/`
  - `POST /api/embed-query`: returns 256-dim query vector (LRU cached, 1K entries)
  - `POST /api/search`: server-side semantic search with text fallback
  - `POST /api/subscribe`: email subscription via Resend
  - `GET /api/health`: health check
- Frontend (`src/`, `index.html`)
  - Vite static site
  - Client-side filtering by category, park, priority, season, and search
  - Client-side vector search: loads 256-dim embeddings (~6.5MB) on page load and computes cosine similarity in the browser (~5ms for ~3,000 tips). Falls back to `/api/search` if embeddings are not yet loaded.
  - Disney-themed UI with castle gradient header
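The in-browser vector search described above amounts to a brute-force nearest-neighbour scan. A minimal sketch, assuming each entry in `embeddings.json` pairs a tip ID with its vector (the real file layout may differ); `TipEmbedding`, `cosineSimilarity`, and `topK` are hypothetical names, and the 3-dim vectors stand in for 256-dim ones.

```typescript
// Hypothetical shape of one embeddings.json entry.
interface TipEmbedding {
  id: string;
  vector: number[]; // 256 floats in the real data
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every tip, sort by descending score, keep k.
// A linear scan is adequate at the scale described (~3,000 vectors).
function topK(query: number[], entries: TipEmbedding[], k: number): { id: string; score: number }[] {
  return entries
    .map((e) => ({ id: e.id, score: cosineSimilarity(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const items: TipEmbedding[] = [
  { id: "rope-drop", vector: [1, 0, 0] },
  { id: "mobile-order", vector: [0, 1, 0] },
  { id: "early-entry", vector: [0.9, 0.1, 0] },
];
console.log(topK([1, 0, 0], items, 2).map((r) => r.id)); // [ 'rope-drop', 'early-entry' ]
```

OpenAI documents its embeddings as normalized to unit length, so a plain dot product would rank identically; the full cosine formula is kept so the sketch also works on unnormalized vectors.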
```
shared/types.ts              # Shared types (pipeline + frontend)
scripts/
  fetch-videos.ts            # RSS feed parsing + yt-dlp transcript fetching
  extract-tips.ts            # Gemini-powered tip extraction
  embed-tips.ts              # OpenAI embeddings for semantic search
  dedupe-tips.ts             # Tip deduplication
  backfill-transcripts.ts    # Retry missing transcripts (max 3 retries then skip)
  ensure-warp.sh             # Pre-pipeline WARP proxy health check + auto-restart
  check-staleness.ts         # Freshness + dist sync checks
  verify-live.ts             # Post-deploy live site verification
  prerender.ts               # Inject tips into static HTML
  lib/transcript.ts          # Transcript runtime + parser
  lib/state.ts               # Shared lastUpdated logic
server/index.ts              # Express server (search, subscribe, health)
data/
  public/                    # Bind-mounted into container
    tips.json                # Extracted structured tips (~3000 tips)
    embeddings.json          # OpenAI embeddings (256-dim, served to browser)
    feed.xml                 # RSS 2.0 feed
  pipeline/                  # NOT deployed
    videos.json              # Raw video metadata + transcripts
    processed-videos.json    # Ledger of processed videos
src/main.ts, src/styles.css  # Frontend application
dist/                        # Bind-mounted into container
```
Tips include:

- `category`: parks, dining, hotels, budget, planning, transportation
- `park`: magic-kingdom, epcot, hollywood-studios, animal-kingdom, disney-springs, water-parks, disneyland, california-adventure, all-parks
- `priority`: high (saves 30+ min/$50+), medium, low
- `season`: year-round, christmas, halloween, flower-garden, food-wine, festival-arts, summer
- `tags`: lowercase hyphenated (rope-drop, lightning-lane, quick-service)
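The field values above map naturally onto string-literal union types. A minimal sketch, not the actual definitions in `shared/types.ts`: `TipSketch`, its `text` field, and `invalidTags` are hypothetical, and the tag pattern's acceptance of digits is an assumption beyond "lowercase hyphenated".

```typescript
// Literal unions built from the documented values.
type Category = "parks" | "dining" | "hotels" | "budget" | "planning" | "transportation";
type Park =
  | "magic-kingdom" | "epcot" | "hollywood-studios" | "animal-kingdom" | "disney-springs"
  | "water-parks" | "disneyland" | "california-adventure" | "all-parks";
type Priority = "high" | "medium" | "low";
type Season =
  | "year-round" | "christmas" | "halloween" | "flower-garden" | "food-wine"
  | "festival-arts" | "summer";

interface TipSketch {
  text: string; // hypothetical field; the real Tip type may name it differently
  category: Category;
  park: Park;
  priority: Priority;
  season: Season;
  tags: string[];
}

// Lowercase words joined by single hyphens, e.g. "rope-drop" (digits allowed: an assumption).
const TAG_PATTERN = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Returns the tags that break the lowercase-hyphenated convention.
function invalidTags(tip: TipSketch): string[] {
  return tip.tags.filter((tag) => !TAG_PATTERN.test(tag));
}

const example: TipSketch = {
  text: "example tip text",
  category: "parks",
  park: "magic-kingdom",
  priority: "high",
  season: "year-round",
  tags: ["rope-drop", "Lightning Lane"],
};
console.log(invalidTags(example)); // [ 'Lightning Lane' ]
```

With the unions in place, a value such as `park: "universal"` fails to compile, which mirrors the pipeline's filtering of non-Disney content at the type level.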
```
ensure-warp.sh → YouTube RSS → yt-dlp + WARP → videos.json → backfill → Gemini API → tips.json → Static Frontend
                                               (pipeline/)                            (public/)          ↑
(daily 10 AM ET)                                                                                 (dist/ served)
```
Stored in `.env.local`:

```bash
# Required
GEMINI_API_KEY=        # Google AI Studio API key for tip extraction
OPENAI_API_KEY=        # OpenAI API key for embeddings (semantic search)

# Optional
GEMINI_MODEL=          # Override model (default: gemini-2.5-flash-lite)
RESEND_API_KEY=        # Resend API key (email subscriptions)
RESEND_AUDIENCE_ID=    # Resend audience ID
SITE_URL=              # Site URL for RSS feed (default: https://disney.bound.tips)
PORT=                  # Server port (default: 3000)

# Transcript runtime (optional overrides)
WARP_PROXY_HOST=127.0.0.1
WARP_PROXY_PORT=1080
DENO_PATH=~/.deno/bin/deno
TRANSCRIPT_STRICT_PREFLIGHT=true   # fail pipeline if preflight warns/fails
TRANSCRIPT_USE_DENO_RUNTIME=true
TRANSCRIPT_TIMEOUT_MS=30000
```

Transcripts are fetched via yt-dlp using a WARP SOCKS5 proxy (127.0.0.1:1080) and the Deno runtime (`~/.deno/bin/deno`). No API tokens are needed for transcript fetching.
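How the transcript overrides might combine into a yt-dlp call, as a hypothetical sketch rather than the code in `scripts/lib/transcript.ts`: it reads only `WARP_PROXY_HOST`, `WARP_PROXY_PORT`, and `TRANSCRIPT_TIMEOUT_MS` with the defaults listed above and leaves the Deno runtime wiring out. The flags are standard yt-dlp options; requesting English subtitles and the output template are assumptions, and `buildTranscriptCall` is an illustrative name.

```typescript
type Env = Record<string, string | undefined>;

interface YtDlpCall {
  args: string[];
  timeoutMs: number; // intended as a process timeout, not a yt-dlp flag
}

// Builds a subtitles-only yt-dlp argument list routed through the WARP SOCKS5 proxy.
function buildTranscriptCall(videoId: string, env: Env): YtDlpCall {
  const host = env["WARP_PROXY_HOST"] ?? "127.0.0.1";
  const port = Number(env["WARP_PROXY_PORT"] ?? "1080");
  const timeoutMs = Number(env["TRANSCRIPT_TIMEOUT_MS"] ?? "30000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`bad WARP_PROXY_PORT: ${env["WARP_PROXY_PORT"]}`);
  }
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`bad TRANSCRIPT_TIMEOUT_MS: ${env["TRANSCRIPT_TIMEOUT_MS"]}`);
  }
  return {
    args: [
      "--proxy", `socks5://${host}:${port}`,
      "--skip-download",       // subtitles only, no media
      "--write-subs",          // uploaded subtitles, if any
      "--write-auto-subs",     // automatic captions as a fallback
      "--sub-langs", "en",     // assumption: English only
      "-o", "%(id)s.%(ext)s",  // assumption: output file template
      `https://www.youtube.com/watch?v=${videoId}`,
    ],
    timeoutMs,
  };
}

const call = buildTranscriptCall("VIDEO_ID", {}); // VIDEO_ID is a placeholder
console.log(call.args.slice(0, 2)); // [ '--proxy', 'socks5://127.0.0.1:1080' ]
console.log(call.timeoutMs); // 30000
```

In Node, the result could be passed as `execFile("yt-dlp", call.args, { timeout: call.timeoutMs }, callback)` from `node:child_process`, which kills the process once the timeout elapses.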
- Run `yt-dlp --version` to verify binary availability.
- Verify the proxy end-to-end: `curl -x socks5://127.0.0.1:1080 -s -o /dev/null -w '%{http_code}' https://www.youtube.com/robots.txt` (should return 200).
- Verify the Deno path exists: `ls ~/.deno/bin/deno`.
- If WARP is unhealthy: `cd ~/warp-proxy && docker compose restart warp` (`ensure-warp.sh` does this automatically before each pipeline run).
- If the proxy is temporarily down and you want best-effort mode, set `TRANSCRIPT_STRICT_PREFLIGHT=false`.
- In strict mode, a failed preflight exits with non-zero status and stops pipeline chaining.
- Videos that fail transcript fetch 3 times are permanently skipped (tracked via `transcriptRetries` in `videos.json`). Most are shorts/live streams without subtitles.
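The three-strikes rule can be expressed as a filter over video records. A minimal sketch: only `transcriptRetries` comes from this document; the `VideoRecord` shape, `selectForBackfill`, and `recordFailure` are hypothetical, and the real `backfill-transcripts.ts` may count attempts differently.

```typescript
// Hypothetical record: id and transcript are placeholders for the real fields.
interface VideoRecord {
  id: string;
  transcript?: string;
  transcriptRetries?: number;
}

const MAX_TRANSCRIPT_RETRIES = 3;

// Eligible for backfill: no transcript yet and fewer than 3 recorded failures.
function selectForBackfill(videos: VideoRecord[]): VideoRecord[] {
  return videos.filter(
    (v) => !v.transcript && (v.transcriptRetries ?? 0) < MAX_TRANSCRIPT_RETRIES,
  );
}

// Records one failed attempt without mutating the input record.
function recordFailure(video: VideoRecord): VideoRecord {
  return { ...video, transcriptRetries: (video.transcriptRetries ?? 0) + 1 };
}

// Simulate repeated failures: the video drops out after the third one.
let video: VideoRecord = { id: "abc" };
for (let attempt = 0; attempt < 5; attempt++) {
  if (selectForBackfill([video]).length === 0) break;
  video = recordFailure(video);
}
console.log(video.transcriptRetries); // 3
```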
Edit `shared/types.ts`:

```typescript
export const DISNEY_CHANNELS = {
  'ChannelName': 'UC_CHANNEL_ID',
  // ...
};
```

Get the channel ID from `youtube.com/channel/UC...` (the `UC...` part).
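A small helper for that lookup, as a hypothetical sketch: it assumes the URL uses the `/channel/UC...` form (not an `@handle`) and that channel IDs are `UC` plus 22 URL-safe characters, which is the standard YouTube format. The feed URL is YouTube's public per-channel feed endpoint; whether `fetch-videos.ts` builds exactly this URL is an assumption. `channelIdFromUrl` and `channelFeedUrl` are illustrative names, and the ID below is made up.

```typescript
// Standard YouTube channel ID: "UC" followed by 22 URL-safe characters.
const CHANNEL_ID = /^UC[A-Za-z0-9_-]{22}$/;

// Extracts the UC... segment from a youtube.com/channel/... URL.
function channelIdFromUrl(url: string): string {
  const match = /youtube\.com\/channel\/(UC[A-Za-z0-9_-]{22})/.exec(url);
  if (!match) throw new Error(`no /channel/UC... segment in ${url}`);
  return match[1];
}

// YouTube's per-channel feed URL (assumed, not confirmed, to be what the pipeline polls).
function channelFeedUrl(channelId: string): string {
  if (!CHANNEL_ID.test(channelId)) throw new Error(`not a channel ID: ${channelId}`);
  return `https://www.youtube.com/feeds/videos.xml?channel_id=${channelId}`;
}

const id = channelIdFromUrl("https://www.youtube.com/channel/UCabcdefghijklmnopqrstuv/videos");
console.log(channelFeedUrl(id));
// https://www.youtube.com/feeds/videos.xml?channel_id=UCabcdefghijklmnopqrstuv
```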