docs: expand CLAUDE.md; fix server.py startup + Ollama test endpoint by oguzseran-max · Pull Request #15 · ethanplusai/jarvis

oguzseran-max · 2026-05-30T19:38:34Z

Summary

docs: expand CLAUDE.md with commands, request pipeline, and architecture notes
fix: close the unterminated uvicorn.run( call at the end of server.py (was a SyntaxError preventing the backend from starting)
fix: repair track_usage() — remove an accidental self-nested def that left inp/out undefined in the outer scope
fix: repair the local-LLM test endpoint (indentation + add from openai import AsyncOpenAI) and rename /api/settings/test-anthropic → /api/settings/test-ollama to reflect that it tests the local Ollama model; frontend fetch updated to match
deps: add openai>=1.0.0 to requirements.txt

Notes

Branch name is docs/... but it also carries the two code fixes that were needed to get the backend running.
The settings UI still labels the field "Anthropic API Key" (element ids *-anthropic); only the endpoint path was renamed.

Verification

server.py parses; frontend tsc --noEmit passes
Backend boots cleanly on :8340; /api/settings/test-ollama returns {"valid":true} against local Ollama (gemma3:27b), old path returns 404

🤖 Generated with Claude Code

Add a Commands section (run, build, tests, monitor), document the request pipeline, build-dispatch vs work-mode, the self-improvement loop, and the two separate SQLite DBs. Note the applescript_escape injection guard and JARVIS_SKIP_PERMISSIONS / weather env vars. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The final uvicorn.run( call was missing its closing paren, causing a SyntaxError that prevented the backend from starting. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- track_usage: remove accidental self-nested def that left inp/out undefined in the outer scope (would NameError at runtime) - api_test_* : fix indentation, add missing `from openai import AsyncOpenAI` so the local Ollama (localhost:11434, gemma3:27b) test actually runs - rename endpoint /api/settings/test-anthropic -> /api/settings/test-ollama to reflect that it tests the local LLM; update frontend fetch call - add openai>=1.0.0 to requirements.txt Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- camera.py + frontend/src/camera.ts: on-demand single-frame webcam vision ([ACTION:CAMERA]). The browser captures one JPEG, releases the camera immediately, and the server routes it to Claude vision. Privacy by design — never a continuous feed, nothing recorded. - server.py: wire [ACTION:SENTIMENT] to the kukapay market-sentiment skill via subprocess, with fast-path keyword + LLM-embedded dispatch and a butler-style spoken summary. - Fix _lookup_and_report: synthesize_speech() returns raw mp3 bytes, but the audio was passed unencoded to send_json, which can't serialize bytes and silently failed — so screen/calendar/mail/sentiment lookups wrote to history but never actually spoke. Base64-encode at both send sites. - Tighten the anti-collision guard to suppress only when a NEWER utterance arrives during the lookup, so fast lookups still speak their result. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Keep the secrets backup (.env.save) and start_jarvis.sh log/pid dir (.run/) out of the working tree. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Idempotent helper that starts the backend (:8340) and frontend (:5173) only if not already listening, waits for the frontend, then opens Chrome. Used by the SessionStart auto-start hook. Logs to .run/ (gitignored). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Speak in English, French, or Turkish and JARVIS replies in kind, in a matching voice. A top-left EN/FR/TR toggle forces the language for recognition, reply, and TTS — auto-detection proved unreliable on short spoken phrases. - whisper_service.py: local STT microservice in a dedicated Python 3.12 venv (faster-whisper has no 3.14 wheels). Decodes the browser's recorded audio via ffmpeg/av, peak-normalizes it, and transcribes; ?lang= forces a language. Default model "small"; launched by start_jarvis.sh on :8765. - frontend/src/audio_capture.ts: mic capture with MediaRecorder (off-thread, clean audio) + adaptive-VAD utterance segmentation, replacing the language- locked Web Speech API. ws.ts gains sendBinary for streaming audio. - EN/FR/TR toggle (index.html + main.ts + style.css) → {type:"set_lang"}. - server.py: transcribe_audio() client, binary-audio handling in the voice loop, set_lang control message, per-language Fish voice map (private cloned French + Turkish voices, MCU English), language-aware generate_response and synthesize_speech with correct honorifics (monsieur/efendim). Returns the UI to idle when nothing is understood so the mic never wedges on "thinking". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

On launch JARVIS now plays a boot sequence before the live orb: the red HUD loading video + machine sound, then a red "INITIATING SYSTEM" loading graphic, with the welcome line spoken in the selected language, fading (music + graphic) into the orb. - frontend/public: boot_silent.mp4 (red HUD video, audio stripped) + per-language audio tracks boot_audio_{en,fr,tr}.mp3 (machine sound + music bed + welcome line). FR/TR beds were voice-separated (Demucs) and overlaid with the cloned voices, time-aligned to the English onset (~11.2s). - index.html/main.ts/style.css: full-screen boot overlay played on first click; picks the audio track by the active EN/FR/TR language; reveals the red loader at ~18.6s, fades music + graphic out near the end into the live orb. The language toggle sits above the overlay so a language can be chosen first. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ortfolio, crypto) After the boot sequence JARVIS now delivers a spoken morning briefing in the active language, and opens the live portfolio dashboard in a small window. - briefing.py: traffic (Google Directions, live ETA home→office), weather (Open-Meteo daily forecast for clothing advice), portfolio (runs the user's track.py to refresh prices, parses totals/movers, opens dashboard window). - gmail_access.py: READ-ONLY Gmail API (gmail.readonly) — total unread + Primary-category unread with sender/subject so the LLM can flag what truly needs a reply. OAuth token cached + auto-refreshed (secrets gitignored). - server.py: _prepare_briefing() gathers all six sources concurrently (each bounded), composes a butler briefing via Haiku in EN/FR/TR, synthesizes the audio. Triggered by the post-boot {"type":"briefing"}; a {"type":"briefing_ prefetch"} sent at boot START runs it DURING the ~28s boot so it plays almost instantly when the boot ends (~5s vs ~25s). Voice command "morning briefing". - Google Maps API key field added to Settings. - mail_access.py: honest failure (timeout no longer reported as "inbox clear"), faster single-call unread count, lightweight get_recent_headers(). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The boot/briefing dashboard window was 500px wide — too narrow for the 8-column table and long position names. Open it at ~1040x760 (clamped to the screen) so all the numbers are readable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Two latency fixes so the briefing speaks the moment the boot ends: - briefing.get_sentiment(): fetch the 6 crypto RSS feeds CONCURRENTLY (~1s vs the ~20s sequential subprocess) and score inline; replaces _do_sentiment_lookup in the briefing path. - _prepare_briefing: split the composed briefing into chunks and synthesize the TTS segments CONCURRENTLY (~7s vs ~24s for one long call), delivered as ordered audio the player queues seamlessly. Combined with the boot-time prefetch, the whole briefing (gather + compose + TTS) now completes in ~7s, well inside the ~28s boot — first audio plays ~0.02s after the boot finishes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- Briefing now greets by time of day: morning ("Good morning, I hope you slept well"), daytime ("Hello, welcome back"), evening ("Good evening, I hope you had a great day") — in the active language. - Fix the briefing interrupting itself: the mic was started at boot end, so it transcribed JARVIS's own briefing voice as user input and cut it off. Now the mic stays off through the whole briefing and starts only once it finishes speaking (with a 60s safety fallback). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The time-aware greeting prompt contained literal English greetings ("Good evening", "Hello, welcome back"), which made the model write the whole briefing in English even with lang=fr/tr (then spoken by the cloned voice — English with a French accent). Describe the greeting semantically (no English words) and force the entire briefing into the target language. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The briefing was sent as 3 separate audio messages; if the final chunk decoded just after the previous finished playing, the player's queue briefly emptied, fired "finished", started the mic, and the mic then cut the last (crypto) segment. Concatenate the parallel-synthesized mp3 chunks into ONE audio blob so playback is a single buffer — no inter-segment race, the whole briefing plays. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The post-boot mic fallback fired at 60s, but a full briefing (esp. French) can run ~60s, so it started the mic mid-briefing and cut the final (crypto) segment. Push the fallback to 180s — well beyond any real briefing — so onFinished stays the normal trigger and the briefing always plays to the end. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

In French/Turkish mode, "what am I wearing"/"look at my screen" triggered the camera/screen vision lookups, whose prompts were hardcoded English — so JARVIS answered in English. Thread the active language into describe_camera and describe_screen (and their server lookups) so they reply in FR/TR. Also harden generate_response's language rule (reply ONLY in the target language, never English/Spanish/Italian, even on garbled transcripts). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Two fixes for French/Turkish follow-ups that trigger an action (e.g. camera "what am I wearing"): - The action ack ("Right away, sir") was hardcoded English; now localized (Tout de suite, monsieur / Hemen, efendim, etc.). - _lookup_and_report spoke its result with the default English voice; now it uses the active language so the description is read by the cloned FR/TR voice, not the MCU voice. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Oguz and others added 17 commits May 30, 2026 03:02

Fix unclosed uvicorn.run() call at end of server.py

f4884b0

The final uvicorn.run( call was missing its closing paren, causing a SyntaxError that prevented the backend from starting. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Ignore .env.save and .run/ runtime artifacts

a79416f

Keep the secrets backup (.env.save) and start_jarvis.sh log/pid dir (.run/) out of the working tree. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: expand CLAUDE.md; fix server.py startup + Ollama test endpoint#15

docs: expand CLAUDE.md; fix server.py startup + Ollama test endpoint#15
oguzseran-max wants to merge 17 commits into
ethanplusai:mainfrom
oguzseran-max:docs/update-claude-md

oguzseran-max commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

oguzseran-max commented May 30, 2026

Summary

Notes

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant