docs: expand CLAUDE.md; fix server.py startup + Ollama test endpoint #15
Open
oguzseran-max wants to merge 17 commits into
Add a Commands section (run, build, tests, monitor), document the request pipeline, build-dispatch vs work-mode, the self-improvement loop, and the two separate SQLite DBs. Note the applescript_escape injection guard and JARVIS_SKIP_PERMISSIONS / weather env vars. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The final uvicorn.run( call was missing its closing paren, causing a SyntaxError that prevented the backend from starting. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
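An unclosed call like this can be caught by parsing the file before launching it. A minimal sketch: `parses_cleanly` is a hypothetical helper, and the two snippets are illustrative stand-ins (the `"server:app"` target is an assumption), not the actual tail of server.py.

```python
import ast


def parses_cleanly(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Illustrative snippets only; "server:app" is a placeholder target.
broken = 'import uvicorn\nuvicorn.run("server:app", port=8340\n'
fixed = 'import uvicorn\nuvicorn.run("server:app", port=8340)\n'

if __name__ == "__main__":
    print("broken parses:", parses_cleanly(broken))
    print("fixed parses:", parses_cleanly(fixed))
```

Parsing only checks syntax; it does not import the module or start the server.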
- track_usage: remove accidental self-nested def that left inp/out undefined in the outer scope (would raise NameError at runtime)
- api_test_*: fix indentation, add missing `from openai import AsyncOpenAI` so the local Ollama (localhost:11434, gemma3:27b) test actually runs
- rename endpoint /api/settings/test-anthropic -> /api/settings/test-ollama to reflect that it tests the local LLM; update frontend fetch call
- add openai>=1.0.0 to requirements.txt
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
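The self-nested `def` failure mode can be reproduced in isolation. This is a sketch of the pattern only: the function bodies and the `input_tokens`/`output_tokens` keys are assumptions, not the real `track_usage` code.

```python
def track_usage_buggy(usage: dict) -> int:
    # The accidental inner def swallows the assignments: inp/out only
    # exist inside the nested function, which is never called.
    def track_usage_buggy(usage: dict) -> None:
        inp = usage.get("input_tokens", 0)
        out = usage.get("output_tokens", 0)

    return inp + out  # NameError: inp is not defined in this scope


def track_usage_fixed(usage: dict) -> int:
    # With the nested def removed, the names live in the scope that uses them.
    inp = usage.get("input_tokens", 0)
    out = usage.get("output_tokens", 0)
    return inp + out
```

The buggy version still parses, so the error only surfaces on the first call.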
- camera.py + frontend/src/camera.ts: on-demand single-frame webcam vision ([ACTION:CAMERA]). The browser captures one JPEG, releases the camera immediately, and the server routes it to Claude vision. Privacy by design: never a continuous feed, nothing recorded.
- server.py: wire [ACTION:SENTIMENT] to the kukapay market-sentiment skill via subprocess, with fast-path keyword + LLM-embedded dispatch and a butler-style spoken summary.
- Fix _lookup_and_report: synthesize_speech() returns raw mp3 bytes, but the audio was passed unencoded to send_json, which can't serialize bytes and silently failed, so screen/calendar/mail/sentiment lookups wrote to history but never actually spoke. Base64-encode at both send sites.
- Tighten the anti-collision guard to suppress only when a NEWER utterance arrives during the lookup, so fast lookups still speak their result.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
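The two fixes in this commit can be sketched with the standard library alone. The message shape (`type`, `data`, `text`) and both helper names are hypothetical; only the base64 step and the "newer utterance" rule come from the commit message.

```python
import base64
import json


def audio_message(mp3_bytes: bytes, text: str) -> str:
    """Build a JSON text frame that carries mp3 audio as base64."""
    payload = {
        "type": "audio",  # hypothetical message shape
        "data": base64.b64encode(mp3_bytes).decode("ascii"),
        "text": text,
    }
    return json.dumps(payload)


def should_speak(utterance_id_at_start: int, latest_utterance_id: int) -> bool:
    """Suppress the result only if a newer utterance arrived during the lookup."""
    return latest_utterance_id == utterance_id_at_start
```

Passing the raw bytes straight to `json.dumps` raises `TypeError`, which is consistent with the silent failure described above if that exception was being swallowed somewhere.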
Keep the secrets backup (.env.save) and start_jarvis.sh log/pid dir (.run/) out of the working tree. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Idempotent helper that starts the backend (:8340) and frontend (:5173) only if not already listening, waits for the frontend, then opens Chrome. Used by the SessionStart auto-start hook. Logs to .run/ (gitignored). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
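The "start only if not already listening" check is a plain TCP connect test. The real helper is a shell script (start_jarvis.sh); the Python below is a sketch of the same idea, with hypothetical function names and an IPv4-loopback-only assumption.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    # Assumption: the service binds IPv4 loopback; an IPv6-only listener
    # would need a second check against "::1".
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


def services_to_start(ports: dict[str, int]) -> list[str]:
    """Names of the services whose port is not yet being listened on."""
    return [name for name, port in ports.items() if not is_listening(port)]


if __name__ == "__main__":
    # Ports taken from the commit message above.
    print(services_to_start({"backend": 8340, "frontend": 5173}))
```

Running the check twice in a row is harmless, which is what makes the helper safe to call from an auto-start hook.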
Speak in English, French, or Turkish and JARVIS replies in kind, in a matching
voice. A top-left EN/FR/TR toggle forces the language for recognition, reply,
and TTS — auto-detection proved unreliable on short spoken phrases.
- whisper_service.py: local STT microservice in a dedicated Python 3.12 venv
(faster-whisper has no 3.14 wheels). Decodes the browser's recorded audio via
ffmpeg/av, peak-normalizes it, and transcribes; ?lang= forces a language.
Default model "small"; launched by start_jarvis.sh on :8765.
- frontend/src/audio_capture.ts: mic capture with MediaRecorder (off-thread,
clean audio) + adaptive-VAD utterance segmentation, replacing the
language-locked Web Speech API. ws.ts gains sendBinary for streaming audio.
- EN/FR/TR toggle (index.html + main.ts + style.css) → {type:"set_lang"}.
- server.py: transcribe_audio() client, binary-audio handling in the voice
loop, set_lang control message, per-language Fish voice map (private cloned
French + Turkish voices, MCU English), language-aware generate_response and
synthesize_speech with correct honorifics (monsieur/efendim). Returns the UI
to idle when nothing is understood so the mic never wedges on "thinking".
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
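Peak normalization, as mentioned for whisper_service.py, scales the whole clip so its loudest sample hits a target level. A minimal sketch on plain float samples in [-1, 1]; the function name and the 0.95 target are assumptions, not values taken from the service.

```python
def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or an empty clip): nothing to scale, avoid dividing by zero.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Because a single gain is applied to every sample, quiet recordings get louder without changing their shape, which tends to help on short phrases spoken far from the mic.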
On launch JARVIS now plays a boot sequence before the live orb: the red HUD
loading video + machine sound, then a red "INITIATING SYSTEM" loading graphic,
with the welcome line spoken in the selected language, fading (music + graphic)
into the orb.
- frontend/public: boot_silent.mp4 (red HUD video, audio stripped) + per-language
audio tracks boot_audio_{en,fr,tr}.mp3 (machine sound + music bed + welcome
line). FR/TR beds were voice-separated (Demucs) and overlaid with the cloned
voices, time-aligned to the English onset (~11.2s).
- index.html/main.ts/style.css: full-screen boot overlay played on first click;
picks the audio track by the active EN/FR/TR language; reveals the red loader
at ~18.6s, fades music + graphic out near the end into the live orb. The
language toggle sits above the overlay so a language can be chosen first.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ortfolio, crypto)
After the boot sequence JARVIS now delivers a spoken morning briefing in the
active language, and opens the live portfolio dashboard in a small window.
- briefing.py: traffic (Google Directions, live ETA home→office), weather
(Open-Meteo daily forecast for clothing advice), portfolio (runs the user's
track.py to refresh prices, parses totals/movers, opens dashboard window).
- gmail_access.py: READ-ONLY Gmail API (gmail.readonly) — total unread +
Primary-category unread with sender/subject so the LLM can flag what truly
needs a reply. OAuth token cached + auto-refreshed (secrets gitignored).
- server.py: _prepare_briefing() gathers all six sources concurrently (each
bounded), composes a butler briefing via Haiku in EN/FR/TR, synthesizes the
audio. Triggered by the post-boot {"type":"briefing"}; a
{"type":"briefing_prefetch"} sent at boot START runs it DURING the ~28s boot so it plays almost
instantly when the boot ends (~5s vs ~25s). Voice command "morning briefing".
- Google Maps API key field added to Settings.
- mail_access.py: honest failure (timeout no longer reported as "inbox clear"),
faster single-call unread count, lightweight get_recent_headers().
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
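Gathering several sources concurrently with a per-source bound can be sketched with `asyncio.gather` and `asyncio.wait_for`. The helper names, the 8-second default, and the use of `None` for a failed source are assumptions, not the actual `_prepare_briefing` code.

```python
import asyncio
from typing import Any, Awaitable, Callable

Fetch = Callable[[], Awaitable[Any]]


async def bounded(name: str, fetch: Fetch, timeout_s: float) -> tuple[str, Any]:
    """Run one source under its own timeout; None marks a failed or slow source."""
    try:
        return name, await asyncio.wait_for(fetch(), timeout=timeout_s)
    except Exception:
        # Covers timeouts as well as errors raised by the source itself.
        return name, None


async def gather_sources(sources: dict[str, Fetch], timeout_s: float = 8.0) -> dict[str, Any]:
    """Fetch every source concurrently; total time is roughly the slowest one."""
    pairs = await asyncio.gather(*(bounded(n, f, timeout_s) for n, f in sources.items()))
    return dict(pairs)
```

Returning `None` instead of an empty result keeps a timeout distinguishable from "nothing to report", in the same spirit as the mail_access.py change above.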
The boot/briefing dashboard window was 500px wide — too narrow for the 8-column table and long position names. Open it at ~1040x760 (clamped to the screen) so all the numbers are readable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two latency fixes so the briefing speaks the moment the boot ends:
- briefing.get_sentiment(): fetch the 6 crypto RSS feeds CONCURRENTLY (~1s vs the ~20s sequential subprocess) and score inline; replaces _do_sentiment_lookup in the briefing path.
- _prepare_briefing: split the composed briefing into chunks and synthesize the TTS segments CONCURRENTLY (~7s vs ~24s for one long call), delivered as ordered audio the player queues seamlessly.
Combined with the boot-time prefetch, the whole briefing (gather + compose + TTS) now completes in ~7s, well inside the ~28s boot; first audio plays ~0.02s after the boot finishes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
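Splitting the text and synthesizing the pieces in parallel while keeping their order might look like the sketch below. The sentence-based splitter, the chunk count, and the `synthesize` callable are assumptions standing in for the real TTS call.

```python
import asyncio
import re
from typing import Awaitable, Callable


def split_into_chunks(text: str, n_chunks: int = 3) -> list[str]:
    """Split text on sentence boundaries into at most n_chunks pieces."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    n = min(n_chunks, len(sentences))
    size = -(-len(sentences) // n)  # ceiling division
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


async def synthesize_all(
    chunks: list[str],
    synthesize: Callable[[str], Awaitable[bytes]],
) -> list[bytes]:
    """Run all TTS calls concurrently; results come back in chunk order."""
    # asyncio.gather returns results in argument order, not completion order.
    return list(await asyncio.gather(*(synthesize(c) for c in chunks)))
```

Since `gather` preserves argument order, no explicit sequence numbers are needed to play the segments in the right order.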
- Briefing now greets by time of day: morning ("Good morning, I hope you slept
well"), daytime ("Hello, welcome back"), evening ("Good evening, I hope you had
a great day") — in the active language.
- Fix the briefing interrupting itself: the mic was started at boot end, so it
transcribed JARVIS's own briefing voice as user input and cut it off. Now the
mic stays off through the whole briefing and starts only once it finishes
speaking (with a 60s safety fallback).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
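The time-of-day selection reduces to bucketing the local hour. The commit does not state the cut-off hours, so the boundaries below (05:00, 12:00, 18:00) and the function name are assumptions.

```python
def greeting_kind(hour: int) -> str:
    """Map a local hour (0-23) to a greeting bucket."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "daytime"
    return "evening"  # assumed to cover late night as well
```

Returning a bucket name rather than a literal greeting leaves the wording to the language-specific layer.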
The time-aware greeting prompt contained literal English greetings ("Good
evening", "Hello, welcome back"), which made the model write the whole briefing
in English even with lang=fr/tr (then spoken by the cloned voice — English with
a French accent). Describe the greeting semantically (no English words) and
force the entire briefing into the target language.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The briefing was sent as 3 separate audio messages; if the final chunk decoded just after the previous finished playing, the player's queue briefly emptied, fired "finished", started the mic, and the mic then cut the last (crypto) segment. Concatenate the parallel-synthesized mp3 chunks into ONE audio blob so playback is a single buffer — no inter-segment race, the whole briefing plays. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
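The race can be illustrated with a toy player that reports "finished" whenever its queue drains. `QueuePlayer` and `join_chunks` are hypothetical names for this sketch; the real player lives in the frontend, and byte-wise joining assumes the mp3 chunks share one encoding.

```python
class QueuePlayer:
    """Toy model of a player that fires 'finished' each time its queue drains."""

    def __init__(self) -> None:
        self.pending: list[bytes] = []
        self.finished_count = 0

    def receive(self, audio: bytes) -> None:
        self.pending.append(audio)

    def play_all_pending(self) -> None:
        if not self.pending:
            return
        self.pending.clear()
        self.finished_count += 1  # queue is empty, so 'finished' fires


def join_chunks(chunks: list[bytes]) -> bytes:
    """Concatenate the synthesized chunks into a single audio blob."""
    return b"".join(chunks)
```

With three separate messages, a late third chunk makes `finished_count` reach 1 before the last segment has played; with one joined blob it can only fire once, after everything has played.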
The post-boot mic fallback fired at 60s, but a full briefing (esp. French) can run ~60s, so it started the mic mid-briefing and cut the final (crypto) segment. Push the fallback to 180s — well beyond any real briefing — so onFinished stays the normal trigger and the briefing always plays to the end. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
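The "start on finished, with a long safety fallback" rule can be written as a wait with a timeout. This is an asyncio sketch of the logic only; the actual trigger is in the frontend, and `wait_to_start_mic` is a hypothetical name.

```python
import asyncio


async def wait_to_start_mic(finished: asyncio.Event, fallback_s: float = 180.0) -> str:
    """Wait for playback to finish; give up and start anyway after fallback_s."""
    try:
        await asyncio.wait_for(finished.wait(), timeout=fallback_s)
        return "finished"  # normal path: the briefing played to the end
    except asyncio.TimeoutError:
        return "fallback"  # safety net: 'finished' never arrived
```

The fallback only does its job if it is longer than the longest realistic briefing, which is why 60s was too short for a briefing that can itself run about 60s.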
In French/Turkish mode, "what am I wearing"/"look at my screen" triggered the camera/screen vision lookups, whose prompts were hardcoded English — so JARVIS answered in English. Thread the active language into describe_camera and describe_screen (and their server lookups) so they reply in FR/TR. Also harden generate_response's language rule (reply ONLY in the target language, never English/Spanish/Italian, even on garbled transcripts). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two fixes for French/Turkish follow-ups that trigger an action (e.g. camera
"what am I wearing"):
- The action ack ("Right away, sir") was hardcoded English; now localized
(Tout de suite, monsieur / Hemen, efendim, etc.).
- _lookup_and_report spoke its result with the default English voice; now it
uses the active language so the description is read by the cloned FR/TR voice,
not the MCU voice.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
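Localizing the acknowledgement amounts to a lookup keyed by the active language. The three phrases are the ones quoted in this commit; the table name, the helper, and the fall-back to English for unknown codes are assumptions.

```python
ACTION_ACKS: dict[str, str] = {
    "en": "Right away, sir",
    "fr": "Tout de suite, monsieur",
    "tr": "Hemen, efendim",
}


def action_ack(lang: str) -> str:
    """Return the acknowledgement for the active language."""
    # Assumption: unknown language codes fall back to English.
    return ACTION_ACKS.get(lang, ACTION_ACKS["en"])
```

The same language key can then be passed on to the speech step so the text and the voice stay in the same language.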
Summary
- Expand `CLAUDE.md` with commands, request pipeline, and architecture notes
- Close the `uvicorn.run(` call at the end of `server.py` (was a `SyntaxError` preventing the backend from starting)
- Fix `track_usage()`: remove an accidental self-nested `def` that left `inp`/`out` undefined in the outer scope
- Fix the Ollama test endpoint (add the missing `from openai import AsyncOpenAI`) and rename `/api/settings/test-anthropic` → `/api/settings/test-ollama` to reflect that it tests the local Ollama model; frontend fetch updated to match
- Add `openai>=1.0.0` to `requirements.txt`

Notes
- … `docs/...` but it also carries the two code fixes that were needed to get the backend running.
- … `*-anthropic`); only the endpoint path was renamed.

Verification
- `server.py` parses; frontend `tsc --noEmit` passes
- … `:8340`; `/api/settings/test-ollama` returns `{"valid":true}` against local Ollama (`gemma3:27b`), old path returns 404

🤖 Generated with Claude Code