docs: expand CLAUDE.md; fix server.py startup + Ollama test endpoint #15

Open
oguzseran-max wants to merge 17 commits into ethanplusai:main from oguzseran-max:docs/update-claude-md

Conversation

@oguzseran-max

Summary

  • docs: expand CLAUDE.md with commands, request pipeline, and architecture notes
  • fix: close the unterminated uvicorn.run( call at the end of server.py (was a SyntaxError preventing the backend from starting)
  • fix: repair track_usage() — remove an accidental self-nested def that left inp/out undefined in the outer scope
  • fix: repair the local-LLM test endpoint (fix indentation, add the missing from openai import AsyncOpenAI) and rename /api/settings/test-anthropic -> /api/settings/test-ollama to reflect that it tests the local Ollama model; frontend fetch updated to match
  • deps: add openai>=1.0.0 to requirements.txt

Notes

  • Branch name is docs/... but it also carries the two code fixes that were needed to get the backend running.
  • The settings UI still labels the field "Anthropic API Key" (element ids *-anthropic); only the endpoint path was renamed.

Verification

  • server.py parses; frontend tsc --noEmit passes
  • Backend boots cleanly on :8340; /api/settings/test-ollama returns {"valid":true} against local Ollama (gemma3:27b), old path returns 404
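
The "server.py parses" check can be reproduced without starting the backend. The following is a minimal sketch using only the standard library; `source_parses` is a hypothetical helper, and the two sample strings are illustrative stand-ins for the tail of server.py (the host and other arguments are assumptions, not quotes from the file):

```python
import ast


def source_parses(source: str) -> bool:
    """Return True if the text is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Hypothetical stand-ins for the last statement of server.py.
broken_tail = 'uvicorn.run(app, host="127.0.0.1", port=8340'   # missing ")"
fixed_tail = 'uvicorn.run(app, host="127.0.0.1", port=8340)'

print(source_parses(broken_tail))
print(source_parses(fixed_tail))
```

To check the real file, the same helper could be called on the contents of server.py read from disk.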

🤖 Generated with Claude Code

Oguz and others added 17 commits May 30, 2026 03:02
Add a Commands section (run, build, tests, monitor), document the
request pipeline, build-dispatch vs work-mode, the self-improvement
loop, and the two separate SQLite DBs. Note the applescript_escape
injection guard and JARVIS_SKIP_PERMISSIONS / weather env vars.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The final uvicorn.run( call was missing its closing paren, causing a
SyntaxError that prevented the backend from starting.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- track_usage: remove accidental self-nested def that left inp/out
  undefined in the outer scope (would NameError at runtime)
- api_test_* : fix indentation, add missing `from openai import AsyncOpenAI`
  so the local Ollama (localhost:11434, gemma3:27b) test actually runs
- rename endpoint /api/settings/test-anthropic -> /api/settings/test-ollama
  to reflect that it tests the local LLM; update frontend fetch call
- add openai>=1.0.0 to requirements.txt

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
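
To illustrate the track_usage bug described above, here is a hedged reconstruction of the pattern, not the actual server.py code. The function bodies and the `input_tokens` / `output_tokens` keys are assumptions; only the names `track_usage`, `inp`, and `out` come from the commit message.

```python
def track_usage_broken(usage: dict) -> int:
    # The accidental inner def swallows the assignments: inp and out become
    # locals of the inner function, which is never called.
    def track_usage_broken(usage: dict) -> None:
        inp = usage.get("input_tokens", 0)
        out = usage.get("output_tokens", 0)

    # In the outer scope neither name exists, so this line raises NameError.
    return inp + out


def track_usage_fixed(usage: dict) -> int:
    # With the nested def removed, the assignments live in the scope that
    # uses them.
    inp = usage.get("input_tokens", 0)
    out = usage.get("output_tokens", 0)
    return inp + out
```

The broken variant still parses, which is why the failure only shows up at runtime.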
- camera.py + frontend/src/camera.ts: on-demand single-frame webcam vision
  ([ACTION:CAMERA]). The browser captures one JPEG, releases the camera
  immediately, and the server routes it to Claude vision. Privacy by design —
  never a continuous feed, nothing recorded.
- server.py: wire [ACTION:SENTIMENT] to the kukapay market-sentiment skill via
  subprocess, with fast-path keyword + LLM-embedded dispatch and a butler-style
  spoken summary.
- Fix _lookup_and_report: synthesize_speech() returns raw mp3 bytes, but the
  audio was passed unencoded to send_json, which can't serialize bytes and
  silently failed — so screen/calendar/mail/sentiment lookups wrote to history
  but never actually spoke. Base64-encode at both send sites.
- Tighten the anti-collision guard to suppress only when a NEWER utterance
  arrives during the lookup, so fast lookups still speak their result.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
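
The _lookup_and_report fix comes down to JSON not being able to carry raw bytes. Below is a minimal sketch of the encode step, assuming a JSON message with `type`, `text`, and `data` fields; the field names and the `build_audio_message` helper are hypothetical and may not match what server.py sends.

```python
import base64
import json


def build_audio_message(mp3_bytes: bytes, text: str) -> str:
    """Serialize an audio reply as JSON, with the mp3 payload base64-encoded."""
    payload = {
        "type": "audio",  # assumed message type
        "text": text,
        # b64encode returns bytes; decode to str so json.dumps accepts it.
        "data": base64.b64encode(mp3_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def raw_bytes_are_serializable(mp3_bytes: bytes) -> bool:
    """Show that passing bytes straight to json.dumps fails with TypeError."""
    try:
        json.dumps({"data": mp3_bytes})
    except TypeError:
        return False
    return True
```

The receiving side would decode the `data` field back to bytes before playback.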
Keep the secrets backup (.env.save) and start_jarvis.sh log/pid dir (.run/)
out of the working tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Idempotent helper that starts the backend (:8340) and frontend (:5173) only if
not already listening, waits for the frontend, then opens Chrome. Used by the
SessionStart auto-start hook. Logs to .run/ (gitignored).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
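
The "start only if not already listening" check is what makes the helper idempotent. start_jarvis.sh is a shell script, so the following Python version is only an illustration of the idea under that assumption; `is_listening` and `services_to_start` are hypothetical names.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def services_to_start(ports: dict[str, int]) -> list[str]:
    """Return the names of services whose port is not yet in use."""
    return [name for name, port in ports.items() if not is_listening(port)]
```

A launcher could call `services_to_start({"backend": 8340, "frontend": 5173})` and spawn only the names it returns, so running it twice starts nothing the second time.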
Speak in English, French, or Turkish and JARVIS replies in kind, in a matching
voice. A top-left EN/FR/TR toggle forces the language for recognition, reply,
and TTS — auto-detection proved unreliable on short spoken phrases.

- whisper_service.py: local STT microservice in a dedicated Python 3.12 venv
  (faster-whisper has no 3.14 wheels). Decodes the browser's recorded audio via
  ffmpeg/av, peak-normalizes it, and transcribes; ?lang= forces a language.
  Default model "small"; launched by start_jarvis.sh on :8765.
- frontend/src/audio_capture.ts: mic capture with MediaRecorder (off-thread,
  clean audio) + adaptive-VAD utterance segmentation, replacing the language-
  locked Web Speech API. ws.ts gains sendBinary for streaming audio.
- EN/FR/TR toggle (index.html + main.ts + style.css) → {type:"set_lang"}.
- server.py: transcribe_audio() client, binary-audio handling in the voice
  loop, set_lang control message, per-language Fish voice map (private cloned
  French + Turkish voices, MCU English), language-aware generate_response and
  synthesize_speech with correct honorifics (monsieur/efendim). Returns the UI
  to idle when nothing is understood so the mic never wedges on "thinking".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
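
Peak normalization, mentioned for whisper_service.py, scales a clip so its loudest sample reaches a fixed level before transcription. This is a minimal sketch on plain Python floats; the real service may work on arrays instead, and `peak_normalize` and the 0.95 target are assumptions.

```python
def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or an empty clip) has nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Because a single gain is applied to every sample, the waveform shape is unchanged and only the overall level moves.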
On launch JARVIS now plays a boot sequence before the live orb: the red HUD
loading video + machine sound, then a red "INITIATING SYSTEM" loading graphic,
with the welcome line spoken in the selected language, fading (music + graphic)
into the orb.

- frontend/public: boot_silent.mp4 (red HUD video, audio stripped) + per-language
  audio tracks boot_audio_{en,fr,tr}.mp3 (machine sound + music bed + welcome
  line). FR/TR beds were voice-separated (Demucs) and overlaid with the cloned
  voices, time-aligned to the English onset (~11.2s).
- index.html/main.ts/style.css: full-screen boot overlay played on first click;
  picks the audio track by the active EN/FR/TR language; reveals the red loader
  at ~18.6s, fades music + graphic out near the end into the live orb. The
  language toggle sits above the overlay so a language can be chosen first.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ortfolio, crypto)

After the boot sequence JARVIS now delivers a spoken morning briefing in the
active language, and opens the live portfolio dashboard in a small window.

- briefing.py: traffic (Google Directions, live ETA home→office), weather
  (Open-Meteo daily forecast for clothing advice), portfolio (runs the user's
  track.py to refresh prices, parses totals/movers, opens dashboard window).
- gmail_access.py: READ-ONLY Gmail API (gmail.readonly) — total unread +
  Primary-category unread with sender/subject so the LLM can flag what truly
  needs a reply. OAuth token cached + auto-refreshed (secrets gitignored).
- server.py: _prepare_briefing() gathers all six sources concurrently (each
  bounded), composes a butler briefing via Haiku in EN/FR/TR, synthesizes the
  audio. Triggered by the post-boot {"type":"briefing"}; a
  {"type":"briefing_prefetch"} sent at boot START runs it DURING the ~28s boot
  so it plays almost instantly when the boot ends (~5s vs ~25s). Voice command
  "morning briefing".
- Google Maps API key field added to Settings.
- mail_access.py: honest failure (timeout no longer reported as "inbox clear"),
  faster single-call unread count, lightweight get_recent_headers().

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
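
Gathering the sources concurrently with a per-source bound, as _prepare_briefing() is described as doing, can be sketched with asyncio. This is an illustration under assumptions, not the code in server.py: `bounded`, `gather_sources`, and the choice to map a failed or slow source to `None` are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def bounded(coro: Awaitable[str], timeout: float) -> Optional[str]:
    """Await one source; return None if it times out or raises."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except Exception:
        return None


async def gather_sources(
    sources: dict[str, Callable[[], Awaitable[str]]],
    timeout: float,
) -> dict[str, Optional[str]]:
    """Run every source concurrently, each under its own timeout."""
    names = list(sources)
    # gather returns results in the order the awaitables were passed in.
    results = await asyncio.gather(
        *(bounded(sources[name](), timeout) for name in names)
    )
    return dict(zip(names, results))
```

With this shape the total wait is roughly the slowest source or the timeout, whichever is smaller, rather than the sum of all sources.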
The boot/briefing dashboard window was 500px wide — too narrow for the 8-column
table and long position names. Open it at ~1040x760 (clamped to the screen) so
all the numbers are readable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two latency fixes so the briefing speaks the moment the boot ends:
- briefing.get_sentiment(): fetch the 6 crypto RSS feeds CONCURRENTLY (~1s vs
  the ~20s sequential subprocess) and score inline; replaces _do_sentiment_lookup
  in the briefing path.
- _prepare_briefing: split the composed briefing into chunks and synthesize the
  TTS segments CONCURRENTLY (~7s vs ~24s for one long call), delivered as ordered
  audio the player queues seamlessly.

Combined with the boot-time prefetch, the whole briefing (gather + compose + TTS)
now completes in ~7s, well inside the ~28s boot — first audio plays ~0.02s after
the boot finishes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
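
The chunked TTS step can be sketched the same way: split the text, synthesize the pieces in parallel, and keep them in order. In this sketch `fake_tts` is a stand-in for the real synthesizer, and `split_into_chunks`, `synthesize_ordered`, and the sentence-based split are assumptions about how the splitting might be done.

```python
import asyncio
import re


def split_into_chunks(text: str, n_chunks: int) -> list[str]:
    """Split text on sentence boundaries into at most n_chunks pieces."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    n_chunks = max(1, min(n_chunks, len(sentences)))
    size = -(-len(sentences) // n_chunks)  # ceiling division
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


async def fake_tts(chunk: str) -> bytes:
    """Stand-in for a TTS call; latency varies with chunk length."""
    await asyncio.sleep(0.001 * len(chunk))
    return chunk.encode("utf-8")


async def synthesize_ordered(text: str, n_chunks: int = 3) -> list[bytes]:
    """Synthesize all chunks concurrently and return audio in text order."""
    chunks = split_into_chunks(text, n_chunks)
    # gather preserves input order even when a later chunk finishes first.
    return list(await asyncio.gather(*(fake_tts(c) for c in chunks)))
```

Since the results come back in text order regardless of which call finishes first, the caller can queue or join them without re-sorting.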
- Briefing now greets by time of day: morning ("Good morning, I hope you slept
  well"), daytime ("Hello, welcome back"), evening ("Good evening, I hope you had
  a great day") — in the active language.
- Fix the briefing interrupting itself: the mic was started at boot end, so it
  transcribed JARVIS's own briefing voice as user input and cut it off. Now the
  mic stays off through the whole briefing and starts only once it finishes
  speaking (with a 60s safety fallback).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
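
The time-of-day greeting amounts to a small classification on the local hour. The commit does not state the cutoffs, so the hours below are assumptions, as is the `greeting_kind` name. The function returns a neutral key rather than a greeting phrase, leaving the wording to the language-specific prompt.

```python
def greeting_kind(hour: int) -> str:
    """Map a local hour (0-23) to a greeting category."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 5 <= hour < 12:    # assumed morning window
        return "morning"
    if 12 <= hour < 18:   # assumed daytime window
        return "daytime"
    return "evening"      # everything else, including late night
```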
The time-aware greeting prompt contained literal English greetings ("Good
evening", "Hello, welcome back"), which made the model write the whole briefing
in English even with lang=fr/tr (then spoken by the cloned voice — English with
a French accent). Describe the greeting semantically (no English words) and
force the entire briefing into the target language.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The briefing was sent as 3 separate audio messages; if the final chunk decoded
just after the previous finished playing, the player's queue briefly emptied,
fired "finished", started the mic, and the mic then cut the last (crypto)
segment. Concatenate the parallel-synthesized mp3 chunks into ONE audio blob so
playback is a single buffer — no inter-segment race, the whole briefing plays.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The post-boot mic fallback fired at 60s, but a full briefing (esp. French) can
run ~60s, so it started the mic mid-briefing and cut the final (crypto) segment.
Push the fallback to 180s — well beyond any real briefing — so onFinished stays
the normal trigger and the briefing always plays to the end.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In French/Turkish mode, "what am I wearing"/"look at my screen" triggered the
camera/screen vision lookups, whose prompts were hardcoded English — so JARVIS
answered in English. Thread the active language into describe_camera and
describe_screen (and their server lookups) so they reply in FR/TR. Also harden
generate_response's language rule (reply ONLY in the target language, never
English/Spanish/Italian, even on garbled transcripts).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two fixes for French/Turkish follow-ups that trigger an action (e.g. camera
"what am I wearing"):
- The action ack ("Right away, sir") was hardcoded English; now localized
  (Tout de suite, monsieur / Hemen, efendim, etc.).
- _lookup_and_report spoke its result with the default English voice; now it
  uses the active language so the description is read by the cloned FR/TR voice,
  not the MCU voice.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
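
Localizing the action acknowledgement can be as simple as a lookup keyed by the active language. The three phrases below are the ones quoted in the commit message; the `ACKS` table, the `action_ack` helper, and the fallback to English for unknown codes are assumptions.

```python
ACKS = {
    "en": "Right away, sir",
    "fr": "Tout de suite, monsieur",
    "tr": "Hemen, efendim",
}


def action_ack(lang: str) -> str:
    """Return the acknowledgement for the active language code."""
    # Unknown language codes fall back to English (an assumed default).
    return ACKS.get(lang, ACKS["en"])
```

Passing the same language code to the speech synthesizer keeps the acknowledgement and the voice that reads it consistent.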