
Add real-time /status dashboard with SSE push #322

Open
howard0su wants to merge 7 commits into Luce-Org:main from howard0su:status_html


@howard0su
Contributor

Summary

Adds a live server status page at GET /status that shows the current inference state, request details, generated output, and performance history — all pushed in real time via Server-Sent Events.

What it does

  • /status — Serves a self-contained HTML dashboard (dark theme, no CDN deps)
  • /status/events — SSE endpoint pushing event: status (state snapshot) and event: token (incremental output text)
  • /status/json — JSON snapshot for programmatic access
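
As a rough illustration of what the `/status/events` endpoint puts on the wire, the sketch below frames one Server-Sent Event. The helper name `format_sse_event` is hypothetical and not taken from the PR; only the event names `status` and `token` come from the description above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the PR): frame one Server-Sent Event.
// Every line of the payload gets its own "data:" field, and a blank
// line terminates the event, as the SSE wire format requires.
std::string format_sse_event(const std::string& event, const std::string& data) {
    std::string out = "event: " + event + "\n";
    std::istringstream lines(data);
    std::string line;
    bool wrote_data = false;
    while (std::getline(lines, line)) {
        out += "data: " + line + "\n";
        wrote_data = true;
    }
    if (!wrote_data) out += "data: \n";  // empty payload still needs a data field
    out += "\n";                         // blank line ends the event
    return out;
}
```

A browser-side `EventSource` would receive `format_sse_event("token", "hi")` as a `token` event whose `data` is `hi`.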

Dashboard shows

| Section | Details |
| --- | --- |
| Phase | idle / prefill / decode badge |
| Request params | model, format, temperature, top_p/k, max_output, session_id |
| Feature tags | cache hit, pflash, spec decode, stream, thinking |
| Live stats | prompt tokens, completion tokens, elapsed time, live tok/s |
| Draft tokens | Current spec-decode candidate tokens (updated per step) |
| Request messages | Chat messages JSON (truncated, scrollable) |
| Response output | Token-by-token output accumulated client-side |
| Perf charts | Prefill tok/s and decode tok/s + accept rate (last 50 requests) |

howard0su and others added 4 commits May 31, 2026 15:12
Add a real-time server status dashboard accessible at GET /status:
- Serves standalone HTML from server/share/status.html (editable without recompile)
- GET /status/events SSE endpoint pushes live JSON updates per spec-decode step
- GET /status/json provides a snapshot for non-SSE clients

Status tracking (server_status.h):
- Current phase (idle/prefill/decode) with prompt excerpt and token counts
- Draft tokens being verified (updated each spec-decode step)
- Performance history (last 50 requests): prefill tok/s, decode tok/s, accept rate
- RAII StatusGuard ensures status resets to idle on all exit paths
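
The RAII idea behind StatusGuard can be sketched as follows. This is a minimal illustration under assumptions: `ServerStatus`, `Phase`, and `handle_request` are hypothetical stand-ins, and the real types in server_status.h may look quite different.

```cpp
#include <mutex>

enum class Phase { Idle, Prefill, Decode };

// Hypothetical status holder; only the phase is modelled here.
struct ServerStatus {
    std::mutex mu;
    Phase phase = Phase::Idle;
    void set_phase(Phase p) { std::lock_guard<std::mutex> lock(mu); phase = p; }
    Phase get_phase() { std::lock_guard<std::mutex> lock(mu); return phase; }
};

// Resets the status to idle when the guard leaves scope, whether the
// request finished normally, returned early, or unwound via an exception.
class StatusGuard {
public:
    explicit StatusGuard(ServerStatus& s) : status_(s) {}
    ~StatusGuard() { status_.set_phase(Phase::Idle); }
    StatusGuard(const StatusGuard&) = delete;
    StatusGuard& operator=(const StatusGuard&) = delete;
private:
    ServerStatus& status_;
};

// Hypothetical request handler showing an early-return path.
bool handle_request(ServerStatus& status, bool fail_early) {
    StatusGuard guard(status);
    status.set_phase(Phase::Prefill);
    if (fail_early) return false;  // guard still resets to Idle
    status.set_phase(Phase::Decode);
    return true;
}
```

The point of the design is that no exit path has to remember to reset the phase by hand.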

Backend instrumentation (InferenceObserver on DaemonIO):
- Observer callback in model_backend.h, called at each draft/verify step
- Instrumented in qwen35_backend.cpp and generic dflash_spec_decode.cpp
- Zero overhead when no SSE clients are connected (empty std::function check)
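
The empty `std::function` check mentioned above might look roughly like this. `DraftStep`, the `observer` member, and `run_spec_decode_step` are assumptions made for illustration; the actual callback signature in model_backend.h is not shown in this PR page.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical payload passed to the observer at each draft/verify step.
struct DraftStep {
    std::vector<int> draft_tokens;
    int accepted = 0;
};

using InferenceObserver = std::function<void(const DraftStep&)>;

// Hypothetical subset of DaemonIO: only the observer slot is modelled.
struct DaemonIO {
    InferenceObserver observer;
};

// One speculative-decode step. When no observer is installed, the only
// cost is the boolean test on the std::function; no DraftStep is built.
int run_spec_decode_step(const DaemonIO& io, std::vector<int> draft, int accepted) {
    if (io.observer) {
        DraftStep step;
        step.draft_tokens = std::move(draft);
        step.accepted = accepted;
        io.observer(step);
    }
    return accepted;
}
```

A server following this pattern would install the callback only while SSE clients are connected and clear it otherwise.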

Dashboard features (status.html):
- Dark-themed responsive UI with phase badges and live counters
- Draft token display updated per spec-decode step
- SVG-based performance charts (prefill tok/s, decode tok/s, accept rate)
- Auto-reconnecting EventSource with connection status indicator
- No external CDN dependencies

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add CMake POST_BUILD rule to copy share/status.html into build/share/
- Add exe_dir/share/status.html as a search path (build dir layout)
- Keeps existing ../share/ and ./share/ fallbacks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add request params to status: model, format, temperature, top_p/k,
  max_output, thinking_enabled, session_id, cache/pflash/spec_decode flags
- Add incremental 'event: token' SSE events (browser accumulates output)
- Add messages JSON to status event (sent once per request)
- Redesigned HTML: two-column request/response view, params grid, feature
  tags (cache hit, pflash, spec decode, stream, thinking), live tok/s
- All state accumulated client-side; server stays stateless for output text

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Token text can contain partial UTF-8 sequences (tokens split multi-byte
codepoints). Use json::error_handler_t::replace in all dump() calls on
status paths so invalid bytes become U+FFFD instead of throwing
type_error 316.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
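
To make the replacement behaviour concrete, here is a simplified stand-alone sanitizer that swaps malformed bytes for U+FFFD. It is not the JSON library's implementation and may differ from it in how many replacement characters it emits; `replace_invalid_utf8` is a hypothetical name, and the check covers only lead/continuation byte structure (overlong encodings and surrogates are not rejected).

```cpp
#include <cstddef>
#include <string>

// Simplified sketch: copy well-formed UTF-8 sequences through and emit
// U+FFFD (bytes EF BF BD) for each byte that cannot start or complete one.
std::string replace_invalid_utf8(const std::string& in) {
    static const char kReplacement[] = "\xEF\xBF\xBD";
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t need = 0;  // number of continuation bytes expected
        if (c < 0x80) need = 0;
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c >= 0xE0 && c <= 0xEF) need = 2;
        else if (c >= 0xF0 && c <= 0xF4) need = 3;
        else { out += kReplacement; ++i; continue; }  // stray or invalid lead byte

        bool ok = i + need < in.size();  // all continuation bytes must be present
        for (std::size_t k = 1; ok && k <= need; ++k) {
            const unsigned char cc = static_cast<unsigned char>(in[i + k]);
            ok = (cc & 0xC0) == 0x80;
        }
        if (!ok) { out += kReplacement; ++i; continue; }
        out.append(in, i, need + 1);
        i += need + 1;
    }
    return out;
}
```

A token that ends in the middle of a multi-byte codepoint, such as a lone `0xE4`, comes out as a single replacement character instead of raising an error.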
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

8 issues found across 8 files

Comment thread server/src/server/http_server.cpp Outdated
Comment thread server/src/server/http_server.cpp
Comment thread server/src/server/http_server.cpp Outdated
Comment thread server/CMakeLists.txt
Comment thread server/share/status.html
Comment thread server/CMakeLists.txt Outdated
Comment thread server/share/status.html
Comment thread server/src/server/http_server.cpp
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Record the post-push discovery and merge of PR Luce-Org#322, the conflict resolution with the MTP hook, and updated validation/classification.
howard0su and others added 2 commits May 31, 2026 19:51
- Add /status/json to kApiEndpoints registry
- Replace raw ::send() with sse_try_send() helper that handles partial
  writes via poll loop with a short 1s timeout (avoids stalling worker)
- Add sse_heartbeat() to prune disconnected SSE clients during idle
  periods (worker dequeue uses timed wait, sends heartbeat every 30s)
- Use $<TARGET_FILE_DIR:dflash_server> in CMake POST_BUILD copy rule
  for correct output path with multi-config generators
- Add install(FILES) rule for status.html
- Clear messages panel when a new request starts in the browser
- Use incremental DOM append (createTextNode) for token events instead
  of re-rendering full output text on each token

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a prefix cache hit occurs, the backend only prefills the delta
tokens beyond the cached prefix. The previous calculation divided the
full prompt token count by the delta prefill time, giving either 0
(full cache hit, no delta) or a wildly inflated number (partial hit).

Now uses the actual number of tokens that were prefilled:
- Full cache hit: 0 tok/s (correct — no prefill work done)
- Partial cache hit: delta_tokens / prefill_time
- No cache hit: effective_prompt.size() / prefill_time

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
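
The corrected prefill throughput calculation can be written as a small function. The name `prefill_tok_per_sec` and its parameters are hypothetical; only the three cases listed in the commit message come from the PR.

```cpp
#include <cstddef>

// Hypothetical helper: throughput over the tokens that were actually
// prefilled, i.e. the prompt minus whatever the prefix cache covered.
double prefill_tok_per_sec(std::size_t prompt_tokens,
                           std::size_t cached_prefix_tokens,
                           double prefill_seconds) {
    const std::size_t prefilled =
        prompt_tokens > cached_prefix_tokens ? prompt_tokens - cached_prefix_tokens : 0;
    // Full cache hit (nothing prefilled) or no measurable time: report 0.
    if (prefilled == 0 || prefill_seconds <= 0.0) return 0.0;
    return static_cast<double>(prefilled) / prefill_seconds;
}
```

For example, a 1000-token prompt with a 900-token cached prefix and 0.5 s of prefill time yields 100 / 0.5 = 200 tok/s, whereas dividing the full prompt by the same time would have reported 2000 tok/s.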
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 4 files (changes from recent commits).

Comment thread server/src/server/http_server.cpp
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Record the 2026-06-01 06:09 auto-integration refresh, including exact integration of current PR Luce-Org#322 and Luce-Org#294 heads, fresh direct-merge probes for remaining selective-port candidates, and validation results.
Replace blocking sse_try_send() (1s timeout per client) with MSG_DONTWAIT
send in sse_heartbeat(). The 12-byte heartbeat ping will succeed instantly
for any healthy client; slow clients with full buffers are pruned
immediately instead of stalling the worker thread.

This eliminates up to N×1s latency on idle-to-active transitions when
slow SSE clients are connected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Merge advanced status dashboard head 4b40aa1 after post-push enumeration detected the PR moved during the cron run.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Record post-push detection and integration of the advanced PR Luce-Org#322 head plus validation for the refreshed stack.
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="server/src/server/http_server.cpp">

<violation number="1" location="server/src/server/http_server.cpp:591">
P1: Heartbeat non-blocking send does not handle partial writes, risking SSE stream corruption</violation>
</file>

for (int fd : sse_fds_) {
// Non-blocking send: if the socket buffer can't accept 12 bytes
// immediately, the client is too far behind — treat as dead.
ssize_t n = ::send(fd, ping, sizeof(ping) - 1, MSG_NOSIGNAL | MSG_DONTWAIT);
Contributor

P1: Heartbeat non-blocking send does not handle partial writes, risking SSE stream corruption

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/http_server.cpp, line 591:

<comment>Heartbeat non-blocking send does not handle partial writes, risking SSE stream corruption</comment>

<file context>
@@ -580,12 +580,16 @@ void HttpServer::broadcast_token(const std::string & text) {
-        if (!sse_try_send(fd, ping, sizeof(ping) - 1)) {
+        // Non-blocking send: if the socket buffer can't accept 12 bytes
+        // immediately, the client is too far behind — treat as dead.
+        ssize_t n = ::send(fd, ping, sizeof(ping) - 1, MSG_NOSIGNAL | MSG_DONTWAIT);
+        if (n <= 0) {
             dead.push_back(fd);
</file context>
