fix: classify no-usage-stats log by stream completion state by Evrard-Nil · Pull Request #519 · nearai/cloud-api

Evrard-Nil · 2026-03-31T03:39:45Z

Summary

- Downgrade "Stream ended but no usage stats" from ERROR to WARN when the stream was interrupted (client disconnect or provider error)
- Keep ERROR only for streams that completed normally (stream_completed = true) but still lack usage, which indicates a real provider bug
- Reduces ~12,500 false-positive ERROR logs/day in prod (the #1 error by volume)

Root cause

External providers (OpenAI, Anthropic) only send usage stats in the final SSE chunk. When a client disconnects mid-stream (closes browser, network timeout), the InterceptStream is dropped before receiving the final chunk. The existing code logged ERROR for all cases without checking stream_completed.
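The classification described above can be sketched as a small pure function. This is a minimal illustration, not the code from this PR: `Severity`, `MissingUsageCause`, and `classify_missing_usage` are hypothetical names, and the two booleans stand in for `stream_completed` and `last_error.is_some()`.

```rust
/// Severity chosen for a "stream ended but no usage stats" event.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warn,
    Error,
}

/// Why the stream ended without usage stats. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MissingUsageCause {
    /// The stream finished normally but usage never arrived: likely a provider bug.
    CompletedWithoutUsage,
    /// An error was recorded before the stream could finish.
    ProviderError,
    /// The stream was dropped early with no recorded error, e.g. a client disconnect.
    ClientDisconnect,
}

/// Decide how to log a dropped stream that has no usage stats.
/// `stream_completed` and `has_last_error` mirror the two pieces of state
/// the PR description and review refer to.
pub fn classify_missing_usage(
    stream_completed: bool,
    has_last_error: bool,
) -> (Severity, MissingUsageCause) {
    if stream_completed {
        // Only this case stays at ERROR.
        (Severity::Error, MissingUsageCause::CompletedWithoutUsage)
    } else if has_last_error {
        (Severity::Warn, MissingUsageCause::ProviderError)
    } else {
        (Severity::Warn, MissingUsageCause::ClientDisconnect)
    }
}
```

In this sketch a completed stream is always classified as ERROR, even if an error was recorded earlier, which matches the "ERROR only when stream_completed = true" rule stated in the summary.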

Reproduction steps

# 1. Send a streaming request but disconnect after 1 second (simulates browser close)
timeout 1 curl -s -X POST "https://cloud-api.near.ai/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.2",
    "messages": [{"role": "user", "content": "Write a long essay about mathematics"}],
    "max_tokens": 500,
    "stream": true
  }'

# 2. Check Datadog: service:cloud-api env:prod @level:ERROR "Stream ended but no usage"
#    Before fix: ERROR log
#    After fix: WARN log with stream_error=false (client disconnect)

See repro_no_usage_stats.sh (gitignored) for the full reproduction script.

Test plan

- cargo check compiles cleanly
- All 188 unit tests pass (cargo test --lib --bins)
- Deploy to staging and verify:
  - Client disconnects produce WARN (not ERROR) in Datadog
  - Completed streams with no usage still produce ERROR
  - Normal completions still record usage correctly

🤖 Generated with Claude Code

Client disconnects and provider errors are expected to lack usage stats since external providers (OpenAI, Anthropic) only send usage in the final SSE chunk. These now log at WARN level instead of ERROR. Only truly completed streams that lack usage stats remain ERROR, which indicates an actual provider bug worth investigating. Reduces ~12k false-positive ERROR logs/day in prod.

claude · 2026-03-31T03:41:06Z

Review

The fix is correct and well-scoped. The stream_completed flag is set only on clean stream termination (Poll::Ready(None)), and last_error is set on provider-side errors — so the conditional correctly separates expected interruptions (client disconnect, provider error mid-stream) from genuine provider bugs (stream fully completed but no usage stats emitted).
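For readers unfamiliar with how the two flags get set, here is a rough sketch of the bookkeeping the review describes, using only `std::task::Poll`. `StreamState` and `observe` are hypothetical names, and the error type is simplified to `String`; the real `InterceptStream` may track this differently.

```rust
use std::task::Poll;

/// Minimal stand-in for the per-stream state discussed in the review.
/// (Hypothetical struct; field names follow the ones mentioned in the PR.)
#[derive(Debug, Default)]
pub struct StreamState {
    /// True only after the inner stream yields `Poll::Ready(None)`.
    pub stream_completed: bool,
    /// Set when the inner stream yields an error item.
    pub last_error: Option<String>,
}

impl StreamState {
    /// Record the outcome of one poll of the inner stream.
    pub fn observe<T>(&mut self, poll: &Poll<Option<Result<T, String>>>) {
        match poll {
            // Clean termination: the provider closed the stream.
            Poll::Ready(None) => self.stream_completed = true,
            // Provider-side error surfaced as a stream item.
            Poll::Ready(Some(Err(e))) => self.last_error = Some(e.clone()),
            // Ordinary chunk or not ready yet: nothing to record.
            Poll::Ready(Some(Ok(_))) | Poll::Pending => {}
        }
    }
}
```

If the wrapper is dropped while `stream_completed` is still false and `last_error` is `None`, the only remaining explanation in this model is that the consumer stopped polling, which is how a client disconnect shows up.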

One minor observation:

The (Some(usage), None) arm at line ~194 still logs tracing::error!(... "Stream ended but no chat_id available") unconditionally, without checking stream_completed. If a client disconnects after usage stats are received but before chat_id is set, this would still produce a false-positive ERROR. Consider applying the same pattern there for consistency:

(Some(usage), None) => {
    if !self.stream_completed {
        tracing::warn!(..., "Stream interrupted after receiving usage stats but before chat_id");
    } else {
        tracing::error!(..., "Stream completed but no chat_id available");
    }
    return;
}

This is minor and may be less common in practice, but worth considering for full consistency.
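To make the consistency argument concrete, the sketch below applies the same `stream_completed` check to both failure arms. It is an illustration under assumptions: `LogLevel`, `DropAction`, and `decide_drop_action` are hypothetical, usage is reduced to a single token count, and the decision is returned as a value instead of calling `tracing` macros so it can be checked in isolation.

```rust
/// Log level for the drop-time message. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogLevel {
    Warn,
    Error,
}

/// What the drop handler would do. (Hypothetical type.)
#[derive(Debug, PartialEq, Eq)]
pub enum DropAction {
    /// Both usage and chat_id are present: record usage as normal.
    RecordUsage { chat_id: String, total_tokens: u64 },
    /// Something is missing: log at the given level and return.
    Log { level: LogLevel, message: &'static str },
}

/// Decide what to do when the stream wrapper is dropped.
/// `usage` is simplified to a token count for this sketch.
pub fn decide_drop_action(
    usage: Option<u64>,
    chat_id: Option<&str>,
    stream_completed: bool,
) -> DropAction {
    // Interrupted streams are expected to be incomplete, so they only warn.
    let level = if stream_completed { LogLevel::Error } else { LogLevel::Warn };
    match (usage, chat_id) {
        (Some(total_tokens), Some(id)) => DropAction::RecordUsage {
            chat_id: id.to_string(),
            total_tokens,
        },
        (Some(_), None) => DropAction::Log {
            level,
            message: if stream_completed {
                "Stream completed but no chat_id available"
            } else {
                "Stream interrupted after receiving usage stats but before chat_id"
            },
        },
        (None, _) => DropAction::Log {
            level,
            message: "Stream ended but no usage stats",
        },
    }
}
```

Computing the level once before the match keeps the two failure arms from drifting apart, which is the inconsistency the suggestion above is pointing at.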

Otherwise the change is clean — good use of structured logging with stream_error = self.last_error.is_some() to distinguish clean disconnects from provider errors within WARN logs.

✅ Approved (minor suggestion above is non-blocking)

gemini-code-assist

Code Review

This pull request updates the .gitignore file to exclude reproduction scripts and refines the logging logic within the completions service. The changes distinguish between expected stream interruptions, such as client disconnects or provider errors, and cases where a stream completes normally but lacks required usage statistics or chat IDs. Interruptions are now logged as warnings instead of errors to reduce noise. I have no feedback to provide as there were no review comments.

Copilot

Pull request overview

Adjusts streaming drop logging to reduce false-positive ERROR volume by classifying “no usage stats” cases based on whether the stream completed normally, aligning operational alerting with expected client disconnect/provider interruption behavior.

Changes:

- Downgrade “no usage stats” logs to WARN when stream_completed = false; keep ERROR only when the stream completed but usage is still missing.
- Add additional structured field (stream_error) to WARN logs to distinguish provider error vs. client disconnect.
- Ignore local reproduction shell scripts via .gitignore.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`crates/services/src/completions/mod.rs`	Classifies missing-usage logs by `stream_completed` and logs WARN vs. ERROR accordingly.
`.gitignore`	Adds an ignore pattern for repro shell scripts.

Copilot AI review requested due to automatic review settings March 31, 2026 03:39

Evrard-Nil temporarily deployed to Cloud API test env March 31, 2026 03:39 — with GitHub Actions Inactive

Copilot started reviewing on behalf of Evrard-Nil March 31, 2026 03:40

gemini-code-assist bot reviewed Mar 31, 2026

Copilot AI reviewed Mar 31, 2026

fix: update stream_completed doc comment per review

7a86b01

Evrard-Nil temporarily deployed to Cloud API test env March 31, 2026 03:47 — with GitHub Actions Inactive

PierreLeGuen self-requested a review March 31, 2026 04:17

PierreLeGuen approved these changes Mar 31, 2026

Evrard-Nil merged commit 43b8b0d into main Mar 31, 2026
2 checks passed

Evrard-Nil merged 2 commits into main from fix/no-usage-stats-error-classification