Skip to content

fix(telegram): add socket timeout to prevent poll loop from hanging#1398

Closed
tommylin-signalpro wants to merge 2 commits intoNVIDIA:mainfrom
tommylin-signalpro:fix/telegram-bridge-timeout
Closed

fix(telegram): add socket timeout to prevent poll loop from hanging#1398
tommylin-signalpro wants to merge 2 commits intoNVIDIA:mainfrom
tommylin-signalpro:fix/telegram-bridge-timeout

Conversation

@tommylin-signalpro
Copy link
Copy Markdown
Contributor

@tommylin-signalpro tommylin-signalpro commented Apr 3, 2026

Problem

tgApi() in scripts/telegram-bridge.js has no client-side socket timeout. When the TCP connection silently dies (e.g. NAT timeout, ISP routing change, network blip during the 30-second long-poll window), the HTTPS request hangs forever and the poll loop never recovers — the bot stops responding until the process is restarted.

Root cause: https.request() is called without a timeout option, and there is no timeout event handler. A dead TCP connection (no RST/FIN received) causes the request to wait indefinitely.

Solution

  • Add timeout: 60000 (60s) to the https.request() options — longer than Telegram's 30s server-side long-poll timeout, so normal responses are unaffected.
  • Add a req.on("timeout", ...) handler that calls req.destroy() to abort the hung request.
  • The existing poll() catch block already logs the error and schedules the next poll via setTimeout, so the loop self-heals after a transient failure.

Test plan

  • Added test/telegram-bridge-timeout.test.js (vitest) with 6 test cases using local TLS servers:
    • Normal response still works with timeout in place
    • Server hang → timeout fires and rejects
    • Timeout fires within expected window (precision check)
    • Poll loop recovers after timeout (first call hangs, second succeeds)
    • Server closes connection mid-response (known limitation, documented)
    • Connection refused (ECONNREFUSED) rejects immediately
  • Verified locally with standalone runner (6/6 pass)
  • CI: vitest run

Signed-off-by: Tommy Lin tommylin@signalpro.com.tw

Summary by CodeRabbit

  • Bug Fixes
    • Telegram API requests now timeout gracefully after 60 seconds instead of hanging indefinitely, improving system reliability.

tgApi() had no client-side socket timeout. When the TCP connection
silently dies (e.g. NAT timeout, ISP routing change, network blip
during the 30s long-poll window), the HTTPS request hangs forever
and the poll loop never recovers — the bot stops responding until
the process is restarted.

Add a 60s socket timeout with req.destroy() on the timeout event.
The existing poll() catch block already continues to the next
iteration, so the loop self-heals after a transient failure.

Signed-off-by: Tommy Lin <tommylin@signalpro.com.tw>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

Walkthrough

The changes add a 60-second timeout to Telegram API requests with explicit error handling for timeout events, accompanied by a comprehensive test suite that validates timeout behavior across multiple scenarios including normal responses, stalled connections, and connection refusals.

Changes

Cohort / File(s) Summary
Timeout Implementation
scripts/telegram-bridge.js
Added explicit 60-second timeout to HTTPS POST calls with a request timeout handler that destroys the request and rejects with a timeout error.
Timeout Test Suite
test/telegram-bridge-timeout.test.js
New test file with self-signed TLS certificate generation and ephemeral HTTPS server setup, covering scenarios: successful responses, timeout rejection with timing bounds, shortened timeout windows, recovery after timeouts, socket destruction edge cases, and connection refusals.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Req as HTTPS Request
    participant Server as Telegram API

    rect rgba(100, 150, 255, 0.5)
    Note over App,Server: Timeout Scenario
    App->>Req: POST with 60s timeout
    Req->>Server: Send request
    Req->>Req: Start timeout timer
    Note over Req: No response within 60s
    Req->>Req: timeout event fires
    Req->>Req: destroy() request
    Req-->>App: reject(Error: timeout)
    end

    rect rgba(100, 255, 150, 0.5)
    Note over App,Server: Success Scenario
    App->>Req: POST with 60s timeout
    Req->>Server: Send request
    Req->>Req: Start timeout timer
    Server-->>Req: Response received
    Req->>Req: Clear timeout
    Req-->>App: resolve(response)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A timeout has come, no more waiting around!
Sixty seconds of patience, then destroy() the sound.
With tests on servers that mock the API's call,
Our requests now fail gracefully—no stalling at all! 🕐

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a socket timeout to the telegram-bridge to prevent the poll loop from hanging indefinitely when TCP connections die silently.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/telegram-bridge-timeout.test.js (1)

170-179: Mid-response test assertion is non-detecting.

Lines 170-179 currently accept every value the race can produce, so this check won’t catch regressions. Consider asserting a specific expected behavior (or marking this as it.todo if it is purely documentary).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/telegram-bridge-timeout.test.js` around lines 170 - 179, The current
assertion in the test that races tgApi(...) against a timeout accepts all
possible outcomes and therefore cannot detect regressions; update the test
around the Promise.race (the "result" variable produced from tgApi(baseUrl,
"getUpdates", { offset: 0 }, 1000) vs the fallback timeout) to assert a single,
explicit expected behavior (for example assert that the call either resolves to
"resolved" or consistently rejects to "rejected" depending on intended behavior)
or, if the test is only documenting a limitation, replace the body with it.todo
(or mark the test as skipped) to avoid a false-positive pass. Ensure you
reference and modify the test that uses tgApi and the Promise.race to make the
expectation deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/telegram-bridge-timeout.test.js`:
- Around line 50-84: Extract the duplicated tgApi implementation into a single
shared module (exporting function tgApi) and update both the test and the
production code to import that exported function instead of having their own
copies; specifically remove the inline tgApi from
test/telegram-bridge-timeout.test.js and have the test import tgApi from the new
shared module, and change scripts/telegram-bridge.js to import the same tgApi
export so both use the identical implementation.

---

Nitpick comments:
In `@test/telegram-bridge-timeout.test.js`:
- Around line 170-179: The current assertion in the test that races tgApi(...)
against a timeout accepts all possible outcomes and therefore cannot detect
regressions; update the test around the Promise.race (the "result" variable
produced from tgApi(baseUrl, "getUpdates", { offset: 0 }, 1000) vs the fallback
timeout) to assert a single, explicit expected behavior (for example assert that
the call either resolves to "resolved" or consistently rejects to "rejected"
depending on intended behavior) or, if the test is only documenting a
limitation, replace the body with it.todo (or mark the test as skipped) to avoid
a false-positive pass. Ensure you reference and modify the test that uses tgApi
and the Promise.race to make the expectation deterministic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d8946f0f-2413-4fa0-8e69-9a8548587437

📥 Commits

Reviewing files that changed from the base of the PR and between aae3ccd and 7599c5a.

📒 Files selected for processing (2)
  • scripts/telegram-bridge.js
  • test/telegram-bridge-timeout.test.js

The test validated a copy of tgApi rather than the production code.
A follow-up PR will extract tgApi into a shared module so tests
import the real implementation.

Signed-off-by: Tommy Lin <tommylin@signalpro.com.tw>
tommylin-signalpro added a commit to tommylin-signalpro/NemoClaw that referenced this pull request Apr 3, 2026
Move the Telegram API client from an inline function in
telegram-bridge.js into bin/lib/telegram-api.js. The new module
accepts the bot token as a parameter instead of reading it from
module-level state, and includes a 60s socket timeout with
req.destroy() to prevent the poll loop from hanging on dead TCP
connections.

telegram-bridge.js now imports and wraps the shared function.
Tests in test/telegram-api.test.js import the same production
module and verify timeout, recovery, and error handling against
local TLS servers.

Refs: NVIDIA#1398

Signed-off-by: Tommy Lin <tommylin@signalpro.com.tw>
@tommylin-signalpro
Copy link
Copy Markdown
Contributor Author

Closing in favor of #1399, which includes the timeout fix along with the module extraction and tests.

@wscurran wscurran added bug Something isn't working Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw. labels Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants