fix(telegram): add socket timeout to prevent poll loop from hanging#1398
fix(telegram): add socket timeout to prevent poll loop from hanging#1398tommylin-signalpro wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
tgApi() had no client-side socket timeout. When the TCP connection silently dies (e.g. NAT timeout, ISP routing change, network blip during the 30s long-poll window), the HTTPS request hangs forever and the poll loop never recovers — the bot stops responding until the process is restarted. Add a 60s socket timeout with req.destroy() on the timeout event. The existing poll() catch block already continues to the next iteration, so the loop self-heals after a transient failure. Signed-off-by: Tommy Lin <tommylin@signalpro.com.tw>
📝 WalkthroughWalkthroughThe changes add a 60-second timeout to Telegram API requests with explicit error handling for timeout events, accompanied by a comprehensive test suite that validates timeout behavior across multiple scenarios including normal responses, stalled connections, and connection refusals. Changes
Sequence DiagramsequenceDiagram
participant App as Application
participant Req as HTTPS Request
participant Server as Telegram API
rect rgba(100, 150, 255, 0.5)
Note over App,Server: Timeout Scenario
App->>Req: POST with 60s timeout
Req->>Server: Send request
Req->>Req: Start timeout timer
Note over Req: No response within 60s
Req->>Req: timeout event fires
Req->>Req: destroy() request
Req-->>App: reject(Error: timeout)
end
rect rgba(100, 255, 150, 0.5)
Note over App,Server: Success Scenario
App->>Req: POST with 60s timeout
Req->>Server: Send request
Req->>Req: Start timeout timer
Server-->>Req: Response received
Req->>Req: Clear timeout
Req-->>App: resolve(response)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/telegram-bridge-timeout.test.js (1)
170-179: Mid-response test assertion is non-detecting.Lines 170-179 currently accept every value the race can produce, so this check won’t catch regressions. Consider asserting a specific expected behavior (or marking this as
it.todoif it is purely documentary).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/telegram-bridge-timeout.test.js` around lines 170 - 179, The current assertion in the test that races tgApi(...) against a timeout accepts all possible outcomes and therefore cannot detect regressions; update the test around the Promise.race (the "result" variable produced from tgApi(baseUrl, "getUpdates", { offset: 0 }, 1000) vs the fallback timeout) to assert a single, explicit expected behavior (for example assert that the call either resolves to "resolved" or consistently rejects to "rejected" depending on intended behavior) or, if the test is only documenting a limitation, replace the body with it.todo (or mark the test as skipped) to avoid a false-positive pass. Ensure you reference and modify the test that uses tgApi and the Promise.race to make the expectation deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/telegram-bridge-timeout.test.js`:
- Around line 50-84: Extract the duplicated tgApi implementation into a single
shared module (exporting function tgApi) and update both the test and the
production code to import that exported function instead of having their own
copies; specifically remove the inline tgApi from
test/telegram-bridge-timeout.test.js and have the test import tgApi from the new
shared module, and change scripts/telegram-bridge.js to import the same tgApi
export so both use the identical implementation.
---
Nitpick comments:
In `@test/telegram-bridge-timeout.test.js`:
- Around line 170-179: The current assertion in the test that races tgApi(...)
against a timeout accepts all possible outcomes and therefore cannot detect
regressions; update the test around the Promise.race (the "result" variable
produced from tgApi(baseUrl, "getUpdates", { offset: 0 }, 1000) vs the fallback
timeout) to assert a single, explicit expected behavior (for example assert that
the call either resolves to "resolved" or consistently rejects to "rejected"
depending on intended behavior) or, if the test is only documenting a
limitation, replace the body with it.todo (or mark the test as skipped) to avoid
a false-positive pass. Ensure you reference and modify the test that uses tgApi
and the Promise.race to make the expectation deterministic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d8946f0f-2413-4fa0-8e69-9a8548587437
📒 Files selected for processing (2)
scripts/telegram-bridge.jstest/telegram-bridge-timeout.test.js
The test validated a copy of tgApi rather than the production code. A follow-up PR will extract tgApi into a shared module so tests import the real implementation. Signed-off-by: Tommy Lin <tommylin@signalpro.com.tw>
Move the Telegram API client from an inline function in telegram-bridge.js into bin/lib/telegram-api.js. The new module accepts the bot token as a parameter instead of reading it from module-level state, and includes a 60s socket timeout with req.destroy() to prevent the poll loop from hanging on dead TCP connections. telegram-bridge.js now imports and wraps the shared function. Tests in test/telegram-api.test.js import the same production module and verify timeout, recovery, and error handling against local TLS servers. Refs: NVIDIA#1398 Signed-off-by: Tommy Lin <tommylin@signalpro.com.tw>
|
Closing in favor of #1399, which includes the timeout fix along with the module extraction and tests. |
Problem
tgApi()inscripts/telegram-bridge.jshas no client-side socket timeout. When the TCP connection silently dies (e.g. NAT timeout, ISP routing change, network blip during the 30-second long-poll window), the HTTPS request hangs forever and the poll loop never recovers — the bot stops responding until the process is restarted.Root cause:
https.request()is called without atimeoutoption, and there is notimeoutevent handler. A dead TCP connection (no RST/FIN received) causes the request to wait indefinitely.Solution
timeout: 60000(60s) to thehttps.request()options — longer than Telegram's 30s server-side long-poll timeout, so normal responses are unaffected.req.on("timeout", ...)handler that callsreq.destroy()to abort the hung request.poll()catch block already logs the error and schedules the next poll viasetTimeout, so the loop self-heals after a transient failure.Test plan
test/telegram-bridge-timeout.test.js(vitest) with 6 test cases using local TLS servers:Signed-off-by: Tommy Lin tommylin@signalpro.com.tw
Summary by CodeRabbit