
fix: add timeout to ACP handshake to prevent gemini e2e hangs #109

Merged: teng-lin merged 2 commits into main from fix/gemini-e2e-live-test on Feb 21, 2026


@teng-lin (Owner)

Summary

  • waitForResponse in AcpAdapter.connect() had no timeout, causing concurrent Gemini sessions to hang indefinitely when the second process stalled acquiring proper-lockfile on ~/.gemini/projects.json during config.initialize()
  • Add optional timeoutMs to waitForResponse; fires after the configured delay and rejects with a clear error
  • Kill the child process (SIGTERM → SIGKILL after 5s) when the handshake fails or times out, preventing zombie subprocesses
  • Thread initializeTimeoutMs from ResolvedConfig into adapterOptions at both connectBackend call sites in SessionCoordinator (createSession and the relaunch/reconnect path)
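A minimal sketch of the timeout pattern described in the first two bullets, assuming the adapter keeps a map of pending JSON-RPC requests keyed by id. `PendingRequests`, `deliver`, and the error text are hypothetical illustrations, not the actual AcpAdapter internals. The timer is cleared when the response arrives, and the waiter is removed on timeout so that a late response is ignored.

```typescript
// Simplified JSON-RPC response shape (assumption for this sketch).
interface JsonRpcResponse {
  id: number;
  result?: unknown;
  error?: { message: string };
}

// Hypothetical registry of in-flight requests, keyed by JSON-RPC id.
class PendingRequests {
  private waiters = new Map<number, (msg: JsonRpcResponse) => void>();

  // Called by the stdout reader each time it parses a response line.
  deliver(msg: JsonRpcResponse): void {
    const waiter = this.waiters.get(msg.id);
    if (waiter) {
      this.waiters.delete(msg.id);
      waiter(msg);
    }
  }

  // Resolves with the matching response; if timeoutMs is given, rejects once it elapses.
  waitForResponse(id: number, timeoutMs?: number): Promise<JsonRpcResponse> {
    return new Promise((resolve, reject) => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      this.waiters.set(id, (msg) => {
        if (timer !== undefined) clearTimeout(timer); // response arrived in time
        resolve(msg);
      });
      if (timeoutMs !== undefined) {
        timer = setTimeout(() => {
          this.waiters.delete(id); // a late response is now ignored
          reject(new Error(`ACP handshake timed out after ${timeoutMs}ms waiting for response to request ${id}`));
        }, timeoutMs);
      }
    });
  }
}
```

In this sketch, leaving `timeoutMs` undefined preserves the old wait-forever behavior, which matches the option being optional.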

Root Cause

The Gemini CLI uses proper-lockfile on ~/.gemini/projects.json during config.initialize(), which runs synchronously as part of the ACP initialize handshake. When two Gemini processes start concurrently, the second one stalls waiting for the lock (up to LOCK_TIMEOUT_MS = 10000). Since waitForResponse had no timeout, beamcode waited indefinitely — all the way to vitest's 120s global test timeout.

Test Plan

  • "two independent gemini sessions can connect concurrently" — was timing out at 120s, now passes in ~11s
  • "second createSession on same coordinator yields independent session" — was timing out at 120s, now passes in ~10s
  • All 2454 unit tests pass

waitForResponse had no timeout, causing concurrent gemini sessions to
hang indefinitely when the second process stalled acquiring proper-lockfile
on ~/.gemini/projects.json during config.initialize().

- Add optional timeoutMs to waitForResponse; fires after configured delay
  and rejects with a clear error message
- Kill the child process (SIGTERM then SIGKILL) when the handshake fails
  or times out, preventing zombie subprocesses
- Thread initializeTimeoutMs from ResolvedConfig into adapterOptions at
  both connectBackend call sites in SessionCoordinator (createSession and
  the relaunch/reconnect path)
- Update behavior test assertion to use objectContaining for adapterOptions
  since it now carries initializeTimeoutMs alongside cwd
@gemini-code-assist (Contributor)

Summary of Changes

Hello @teng-lin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where concurrent Gemini sessions could hang indefinitely due to a missing timeout in the ACP handshake. By introducing a configurable timeout and robust child-process termination, it prevents the indefinite waits caused by file-lock contention during initialization, improving the stability of session management when sessions start concurrently.

Highlights

  • Handshake Timeout Implementation: Added an optional 'timeoutMs' parameter to the 'waitForResponse' function used by 'AcpAdapter.connect()'; it rejects with a clear error if the timeout is exceeded.
  • Zombie Process Prevention: Added logic to kill the child process (SIGTERM, then SIGKILL after 5s) if the ACP handshake fails or times out, preventing orphaned subprocesses.
  • Timeout Configuration Propagation: Threaded the 'initializeTimeoutMs' from 'ResolvedConfig' into 'adapterOptions' at both 'connectBackend' call sites in 'SessionCoordinator' to ensure the timeout is applied.
Changelog
  • src/adapters/acp/acp-adapter.ts
    • Read 'initializeTimeoutMs' from 'adapterOptions' and added a timeout parameter to the 'waitForResponse' function signature.
    • Wrapped the ACP handshake logic in a 'try-catch' block to handle errors and timeouts.
    • Implemented child process termination (SIGTERM then SIGKILL) in the catch block to prevent zombie processes.
    • Passed the new 'initializeTimeoutMs' to both 'waitForResponse' calls during the handshake.
    • Added timeout mechanism to 'waitForResponse' using 'setTimeout' and 'clearTimeout'.
  • src/core/session-coordinator.behavior.test.ts
    • Updated 'expect.objectContaining' for 'adapterOptions' in a test to accommodate additional properties.
  • src/core/session-coordinator.ts
    • Passed 'initializeTimeoutMs' from 'this.config' to 'adapterOptions' in 'connectBackend' when creating a new session.
    • Passed 'initializeTimeoutMs' from 'this.config' to 'adapterOptions' in 'connectBackend' during session relaunch/reconnection.
Activity
  • The test 'two independent gemini sessions can connect concurrently' now passes in approximately 11 seconds, resolving a previous 120-second timeout.
  • The test 'second createSession on same coordinator yields independent session' now passes in approximately 10 seconds, resolving a previous 120-second timeout.
  • All 2454 unit tests are confirmed to pass.

@gemini-code-assist bot (Contributor) left a comment

Code Review

The pull request effectively addresses the hang issue by introducing a timeout to the ACP handshake and ensuring child processes are cleaned up on failure. The implementation is solid, and the suggestions to use the configured killGracePeriodMs and properly clear the SIGKILL timer further improve maintainability and robustness, aligning with best practices for process management.

```typescript
const args = (options.adapterOptions?.args as string[]) ?? [];
const cwd = options.adapterOptions?.cwd as string | undefined;
const tracer = options.adapterOptions?.tracer as MessageTracer | undefined;
const initializeTimeoutMs = options.adapterOptions?.initializeTimeoutMs as number | undefined;
```
gemini-code-assist (Contributor) · priority: medium

Consider extracting killGracePeriodMs from adapterOptions as well. This allows the adapter to respect the global configuration for process termination grace periods, rather than relying on a hardcoded value in the error handler.

Suggested change
```diff
 const initializeTimeoutMs = options.adapterOptions?.initializeTimeoutMs as number | undefined;
+const killGracePeriodMs = (options.adapterOptions?.killGracePeriodMs as number) ?? 5000;
```

@teng-lin (Owner, Author)

Fixed: killGracePeriodMs is now extracted from adapterOptions with a fallback of 5000.

Comment on lines +110 to +112
```typescript
child.kill("SIGTERM");
const killTimer = setTimeout(() => child.kill("SIGKILL"), 5000);
killTimer.unref();
```
gemini-code-assist (Contributor) · priority: medium

It is recommended to clear the killTimer if the child process exits before the timeout. This prevents sending a SIGKILL to a potentially reused PID. Additionally, using the extracted killGracePeriodMs ensures consistency with the system configuration.

Suggested change
```diff
 child.kill("SIGTERM");
-const killTimer = setTimeout(() => child.kill("SIGKILL"), 5000);
+const killTimer = setTimeout(() => child.kill("SIGKILL"), killGracePeriodMs);
+child.once("exit", () => clearTimeout(killTimer));
 killTimer.unref();
```

@teng-lin (Owner, Author)
Fixed — added child.once('exit', () => clearTimeout(killTimer)) and switched to killGracePeriodMs.
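The termination sequence settled in this thread can be sketched as a standalone helper. `terminateChild` is a hypothetical name (the PR inlines this logic in the handshake's catch block), and the sketch assumes POSIX signal semantics. It sends SIGTERM, schedules SIGKILL after the grace period, cancels the SIGKILL if the child exits first, and resolves with the signal that ended the process.

```typescript
import type { ChildProcess } from "node:child_process";

// Hypothetical helper: SIGTERM first, SIGKILL after graceMs unless the child exits sooner.
// Resolves with the signal that ended the process (null if it exited with a code).
function terminateChild(child: ChildProcess, graceMs = 5000): Promise<NodeJS.Signals | null> {
  return new Promise((resolve) => {
    // Already exited: no further "exit" event will fire, so settle immediately.
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve(child.signalCode);
      return;
    }
    const killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    // Don't let this timer alone keep the parent's event loop alive.
    killTimer.unref();
    child.once("exit", (_code, signal) => {
      // Child is gone (after SIGTERM or SIGKILL): drop any pending escalation.
      clearTimeout(killTimer);
      resolve(signal);
    });
    child.kill("SIGTERM");
  });
}
```

The early-exit check matters because a child that has already exited emits no further `exit` event, so without it the promise would never settle.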

Comment thread src/core/session-coordinator.ts Outdated
```diff
 try {
   await this.bridge.connectBackend(sessionId, {
-    adapterOptions: { cwd },
+    adapterOptions: { cwd, initializeTimeoutMs: this.config.initializeTimeoutMs },
```
gemini-code-assist (Contributor) · priority: medium

Pass killGracePeriodMs to the adapter options to ensure the backend adapter respects the configured grace period during cleanup on connection failure.

Suggested change
```diff
-adapterOptions: { cwd, initializeTimeoutMs: this.config.initializeTimeoutMs },
+adapterOptions: { cwd, initializeTimeoutMs: this.config.initializeTimeoutMs, killGracePeriodMs: this.config.killGracePeriodMs },
```

@teng-lin (Owner, Author)
Fixed in both createSession and the reconnect path.

Comment thread src/core/session-coordinator.ts Outdated
```diff
 try {
   await this.bridge.connectBackend(sessionId, {
-    adapterOptions: { cwd: info.cwd },
+    adapterOptions: { cwd: info.cwd, initializeTimeoutMs: this.config.initializeTimeoutMs },
```
gemini-code-assist (Contributor) · priority: medium

Pass killGracePeriodMs here as well to maintain consistency across all connection paths.

Suggested change
adapterOptions: { cwd: info.cwd, initializeTimeoutMs: this.config.initializeTimeoutMs },
adapterOptions: { cwd: info.cwd, initializeTimeoutMs: this.config.initializeTimeoutMs, killGracePeriodMs: this.config.killGracePeriodMs },

@teng-lin (Owner, Author)
Fixed — both connectBackend call sites now carry killGracePeriodMs.

…r timer on early exit

- Extract killGracePeriodMs from adapterOptions (defaulting to 5000) so
  the adapter respects the system-configured grace period rather than a
  hardcoded value
- Clear the SIGKILL timer if the child exits before it fires, preventing
  a potential SIGKILL to a reused PID
- Thread killGracePeriodMs alongside initializeTimeoutMs in both
  connectBackend call sites in SessionCoordinator
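A sketch of how the two options travel from the coordinator to the adapter, assuming a reduced `ResolvedConfig`. `ResolvedConfigSubset`, `buildAdapterOptions`, and `readHandshakeOptions` are hypothetical: the PR writes the object literal inline at both `connectBackend` call sites and reads the fields inside `AcpAdapter.connect()`. The adapter side mirrors the casts and the 5000 ms fallback from the review suggestion.

```typescript
// Reduced, hypothetical view of ResolvedConfig: only the fields threaded in this PR.
interface ResolvedConfigSubset {
  initializeTimeoutMs?: number;
  killGracePeriodMs?: number;
}

// adapterOptions is a loosely typed bag, so the adapter casts each field on read.
type AdapterOptions = Record<string, unknown>;

// Coordinator side: createSession and the relaunch/reconnect path would both call this.
function buildAdapterOptions(config: ResolvedConfigSubset, cwd: string | undefined): AdapterOptions {
  return {
    cwd,
    initializeTimeoutMs: config.initializeTimeoutMs,
    killGracePeriodMs: config.killGracePeriodMs,
  };
}

// Adapter side: same reads as in AcpAdapter.connect(), with the 5000 ms fallback.
function readHandshakeOptions(adapterOptions?: AdapterOptions): {
  initializeTimeoutMs: number | undefined;
  killGracePeriodMs: number;
} {
  const initializeTimeoutMs = adapterOptions?.initializeTimeoutMs as number | undefined;
  const killGracePeriodMs = (adapterOptions?.killGracePeriodMs as number | undefined) ?? 5000;
  return { initializeTimeoutMs, killGracePeriodMs };
}
```

A shared builder is one way to keep the two call sites from drifting apart, which is the consistency concern raised in review; the values used when exercising it (for example a 30000 ms initialize timeout) are arbitrary.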
teng-lin merged commit b150b4b into main on Feb 21, 2026
7 checks passed
teng-lin deleted the fix/gemini-e2e-live-test branch on February 21, 2026 23:39