Skip to content

Fix race condition with daemonized jailer in startVMM #666

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

adamthomason
Copy link

Description of changes:

When using jailer with Daemonize: true, the jailer parent process exits immediately after forking the child firecracker process. This caused a race condition where the SDK's process monitoring goroutine would detect the parent exit and send to errCh before waitForSocket() could establish the socket connection, causing startVMM to fail prematurely.

This change modifies the process monitoring logic to treat a clean exit (status=0) of a daemonized jailer parent as expected behavior rather than an error condition. The goroutine now returns early in this case, allowing waitForSocket() and other initialization logic to complete normally.

The actual firecracker child process continues running in daemon mode and creates the API socket as expected.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

When using jailer with Daemonize: true, the jailer parent process exits
immediately after forking the child firecracker process. This caused a
race condition where the SDK's process monitoring goroutine would detect
the parent exit and send to errCh before waitForSocket() could establish
the socket connection, causing startVMM to fail prematurely.

This change modifies the process monitoring logic to treat a clean exit
(status=0) of a daemonized jailer parent as expected behavior rather than
an error condition. The goroutine now returns early in this case, allowing
waitForSocket() and other initialization logic to complete normally.

The actual firecracker child process continues running in daemon mode and
creates the API socket as expected.

Signed-off-by: Adam Thomason <[email protected]>
@adamthomason adamthomason requested a review from a team as a code owner July 23, 2025 10:17
…rom the Shutdown receiver to trigger startVMM to continue when using daemonized jailer process.

Signed-off-by: Adam Thomason <[email protected]>
@adamthomason
Copy link
Author

adamthomason commented Jul 24, 2025

The initial commit for this PR caused issues with cleanup, as the startVMM cleanup methods were never called due to immediate return. I've implemented an additional "shutdownCh" channel on the Machine struct which is closed when the Shtudown receiver is called.

Copy link
Contributor

@sondavidb sondavidb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the contribution! The changes look fine to me, ideally we'd add this to out testing suite but I can't really think of a good way to test this race condition. (Maybe someone else might have an idea for this?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants