Expand versioned E2E coverage #1295

Open
aikuznetsov wants to merge 6 commits into main from versioned-e2e-coverage

@aikuznetsov aikuznetsov commented Jun 3, 2026

Collaborator

Summary

This PR expands versioned E2E coverage of versiond's reconciliation and failure-handling behavior, and cleans up the make e2e runner so that a single command builds, runs, merges the JUnit reports, verifies, and prints a readable summary.

The new tests cover integration paths that were previously easy to miss because most existing E2E cases mutated oracle state only after all services were already running.

Added E2E scenarios:

Startup From Existing Oracle State

  1. Start mock oracle.
  2. Register v1 in oracle while versiond is stopped.
  3. Start versiond.
  4. Wait for /v1/ to become available.
  5. Verify /v1/ returns the expected version prefix.
  6. Verify /healthz reports v1 as running.
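
Steps 4 and 5 above amount to polling versiond until /v1/ answers and then checking the response body. The Go sketch below illustrates that pattern; it is not the code in this PR. The helper names `waitUntil` and `servesPrefix` are hypothetical, and the sketch assumes the test app writes its version at the start of the response body.

```go
package main

import (
	"io"
	"net/http"
	"strings"
	"time"
)

// waitUntil polls cond every interval until it returns true or timeout
// elapses. It reports whether cond became true in time.
// (Hypothetical helper, not taken from the PR.)
func waitUntil(timeout, interval time.Duration, cond func() bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// servesPrefix reports whether GET url returns 200 with a body that starts
// with wantPrefix. A transport error is treated as "not ready yet".
// ASSUMPTION: the version prefix appears at the start of the body.
func servesPrefix(client *http.Client, url, wantPrefix string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil || resp.StatusCode != http.StatusOK {
		return false
	}
	return strings.HasPrefix(string(body), wantPrefix)
}
```

A test could then call `waitUntil(30*time.Second, 500*time.Millisecond, func() bool { return servesPrefix(http.DefaultClient, baseURL+"/v1/", "v1") })`, where `baseURL` and both durations are placeholders rather than values from the suite.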

Oracle Temporary Failure Keeps Children Running

  1. Register and start a version through oracle.
  2. Wait until the version is reachable through versiond.
  3. Toggle mock oracle failure mode so oracle returns 500.
  4. Wait through multiple versiond poll cycles.
  5. Verify the already-running version still serves traffic.
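
Step 3 relies on a mock oracle that can be switched into failure mode. The Go sketch below shows one way such a toggle could look; the type `mockOracle`, its methods, and the canned body are hypothetical, and the mock oracle in this PR may be structured differently.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// mockOracle is a hypothetical stand-in for the suite's mock oracle.
// Use it by pointer, because atomic.Bool must not be copied.
type mockOracle struct {
	failing  atomic.Bool // when true, every request is answered with 500
	versions string      // canned body served in healthy mode (shape assumed)
}

// SetFailing switches failure mode on or off.
func (o *mockOracle) SetFailing(on bool) { o.failing.Store(on) }

// respond decides the status code and body for the next request.
func (o *mockOracle) respond() (int, string) {
	if o.failing.Load() {
		return http.StatusInternalServerError, "oracle unavailable (injected failure)"
	}
	return http.StatusOK, o.versions
}

// ServeHTTP lets mockOracle be mounted on any net/http server.
func (o *mockOracle) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	status, body := o.respond()
	w.WriteHeader(status)
	_, _ = w.Write([]byte(body))
}
```

Keeping the decision in `respond` separates the toggle logic from HTTP plumbing, so it can be checked without starting a server.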

Empty Oracle Response Keeps Children Running

  1. Clear oracle state.
  2. Register and start v1.
  3. Wait until /v1/ is reachable through versiond.
  4. Delete v1 from oracle so oracle returns an empty version list.
  5. Wait through multiple versiond poll cycles.
  6. Verify /v1/ still works.
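
Steps 5 and 6 assert that something stays true over several poll cycles, which differs from waiting for it to become true. A minimal Go sketch of such a check follows. `holdsFor` is a hypothetical helper, and the window a real test would pass depends on versiond's poll interval, which is not stated here.

```go
package main

import "time"

// holdsFor samples cond every interval for the whole window and reports
// whether it was true on every sample. It returns false at the first
// failing sample. (Hypothetical helper, not taken from the PR.)
func holdsFor(window, interval time.Duration, cond func() bool) bool {
	deadline := time.Now().Add(window)
	for {
		if !cond() {
			return false
		}
		if !time.Now().Before(deadline) {
			return true
		}
		time.Sleep(interval)
	}
}
```

Sampling throughout the window, rather than sleeping and checking once at the end, also catches a child that is stopped and later restarted within the window.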

Failed Same-Version Update Keeps Old Child Running

  1. Register and start v1.
  2. Wait until /v1/ is reachable through versiond.
  3. Re-register v1 with a bad hash.
  4. Wait through multiple versiond poll cycles.
  5. Verify /v1/ still serves traffic.
  6. Verify /healthz still reports v1 as running.
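
The /healthz check in step 6 could be written as a small decoding helper, sketched below in Go. The JSON shape is an assumption made purely for illustration (`{"versions":[{"name":"v1","status":"running"}]}`); the real /healthz schema is not shown in this PR description and may differ. `healthReport` and `versionStatus` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthReport mirrors an ASSUMED /healthz payload; the real schema may differ.
type healthReport struct {
	Versions []struct {
		Name   string `json:"name"`
		Status string `json:"status"`
	} `json:"versions"`
}

// versionStatus returns the status reported for the named version. It returns
// an error if the payload is not valid JSON or the version is not listed.
func versionStatus(payload []byte, name string) (string, error) {
	var report healthReport
	if err := json.Unmarshal(payload, &report); err != nil {
		return "", fmt.Errorf("decode healthz payload: %w", err)
	}
	for _, v := range report.Versions {
		if v.Name == name {
			return v.Status, nil
		}
	}
	return "", fmt.Errorf("version %q not present in healthz payload", name)
}
```

Returning an error for a missing version keeps "v1 is absent" distinct from "v1 is present but not running", which matters when the test must show the old child survived the bad-hash update.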

Child Process Crash Recovery

  1. Register and start a version.
  2. Wait until the version is reachable through versiond.
  3. Call the test-only POST /exit endpoint through the proxy.
  4. Verify the version becomes temporarily unavailable.
  5. Wait for process.Manager to restart the child.
  6. Verify the version becomes reachable again.
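
Steps 4 through 6 describe an ordered observation: the version must first go down and only then come back. The Go sketch below captures that ordering with a hypothetical helper, `awaitRecovery`; it is an illustration under those assumptions, not the test added in this PR.

```go
package main

import (
	"errors"
	"time"
)

// awaitRecovery polls probe until it has seen the service go down and then
// come back up. It returns nil on recovery and an error if the timeout
// elapses first. (Hypothetical helper, not taken from the PR.)
func awaitRecovery(timeout, interval time.Duration, probe func() bool) error {
	deadline := time.Now().Add(timeout)
	sawOutage := false
	for {
		up := probe()
		switch {
		case !up:
			sawOutage = true
		case sawOutage:
			return nil // down earlier, up now: the child was restarted
		}
		if time.Now().After(deadline) {
			if sawOutage {
				return errors.New("service went down but did not recover in time")
			}
			return errors.New("service never became unavailable")
		}
		time.Sleep(interval)
	}
}
```

One caveat of this approach: if the restart completes faster than the polling interval, the outage can go unobserved, so the interval should be short relative to the expected restart time.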

Added test report summary output via gotestsum + JUnit merge/reporting:

Summary example:

Matrix Test Report
===================
                                                   e2e-junit.xml
                                                   | 
versioned/e2e  
- TestAddVersion                                   /  / Passed
- TestBasicFlow                                    /  / Passed
- TestChildProcessCrashRecovery                    /  / Passed
- TestEmptyOracleResponseKeepsVersionsRunning      /  / Passed
- TestFailedSameVersionUpdateKeepsOldChildRunning  /  / Passed
- TestHashMismatch                                 /  / Passed
- TestHealthEndpoint                               /  / Passed
- TestOracleTemporaryFailureKeepsVersionsRunning   /  / Passed
- TestRegisterStartupVersion                       /  / Passed
- TestRemoveVersion                                /  / Passed
- TestSSEStreaming                                 /  / Passed
- TestStartupFromExistingOracleState               /  / Passed

-------------------------------------------------------------------------------
Test Results:
  Passed       :     12
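
For readers unfamiliar with the JUnit format, the totals in a report like the one above can be derived from the XML with a few lines of Go. The sketch below is not the merge or reporting tool used by this PR. It assumes the common layout of a `testsuites` root containing `testsuite` and `testcase` elements, and the names `tallyJUnit` and `tally` are hypothetical.

```go
package main

import "encoding/xml"

// junitOutcome captures a <failure>, <error>, or <skipped> child element.
type junitOutcome struct {
	Message string `xml:"message,attr"`
}

// junitCase models only the test case fields this sketch needs.
type junitCase struct {
	Name    string        `xml:"name,attr"`
	Failure *junitOutcome `xml:"failure"`
	Error   *junitOutcome `xml:"error"`
	Skipped *junitOutcome `xml:"skipped"`
}

// junitReport models a root element that contains <testsuite> children.
type junitReport struct {
	Suites []struct {
		Name  string      `xml:"name,attr"`
		Cases []junitCase `xml:"testcase"`
	} `xml:"testsuite"`
}

// tally holds per-outcome totals across all parsed reports.
type tally struct{ Passed, Failed, Skipped int }

// tallyJUnit parses one or more JUnit XML documents and sums their outcomes.
// A test case with a <failure> or <error> child counts as failed.
func tallyJUnit(reports ...[]byte) (tally, error) {
	var t tally
	for _, raw := range reports {
		var r junitReport
		if err := xml.Unmarshal(raw, &r); err != nil {
			return tally{}, err
		}
		for _, s := range r.Suites {
			for _, c := range s.Cases {
				switch {
				case c.Failure != nil || c.Error != nil:
					t.Failed++
				case c.Skipped != nil:
					t.Skipped++
				default:
					t.Passed++
				}
			}
		}
	}
	return t, nil
}
```

Accepting several documents at once means the same function covers both a single report and the merge step, since summing across files is the merge for the purposes of the totals.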

Safety

Production versiond logic is unchanged. This PR changes the E2E suite, mock oracle, test app, Docker test runner, and Makefile only.

@aikuznetsov aikuznetsov marked this pull request as ready for review June 3, 2026 04:09
@aikuznetsov aikuznetsov changed the title Versioned e2e coverage Expand versioned E2E coverage Jun 3, 2026
a-kuprin commented Jun 3, 2026

Collaborator

@aikuznetsov check the tests https://github.com/gonka-ai/gonka/actions/runs/26863149134/job/79221015258?pr=1295

aikuznetsov commented

Collaborator Author

@a-kuprin
This failure does not look related to the current PR changes.
The failing job is Build and Test API Wrapper, which runs from ./decentralized-api. This PR only changes files under versioned/*.

So I’d treat the API wrapper failure as a separate CI/build issue and fix it in a separate PR, rather than mixing it with the versioned E2E coverage changes.

a-kuprin commented Jun 3, 2026

Collaborator

This failure does not look related to the current PR changes.

@aikuznetsov yes it is.
The test failed because of timing/race.

There is already a fix for it in the release branch (#1289).
