Flaky Test: TestGameServerAllocationReturnLabels #4431

@markmandel

Description

TestGameServerAllocationReturnLabels is flaky and fails intermittently in e2e tests on GKE Autopilot with a nil pointer dereference panic.

Environment

Error

--- FAIL: TestGameServerAllocationReturnLabels (112.92s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2c336c3]

goroutine 643 [running]:
testing.tRunner.func1.2({0x2f40480, 0x4948e50})
	/usr/local/go/src/testing/testing.go:1872 +0x419
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1875 +0x683
panic({0x2f40480?, 0x4948e50?})
	/usr/local/go/src/runtime/panic.go:783 +0x132
agones.dev/agones/test/e2e.TestGameServerAllocationReturnLabels(0xc00055cc40)
	/go/src/agones.dev/agones/test/e2e/gameserverallocation_test.go:1368 +0x983
testing.tRunner(0xc00055cc40, 0x33b6ed8)
	/usr/local/go/src/testing/testing.go:1934 +0x21d
created by testing.(*T).Run in goroutine 1
	/usr/local/go/src/testing/testing.go:1997 +0x9d3

Root Cause Analysis

The panic occurs at test/e2e/gameserverallocation_test.go:1368:

assert.Equal(t, t.Name(), gsa.Status.Metadata.Labels[role])

The test creates a Fleet with 1 replica and waits for it to become Ready via AssertFleetCondition. However, on GKE Autopilot, node provisioning can be slow due to scale-from-zero behavior. From the logs, the Fleet spent approximately 110 seconds (from 19:50:41 to 19:52:30) waiting for ReadyReplicas to go from 0 to 1.

The logs show the Fleet stuck at ReadyReplicas:0 while Replicas:1 for an extended period:

time="2026-01-24 19:50:42.290" level=info msg="Checking Fleet Ready replicas" expected=1 fleet=simple-fleet-1.0x7qjj fleetStatus="{Replicas:1 ReadyReplicas:0 ...}"
...
time="2026-01-24 19:52:27.056" level=info msg="Checking Fleet Ready replicas" expected=1 fleet=simple-fleet-1.0x7qjj fleetStatus="{Replicas:1 ReadyReplicas:0 ...}"
time="2026-01-24 19:52:30.251" level=info msg="Checking Fleet Ready replicas" expected=1 fleet=simple-fleet-1.0x7qjj fleetStatus="{Replicas:1 ReadyReplicas:1 ...}"

If the allocation request is made while no GameServer is actually Ready, whether AssertFleetCondition passed late or timed out, the allocation comes back in the UnAllocated state. In that state gsa.Status.Metadata is nil, so the test panics when it accesses gsa.Status.Metadata.Labels.
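
To illustrate the failure mode, here is a minimal, self-contained sketch. The types allocationStatus and allocationMetadata and the helper readLabel are hypothetical stand-ins invented for this example; they are not the real allocationv1 types, which carry more fields. The sketch only assumes what the analysis above states: Metadata is a pointer that is nil when nothing was allocated.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the allocation types.
// The real definitions live in the Agones allocationv1 package.
type allocationMetadata struct {
	Labels map[string]string
}

type allocationStatus struct {
	State    string
	Metadata *allocationMetadata // assumed nil when nothing was allocated
}

// readLabel mimics the failing access: it dereferences Metadata without
// checking it first. The deferred recover turns the runtime panic into
// an error so the demo can keep running.
func readLabel(s allocationStatus, key string) (val string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s.Metadata.Labels[key], nil
}

func main() {
	allocated := allocationStatus{
		State:    "Allocated",
		Metadata: &allocationMetadata{Labels: map[string]string{"role": "demo"}},
	}
	unallocated := allocationStatus{State: "UnAllocated"} // Metadata left nil

	v, err := readLabel(allocated, "role")
	fmt.Println(v, err)

	v, err = readLabel(unallocated, "role")
	fmt.Println(v, err)
}
```

In the real test there is no recover around the access, so the same dereference takes down the whole test binary rather than producing an ordinary assertion failure.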

Observations

Suggested Fix

The test should verify that gsa.Status.State == GameServerAllocationAllocated before attempting to access gsa.Status.Metadata. The current code uses assert.Equal, which records a failure but does not stop the test, so the next line still runs:

assert.Equal(t, allocationv1.GameServerAllocationAllocated, gsa.Status.State)  // line 1367
assert.Equal(t, t.Name(), gsa.Status.Metadata.Labels[role])  // line 1368 - panics if State != Allocated

The state check should use require.Equal instead, which stops the test immediately on failure:

require.Equal(t, allocationv1.GameServerAllocationAllocated, gsa.Status.State)
// Now safe to access gsa.Status.Metadata

Or add an explicit nil check:

assert.Equal(t, allocationv1.GameServerAllocationAllocated, gsa.Status.State)
require.NotNil(t, gsa.Status.Metadata, "allocation metadata should not be nil for allocated state")
assert.Equal(t, t.Name(), gsa.Status.Metadata.Labels[role])
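
The same fail-fast ordering can be sketched outside of testify using only the standard library. Everything named here (gsaStatus, gsaMetadata, stateAllocated, labelIfAllocated) is hypothetical and exists only for illustration; it is not part of Agones or the e2e framework. The point is the order of the checks: state first, then the nil guard, and only then the label lookup.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the allocation types.
type gsaMetadata struct {
	Labels map[string]string
}

type gsaStatus struct {
	State    string
	Metadata *gsaMetadata
}

// Assumed string value, used only inside this sketch.
const stateAllocated = "Allocated"

// labelIfAllocated returns the label only after both guards pass,
// mirroring a require-style "stop before dereferencing" flow.
func labelIfAllocated(s gsaStatus, key string) (string, error) {
	if s.State != stateAllocated {
		return "", fmt.Errorf("allocation state is %q, want %q", s.State, stateAllocated)
	}
	if s.Metadata == nil {
		return "", errors.New("allocation metadata is nil for allocated state")
	}
	return s.Metadata.Labels[key], nil
}

func main() {
	ok := gsaStatus{State: stateAllocated, Metadata: &gsaMetadata{Labels: map[string]string{"role": "demo"}}}
	label, err := labelIfAllocated(ok, "role")
	fmt.Println(label, err)

	// No panic here: the state check fails before Metadata is touched.
	label, err = labelIfAllocated(gsaStatus{State: "UnAllocated"}, "role")
	fmt.Println(label, err)
}
```

With this ordering, a slow Autopilot scale-up would surface as a readable "wrong state" failure instead of a SIGSEGV.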

Metadata

Labels

area/tests: Unit tests, e2e tests, anything to make sure things don't break
awaiting-maintainer: Block issues from being stale/obsolete/closed
help wanted: We would love help on these issues. Please come help us!
kind/bug: These are bugs.
