Description
What happened:
When a single cluster runs many GameServers, allocation latency can become high. In this situation, if the client that sent the allocation request gives up because of a timeout, the agones-allocator does not stop the allocation process for that request. Requests are put into a queue and processed in turn, so once timeouts start piling up the situation gets worse: the allocator keeps processing requests that clients have already abandoned, which delays newer requests and causes a cascading effect.
What you expected to happen:
When a client cancels an allocation request (due to timeout or explicit cancellation):
- The request should not be added to the queue, OR
- If already in the queue, the allocation logic should be skipped when processing it
How to reproduce it (as minimally and precisely as possible):
- Deploy Agones with many GameServers (enough to cause allocation latency)
- Send multiple allocation requests with short timeouts
- Observe that the allocator continues processing requests even after clients have timed out
- Notice that timed-out requests accumulate in the queue, worsening the overall latency
Anything else we need to know?:
Root cause: In cmd/allocator/main.go, the gRPC request context is not passed to the allocator in the standard (non-ProcessorAllocator) path. The workerCtx (server lifecycle context) is captured in a closure instead of using the per-request context.
    // line 521-523 in newServiceHandler
    allocationCallback: func(gsa *allocationv1.GameServerAllocation) (k8sruntime.Object, error) {
        return allocator.Allocate(ctx, gsa) // ctx here is workerCtx, not request ctx
    },

    // line 755 in Allocate method
    resultObj, err := h.allocationCallback(gsa) // request ctx is not passed
Note: The ProcessorAllocator path correctly passes the request context (line 742), but the standard path does not.
Environment:
- Agones version: 1.45.0 (the same behavior is present on main)
- Kubernetes version (use kubectl version): 1.31
- Cloud provider or hardware configuration:
- Install method (yaml/helm):
- Troubleshooting guide log(s):
- Others: