-
Notifications
You must be signed in to change notification settings - Fork 32
Description
APM AWS Lambda extensiom version (apm-aws-lambda version
): 1.5.7
Describe the bugIn our Java APM-Agent lambda integration tests we test the lambda layer in two scenarios in combination with the APM agent:
- A successful invocation of the lambda
- A crash during lambda execution (Simulated by calling
System.exit(-42)
, which results in exit code 214).
With 1.5.7
the succesfull invocation test passed, but the the crash scenario fails.
On transaction start, the APM Agent sends a "partial transaction" to the lambda layer. On a crash, that partial transaction is supposed to be turned into an actual, failed transaction and forwarded to the APM-server.
This worked in the integration test in 1.5.5, but doesn't anymore in 1.5.7.
Here are the relevant logs from the test:
Successful crash + transaction report in 1.5.5:
STDERR: 26 Jul 2024 10:58:53,553 [WARNING] (rapid) First fatal error stored in appctx: Runtime.ExitError
STDERR: 26 Jul 2024 10:58:53,553 [WARNING] (rapid) Process runtime-1 exited: exit status 214
STDERR: 26 Jul 2024 10:58:53,554 [ERROR] (rapid) Invoke failed InvokeID=6d17f713-9325-4548-8fdc-ffb13e8ca2ac error=Runtime exited with error: exit status 214
STDERR: 26 Jul 2024 10:58:53,555 [ERROR] (rapid) Invoke DONE failed: Runtime.ExitError
STDERR: 26 Jul 2024 10:58:53,556 [WARNING] (rapid) Reset initiated: ReleaseFail
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).processEvent","file.name":"app/run.go","file.line":166},"message":"Received event.","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).processEvent","file.name":"app/run.go","file.line":167},"message":"{\n\t\"timestamp\": \"2024-07-26T10:58:53.558013708Z\",\n\t\"eventType\": \"SHUTDOWN\",\n\t\"shutdownReason\": \"ReleaseFail\",\n\t\"deadlineMs\": 1721991535555,\n\t\"requestId\": \"\",\n\t\"invokedFunctionArn\": \"\",\n\t\"tracing\": {\n\t\t\"type\": \"\",\n\t\t\"value\": \"\"\n\t}\n}","ecs.version":"1.6.0"}
STDOUT: {"log.level":"warn","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).processEvent","file.name":"app/run.go","file.line":213},"message":"Logs collection not started due to earlier subscription failure","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).processEvent","file.name":"app/run.go","file.line":237},"message":"Received runtimeDone signal","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).Run","file.name":"app/run.go","file.line":95},"message":"Waiting for background data send to end","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).ForwardApmData","file.name":"apmproxy/apmserver.go","file.line":59},"message":"Invocation context cancelled, not processing any more agent data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"info","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).Run","file.name":"app/run.go","file.line":98},"message":"Exiting due to shutdown event with reason ReleaseFail","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":82},"message":"Flush started - Checking for agent data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":99},"message":"Flush in progress - Processing lambda data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.580Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).PostToApmServer","file.name":"apmproxy/apmserver.go","file.line":172},"message":"Sending data chunk to APM server","ecs.version":"1.6.0"}
[pool-1-thread-1] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Receiving Intake Request No.1
[pool-1-thread-1] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Finished Intake Request No.1
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.663Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).UpdateStatus","file.name":"apmproxy/apmserver.go","file.line":273},"message":"APM server Transport status set to Healthy","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.663Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":114},"message":"Flush ended for lambda data - no data in buffer","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.664Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":82},"message":"Flush started - Checking for agent data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.664Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":99},"message":"Flush in progress - Processing lambda data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.664Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).PostToApmServer","file.name":"apmproxy/apmserver.go","file.line":172},"message":"Sending data chunk to APM server","ecs.version":"1.6.0"}
[pool-1-thread-1] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Receiving Intake Request No.2
[pool-1-thread-1] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Finished Intake Request No.2
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.667Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":114},"message":"Flush ended for lambda data - no data in buffer","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.667Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).StartReceiver.func1","file.name":"apmproxy/receiver.go","file.line":62},"message":"server closed","ecs.version":"1.6.0"}
STDERR: 26 Jul 2024 10:58:53,669 [INFO] (rapid) Waiting for runtime domain processes termination
STDERR: 26 Jul 2024 10:58:53,683 [INFO] (rapid) Received signal signal=terminated
STDERR: 26 Jul 2024 10:58:53,683 [INFO] (rapid) Shutting down...
STDERR: 26 Jul 2024 10:58:53,684 [WARNING] (rapid) Reset initiated: SandboxTerminated
STDERR: 26 Jul 2024 10:58:53,684 [INFO] (rapid) Waiting for runtime domain processes termination
[main] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Stopping fake APM server on port 31659
Successful crash without transaction being reported in 1.5.7:
STDERR: 26 Jul 2024 10:54:08,800 [WARNING] (rapid) First fatal error stored in appctx: Runtime.ExitError
STDERR: 26 Jul 2024 10:54:08,800 [WARNING] (rapid) Process runtime-1 exited: exit status 214
STDERR: 26 Jul 2024 10:54:08,801 [ERROR] (rapid) Invoke failed error=Runtime exited with error: exit status 214 InvokeID=6dd62f84-c176-4b22-9522-3dd3f9ede290
STDERR: 26 Jul 2024 10:54:08,801 [ERROR] (rapid) Invoke DONE failed: Runtime.ExitError
STDERR: 26 Jul 2024 10:54:08,802 [WARNING] (rapid) Reset initiated: ReleaseFail
STDERR: 26 Jul 2024 10:54:10,805 [WARNING] (rapid) Deadline: the agent extension-elastic-apm-extension-1 did not exit after deadline 2024-07-26 10:54:10.801800757 +0000 UTC m=+7.194149503; Killing it.
STDERR: 26 Jul 2024 10:54:10,806 [INFO] (rapid) Sending SIGKILL to extension-elastic-apm-extension-1(50).
STDERR: 26 Jul 2024 10:54:10,816 [INFO] (rapid) Waiting for runtime domain processes termination
STDERR: 26 Jul 2024 10:54:10,852 [INFO] (rapid) Received signal signal=terminated
STDERR: 26 Jul 2024 10:54:10,852 [INFO] (rapid) Shutting down...
STDERR: 26 Jul 2024 10:54:10,852 [WARNING] (rapid) Reset initiated: SandboxTerminated
STDERR: 26 Jul 2024 10:54:10,852 [INFO] (rapid) Waiting for runtime domain processes termination
As visible in the logs, for some reason the extension seems to not react to the crash is killed by the runtime environment after a two second timeout.
To Reproduce
Run the linked integration test. The lambda layer version can be altered in this line.
If desired, I think we could also just create a standalone container to run & curl against to reproduce the issue.