Skip to content

Partial Transactions feature seems to be broken in 1.5.7 #513

@JonasKunz

Description

@JonasKunz

APM AWS Lambda extensiom version (apm-aws-lambda version): 1.5.7

Describe the bugIn our Java APM-Agent lambda integration tests we test the lambda layer in two scenarios in combination with the APM agent:

  • A successful invocation of the lambda
  • A crash during lambda execution (Simulated by calling System.exit(-42), which results in exit code 214).

With 1.5.7 the succesfull invocation test passed, but the the crash scenario fails.
On transaction start, the APM Agent sends a "partial transaction" to the lambda layer. On a crash, that partial transaction is supposed to be turned into an actual, failed transaction and forwarded to the APM-server.

This worked in the integration test in 1.5.5, but doesn't anymore in 1.5.7.

Here are the relevant logs from the test:

Successful crash + transaction report in 1.5.5:

STDERR: 26 Jul 2024 10:58:53,553 [WARNING] (rapid) First fatal error stored in appctx: Runtime.ExitError
STDERR: 26 Jul 2024 10:58:53,553 [WARNING] (rapid) Process runtime-1 exited: exit status 214
STDERR: 26 Jul 2024 10:58:53,554 [ERROR] (rapid) Invoke failed InvokeID=6d17f713-9325-4548-8fdc-ffb13e8ca2ac error=Runtime exited with error: exit status 214
STDERR: 26 Jul 2024 10:58:53,555 [ERROR] (rapid) Invoke DONE failed: Runtime.ExitError
STDERR: 26 Jul 2024 10:58:53,556 [WARNING] (rapid) Reset initiated: ReleaseFail
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).processEvent","file.name":"app/run.go","file.line":166},"message":"Received event.","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).processEvent","file.name":"app/run.go","file.line":167},"message":"{\n\t\"timestamp\": \"2024-07-26T10:58:53.558013708Z\",\n\t\"eventType\": \"SHUTDOWN\",\n\t\"shutdownReason\": \"ReleaseFail\",\n\t\"deadlineMs\": 1721991535555,\n\t\"requestId\": \"\",\n\t\"invokedFunctionArn\": \"\",\n\t\"tracing\": {\n\t\t\"type\": \"\",\n\t\t\"value\": \"\"\n\t}\n}","ecs.version":"1.6.0"}
STDOUT: {"log.level":"warn","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).processEvent","file.name":"app/run.go","file.line":213},"message":"Logs collection not started due to earlier subscription failure","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).processEvent","file.name":"app/run.go","file.line":237},"message":"Received runtimeDone signal","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).Run","file.name":"app/run.go","file.line":95},"message":"Waiting for background data send to end","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).ForwardApmData","file.name":"apmproxy/apmserver.go","file.line":59},"message":"Invocation context cancelled, not processing any more agent data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"info","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/app.(*App).Run","file.name":"app/run.go","file.line":98},"message":"Exiting due to shutdown event with reason ReleaseFail","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":82},"message":"Flush started - Checking for agent data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.558Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":99},"message":"Flush in progress - Processing lambda data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.580Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).PostToApmServer","file.name":"apmproxy/apmserver.go","file.line":172},"message":"Sending data chunk to APM server","ecs.version":"1.6.0"}
[pool-1-thread-1] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Receiving Intake Request  No.1
[pool-1-thread-1] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Finished Intake Request  No.1
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.663Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).UpdateStatus","file.name":"apmproxy/apmserver.go","file.line":273},"message":"APM server Transport status set to Healthy","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.663Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":114},"message":"Flush ended for lambda data - no data in buffer","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.664Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":82},"message":"Flush started - Checking for agent data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.664Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":99},"message":"Flush in progress - Processing lambda data","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.664Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).PostToApmServer","file.name":"apmproxy/apmserver.go","file.line":172},"message":"Sending data chunk to APM server","ecs.version":"1.6.0"}
[pool-1-thread-1] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Receiving Intake Request  No.2
[pool-1-thread-1] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Finished Intake Request  No.2
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.667Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).FlushAPMData","file.name":"apmproxy/apmserver.go","file.line":114},"message":"Flush ended for lambda data - no data in buffer","ecs.version":"1.6.0"}
STDOUT: {"log.level":"debug","@timestamp":"2024-07-26T10:58:53.667Z","log.origin":{"function":"github.com/elastic/apm-aws-lambda/apmproxy.(*Client).StartReceiver.func1","file.name":"apmproxy/receiver.go","file.line":62},"message":"server closed","ecs.version":"1.6.0"}
STDERR: 26 Jul 2024 10:58:53,669 [INFO] (rapid) Waiting for runtime domain processes termination
STDERR: 26 Jul 2024 10:58:53,683 [INFO] (rapid) Received signal signal=terminated
STDERR: 26 Jul 2024 10:58:53,683 [INFO] (rapid) Shutting down...
STDERR: 26 Jul 2024 10:58:53,684 [WARNING] (rapid) Reset initiated: SandboxTerminated
STDERR: 26 Jul 2024 10:58:53,684 [INFO] (rapid) Waiting for runtime domain processes termination
[main] INFO co.elastic.apm.awslambda.fakeserver.FakeApmServer - Stopping fake APM server on port 31659

Successful crash without transaction being reported in 1.5.7:

STDERR: 26 Jul 2024 10:54:08,800 [WARNING] (rapid) First fatal error stored in appctx: Runtime.ExitError
STDERR: 26 Jul 2024 10:54:08,800 [WARNING] (rapid) Process runtime-1 exited: exit status 214
STDERR: 26 Jul 2024 10:54:08,801 [ERROR] (rapid) Invoke failed error=Runtime exited with error: exit status 214 InvokeID=6dd62f84-c176-4b22-9522-3dd3f9ede290
STDERR: 26 Jul 2024 10:54:08,801 [ERROR] (rapid) Invoke DONE failed: Runtime.ExitError
STDERR: 26 Jul 2024 10:54:08,802 [WARNING] (rapid) Reset initiated: ReleaseFail
STDERR: 26 Jul 2024 10:54:10,805 [WARNING] (rapid) Deadline: the agent extension-elastic-apm-extension-1 did not exit after deadline 2024-07-26 10:54:10.801800757 +0000 UTC m=+7.194149503; Killing it.
STDERR: 26 Jul 2024 10:54:10,806 [INFO] (rapid) Sending SIGKILL to extension-elastic-apm-extension-1(50).
STDERR: 26 Jul 2024 10:54:10,816 [INFO] (rapid) Waiting for runtime domain processes termination
STDERR: 26 Jul 2024 10:54:10,852 [INFO] (rapid) Received signal signal=terminated
STDERR: 26 Jul 2024 10:54:10,852 [INFO] (rapid) Shutting down...
STDERR: 26 Jul 2024 10:54:10,852 [WARNING] (rapid) Reset initiated: SandboxTerminated
STDERR: 26 Jul 2024 10:54:10,852 [INFO] (rapid) Waiting for runtime domain processes termination

As visible in the logs, for some reason the extension seems to not react to the crash is killed by the runtime environment after a two second timeout.

To Reproduce

Run the linked integration test. The lambda layer version can be altered in this line.

If desired, I think we could also just create a standalone container to run & curl against to reproduce the issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    aws-λ-extensionAWS Lambda ExtensionbugSomething isn't workingcommunityIssues and PRs created by the communitytriageIssues and PRs that need to be triaged

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions