
Flushing breaks when log subscription is not available. #386

Open
@JonasKunz

Description

APM AWS Lambda extension version: latest (v1.3.1)

Describe the bug

I’m currently playing around with the Lambda extension + Java agent + AWS Lambda Runtime Interface Emulator (RIE) in an attempt to create a Lambda integration test for verifying the agent <-> Lambda extension interaction.
This is working nicely so far (Lambda invocations from my locally running container appear as Lambda transactions in my Elastic Cloud APM deployment), with one exception:

Even though I’ve set ELASTIC_APM_SEND_STRATEGY to syncflush, the transaction for an invocation is always buffered until another Lambda invocation is made. On shutdown, the transaction for the last invocation is not sent and is therefore lost.

I suspect this behaviour is related to the edge case that the RIE does not provide a proper Logs API for extensions, as the extension complains about a "subscription failure" for logs. Nonetheless, I think the Lambda extension should work properly even if the logging subscription is not working correctly for some reason.

Where did I get that suspicion from?

Excerpt from the extension logs:
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":144},"message":"Received event.","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":145},"message":"{\n\t\"timestamp\": \"2023-04-20T11:43:15.069206006Z\",\n\t\"eventType\": \"INVOKE\",\n\t\"deadlineMs\": 1681991055068,\n\t\"requestId\": \"186ea9c1-da69-4e16-bc89-1d6f6a6b110a\",\n\t\"invokedFunctionArn\": \"arn:aws:lambda:us-east-1:012345678912:function:local-test-lambda\",\n\t\"tracing\": {\n\t\t\"type\": \"\",\n\t\t\"value\": \"\"\n\t}\n}","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":206},"message":"Logs collection not started due to earlier subscription failure","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":230},"message":"Received runtimeDone signal","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":95},"message":"Waiting for background data send to end","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":58},"message":"Invocation context cancelled, not processing any more agent data","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":81},"message":"Flush started - Checking for agent data","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":113},"message":"Flush ended for lambda data - no data in buffer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":127},"message":"Waiting for next event...","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.306Z","log.origin":{"file.name":"apmproxy/receiver.go","file.line":119},"message":"Handling APM Data Intake","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.313Z","log.origin":{"file.name":"apmproxy/receiver.go","file.line":119},"message":"Handling APM Data Intake","ecs.version":"1.6.0"}
...
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.363Z","log.origin":{"file.name":"app/run.go","file.line":144},"message":"Received event.","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"app/run.go","file.line":145},"message":"{\n\t\"timestamp\": \"2023-04-20T11:43:19.363927758Z\",\n\t\"eventType\": \"INVOKE\",\n\t\"deadlineMs\": 1681991059357,\n\t\"requestId\": \"af63ba87-73c1-4b53-b769-1cddf745e7ec\",\n\t\"invokedFunctionArn\": \"arn:aws:lambda:us-east-1:012345678912:function:local-test-lambda\",\n\t\"tracing\": {\n\t\t\"type\": \"\",\n\t\t\"value\": \"\"\n\t}\n}","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"app/run.go","file.line":206},"message":"Logs collection not started due to earlier subscription failure","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"app/run.go","file.line":228},"message":"APM client has sent flush signal","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"app/run.go","file.line":95},"message":"Waiting for background data send to end","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":58},"message":"Invocation context cancelled, not processing any more agent data","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":81},"message":"Flush started - Checking for agent data","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.365Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":169},"message":"Sending data chunk to APM server","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.374Z","log.origin":{"file.name":"apmproxy/receiver.go","file.line":119},"message":"Handling APM Data Intake","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.379Z","log.origin":{"file.name":"apmproxy/receiver.go","file.line":119},"message":"Handling APM Data Intake","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.553Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":270},"message":"APM server Transport status set to Healthy","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.553Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":113},"message":"Flush ended for lambda data - no data in buffer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-20T11:43:19.553Z","log.origin":{"file.name":"app/run.go","file.line":127},"message":"Waiting for next event...","ecs.version":"1.6.0"}

The logs show that the extension performs its flush before the data from the agent has actually been received.
The same behaviour occurs on the next Lambda invocation, except that at least the data from the previous invocation is flushed then.

With my non-existent Go skills I would deduce that this line might be the problem: if the logging subscription failed, processEvents() finishes there immediately.
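
To make the suspected flow concrete, here is a minimal, self-contained Go sketch. It is not the extension’s actual code, and the names logsSubscribed and agentDataCh are hypothetical; it only mimics what the logs above suggest: with no logs subscription there is nothing to wait for, so the flush runs before the agent’s data for the invocation has arrived.

// Illustrative sketch only, not the extension's real code; names are hypothetical.
package main

import (
	"fmt"
	"time"
)

func main() {
	logsSubscribed := false             // RIE case: the logs subscription failed
	agentDataCh := make(chan string, 1) // data the APM agent sends after the handler returns

	// The agent reports its transaction shortly after the invocation finishes.
	go func() {
		time.Sleep(200 * time.Millisecond)
		agentDataCh <- "transaction for this invocation"
	}()

	// Suspected behaviour: without a logs subscription there is no
	// platform.runtimeDone event (and no agent flush signal) to wait for,
	// so processing of the event "finishes" immediately ...
	if logsSubscribed {
		// (would block here until runtimeDone or the agent flush signal)
	}

	// ... and the subsequent flush sees an empty buffer.
	select {
	case data := <-agentDataCh:
		fmt.Println("flushing:", data)
	default:
		fmt.Println("Flush ended for lambda data - no data in buffer") // matches the log line above
	}
}

Running this prints "Flush ended for lambda data - no data in buffer", mirroring the flush message in the log excerpt.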

To Reproduce

  1. Create a Docker image with a minimal Lambda function, the extension, and any APM agent:
FROM docker.elastic.co/observability/apm-lambda-extension-arm64:latest AS lambda-extension
FROM docker.elastic.co/observability/apm-agent-java:latest AS java-agent

FROM public.ecr.aws/lambda/java:11

ENV AWS_REGION=eu-central-1
ENV AWS_LAMBDA_FUNCTION_NAME=local-test-lambda
ENV AWS_LAMBDA_FUNCTION_MEMORY_SIZE=1024
#timeout in seconds
ENV AWS_LAMBDA_FUNCTION_TIMEOUT=60

ENV AWS_LAMBDA_EXEC_WRAPPER=/opt/elastic-apm-handler
ENV ELASTIC_APM_LAMBDA_APM_SERVER=https://XXXXXXXXa.apm.europe-west3.gcp.cloud.es.io:443
ENV ELASTIC_APM_SECRET_TOKEN=XXXXXXXXXX
ENV ELASTIC_APM_SEND_STRATEGY=syncflush
ENV ELASTIC_APM_LOG_LEVEL=DEBUG

COPY --from=lambda-extension /opt/elastic-apm-extension /opt/extensions/elastic-apm-extension
COPY --from=java-agent /usr/agent/ /opt/

# usual lambda entrypoint setup:
COPY target/LambdaPlayground.jar ${LAMBDA_TASK_ROOT}/lib/LambdaPlayground.jar

CMD [ "co.elastic.playground.TestLambda" ]
  2. Run the container and expose the RIE port: docker run -p 9000:8080 <image-name>

  3. Invoke the Lambda function locally: curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"foo" : "bar"}'

Expected behavior

Syncflush should work as expected, even when the log subscription is not available (i.e., a transaction is flushed immediately after the invocation finishes).
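
For comparison, a hedged sketch of what I would expect syncflush to do (again with hypothetical names, not the extension’s real API): regardless of the logs subscription state, wait for the agent’s flush signal, bounded by the invocation deadline, before flushing and requesting the next event.

// Sketch of the expected syncflush behaviour; names are hypothetical.
package main

import (
	"fmt"
	"time"
)

// waitForAgentFlush blocks until the agent signals that it has sent its data,
// or until the deadline expires.
func waitForAgentFlush(flushCh <-chan struct{}, deadline time.Duration) bool {
	select {
	case <-flushCh:
		return true
	case <-time.After(deadline):
		return false
	}
}

func main() {
	flushCh := make(chan struct{}, 1)

	// The agent signals its flush shortly after the handler returns.
	go func() {
		time.Sleep(200 * time.Millisecond)
		flushCh <- struct{}{}
	}()

	// Even without a logs subscription, wait for the agent before flushing.
	if waitForAgentFlush(flushCh, 2*time.Second) {
		fmt.Println("flushing buffered agent data to the APM server")
	} else {
		fmt.Println("deadline reached, flushing whatever is buffered")
	}
}

Here the flush only happens after the agent has signalled completion (or the deadline is hit), so the transaction of the current invocation would be included.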

Labels

aws-λ-extension, bug
