Open
Labels
enhancement (New feature or request), in progress (Someone is actively working on this issue)
Description
If the agent is down or has not started yet, metrics can be dropped. Today it is up to the caller of logger.flush to handle retries. There are two options:
- Backpressure the caller of logger.flush. This could negatively impact request latencies.
- On error, enqueue to a circular buffer. The trick here is that we will need to retry this queue on an interval, which changes the model from async/await to a purely async one. This is a departure from the current design and will need to be turned on via a feature flag.
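To make the second option concrete, here is a minimal sketch of a fixed-capacity circular buffer that overwrites its oldest entry when full. The class name, its methods, and the drop-oldest policy are assumptions for illustration; the issue does not specify how overflow should be handled.

```typescript
// Hypothetical fixed-capacity buffer; the name and API are illustrative only.
class CircularBuffer<T> {
  private readonly capacity: number;
  private readonly items: (T | undefined)[];
  private head = 0; // index of the oldest item
  private count = 0; // number of items currently stored

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new Error("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  get size(): number {
    return this.count;
  }

  // Adds an item. When the buffer is full, the oldest item is
  // overwritten and returned so the caller can log the drop.
  enqueue(item: T): T | undefined {
    const tail = (this.head + this.count) % this.capacity;
    let dropped: T | undefined;
    if (this.count === this.capacity) {
      dropped = this.items[this.head];
      this.head = (this.head + 1) % this.capacity;
    } else {
      this.count++;
    }
    this.items[tail] = item;
    return dropped;
  }

  // Removes and returns the oldest item, or undefined when empty.
  dequeue(): T | undefined {
    if (this.count === 0) return undefined;
    const item = this.items[this.head];
    this.items[this.head] = undefined;
    this.head = (this.head + 1) % this.capacity;
    this.count--;
    return item;
  }
}

const buffer = new CircularBuffer<string>(2);
buffer.enqueue("a");
buffer.enqueue("b");
const dropped = buffer.enqueue("c"); // buffer is full, so "a" is evicted
```

Dropping the oldest entry favors recent metrics over stale ones; rejecting new entries instead would be an equally valid policy.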
The symptoms of this are:
- The first metrics during initialization of the app may not appear
- The following error message will be in your app logs:
(node:1) UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 172.17.0.2:25888
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1106:14)
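Until retries are built in, a caller can avoid the unhandled rejection by wrapping the flush call. The sketch below is a hypothetical helper, not part of the library: `flushSafely`, its parameters, and the injected `sleep` function are assumptions, and `flush` stands in for `() => logger.flush()`.

```typescript
// Default delay helper; a different one can be injected for testing.
const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical caller-side wrapper: retries a failing flush a bounded
// number of times and reports failure instead of rejecting.
async function flushSafely(
  flush: () => Promise<void>,
  maxAttempts = 3,
  delayMs = 100,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await flush();
      return true;
    } catch (err) {
      if (attempt === maxAttempts) {
        console.warn(`flush failed after ${maxAttempts} attempts: ${String(err)}`);
        return false;
      }
      // Fixed delay between attempts, for simplicity.
      await sleep(delayMs);
    }
  }
  return false; // only reached when maxAttempts <= 0
}
```

A call such as `await flushSafely(() => logger.flush())` still blocks the caller while retrying, which is the latency cost described in the first option.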
Tasks
- Add type AgentSinkOptions with:
  - RetryStrategy parameter, where the default value is None for backwards compatibility, with a single option to start with: ExponentialBackoffRetryStrategy (see also: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/)
  - AsyncBehavior parameter that controls whether the call should block or not. In the former case we keep the current behavior, and in the latter we return immediately, enqueuing to the retry buffer on failure.
- Change AgentSink's constructor to constructor(options: AgentSinkOptions, serializer: ISerializer).
- Add RetryStrategies which the AgentSink uses based on its configuration.
  - NoRetry propagates errors back to the caller of flush, which maintains the current behavior.
  - ExponentialRetry (which can be configured by the application) will block flush on the first attempt, enqueuing to a CircularBuffer (whose size is also configurable) on failures.
- On startup, setInterval will be set to check the size of the CircularBuffer and retry failed requests asynchronously.
- Add shutdown method to gracefully shut down and block on any outstanding requests.
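The option types and the backoff calculation from the tasks above could look roughly like this. Every name and field here (`RetryStrategy`, `AsyncBehavior`, `retryBufferSize`, `backoffDelayMs`, the default values) is an assumption made for illustration; only the "full jitter" formula, sleep = random_between(0, min(cap, base * 2 ** attempt)), comes from the linked article.

```typescript
// Hypothetical option shapes; the issue proposes these only by name.
type RetryStrategy =
  | { kind: "None" }
  | { kind: "ExponentialBackoff"; baseDelayMs: number; maxDelayMs: number };

type AsyncBehavior = "Blocking" | "NonBlocking";

interface AgentSinkOptions {
  retryStrategy: RetryStrategy;
  asyncBehavior: AsyncBehavior;
  retryBufferSize: number; // capacity of the retry buffer (assumed field)
}

// Defaults chosen to preserve the current behavior: no retries, blocking flush.
const defaultOptions: AgentSinkOptions = {
  retryStrategy: { kind: "None" },
  asyncBehavior: "Blocking",
  retryBufferSize: 100, // arbitrary illustrative value
};

// "Full jitter" delay: a random value in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// With a fixed "random" value of 0.5: min(5000, 100 * 8) = 800, so the delay is 400.
const exampleDelay = backoffDelayMs(3, 100, 5000, () => 0.5); // 400
```

Jitter spreads retries from many clients over time so they do not all hit a recovering agent at the same instant.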
Example Usage
AWS_EMF_AGENT_RETRY_STRATEGY="ExponentialBackoff"
// or
Configuration.agentRetryStrategy = RetryStrategy.ExponentialBackoff;
// or
Configuration.agentRetryStrategy = (...) => customRetryStrategy();
// ...
await logger.flush();
// execution control is returned when logs have been successfully flushed or enqueued for retry
Open Question
- Should we change logger.flush() to enqueue and return immediately? This would allow us to make flush() a synchronous operation in all cases.
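One way to picture the enqueue-and-return variant is a sink whose flush only appends to a bounded queue, while an interval timer drains it in the background. This is a sketch under assumptions, not the proposed implementation: `RetryingSink`, `Send`, `pendingCount`, and the drop-oldest overflow policy are all hypothetical, and a plain array stands in for the circular buffer.

```typescript
// Stand-in for the network call to the agent (hypothetical signature).
type Send = (payload: string) => Promise<void>;

class RetryingSink {
  private readonly send: Send;
  private readonly capacity: number;
  private readonly intervalMs: number;
  private readonly pending: string[] = [];
  private timer: ReturnType<typeof setInterval> | undefined;
  private draining: Promise<void> = Promise.resolve();

  constructor(send: Send, capacity: number, intervalMs: number) {
    this.send = send;
    this.capacity = capacity;
    this.intervalMs = intervalMs;
  }

  get pendingCount(): number {
    return this.pending.length;
  }

  // Starts the background retry loop.
  start(): void {
    if (this.timer === undefined) {
      this.timer = setInterval(() => void this.drain(), this.intervalMs);
    }
  }

  // Synchronous: enqueue and return immediately, dropping the oldest on overflow.
  flush(payload: string): void {
    if (this.pending.length >= this.capacity) this.pending.shift();
    this.pending.push(payload);
  }

  // Chains drains so that only one runs at a time.
  drain(): Promise<void> {
    this.draining = this.draining.then(() => this.drainOnce());
    return this.draining;
  }

  private async drainOnce(): Promise<void> {
    while (this.pending.length > 0) {
      const payload = this.pending.shift() as string;
      try {
        await this.send(payload);
      } catch {
        // Agent still unavailable: put the payload back (if there is room)
        // and wait for the next tick.
        if (this.pending.length < this.capacity) this.pending.unshift(payload);
        return;
      }
    }
  }

  // Stops the timer and makes one final attempt to send what is queued.
  // Payloads that still fail remain in the queue.
  async shutdown(): Promise<void> {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    await this.drain();
  }
}
```

The trade-off is that flush can no longer report delivery failures to its caller, so dropped payloads would need to be surfaced some other way, for example through logging.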