cdk-stepfunctions-patterns library is a set of AWS CDK constructs that provide resiliency patterns implementation for AWS Step Functions.
All these patterns are composable, meaning that you can combine them together to create quite complex state machines that are much easier to maintain and support than low-level JSON definitions.
- Try / Catch
- Try / Finally
- Try / Catch / Finally
- Retry with backoff and jitter
- Resilience lambda errors handling
- Validation of proper resilience lambda errors handling
Step Functions support Try / Catch pattern natively with Task and Parallel states.
TryTask
construct adds a high level abstraction that allows you to use Try / Catch pattern with any state or sequence of states.
import * as sfn from '@aws-cdk/aws-stepfunctions';
import { TryTask } from 'cdk-stepfunctions-patterns';
// ...
new sfn.StateMachine(this, 'TryCatchStepMachine', {
definition: new TryTask(this, "TryCatch", {
tryProcess: new sfn.Pass(this, 'A1').next(new sfn.Pass(this, 'B1')),
catchProcess: new sfn.Pass(this, 'catchHandler'),
// optional configuration properties
catchProps: {
errors: ['Lambda.AWSLambdaException'],
resultPath: "$.ErrorDetails"
}
})
})
It is often useful to design state machine using Try / Finally pattern. The idea is to have a Final state that has to be executed regardless of successful or failed execution of the Try state. There may be some temporal resource you want to delete or notification to send.
Step Functions do not provide a native way to implement that pattern but it can be done using Parallel state and catch all catch specification.
TryTask
construct abstracts these implementation details and allows to express the pattern directly.
import * as sfn from '@aws-cdk/aws-stepfunctions';
import { TryTask } from 'cdk-stepfunctions-patterns';
// ...
new sfn.StateMachine(this, 'TryFinallyStepMachine', {
definition: new TryTask(this, "TryFinally", {
tryProcess: new sfn.Pass(this, 'A2').next(new sfn.Pass(this, 'B2')),
finallyProcess: new sfn.Pass(this, 'finallyHandler'),
// optional configuration properties
finallyErrorPath: "$.FinallyErrorDetails"
})
})
This is a combination of two previous patterns. TryTask
construct allows you to express rather complex
error handling logic in a very compact form.
import * as sfn from '@aws-cdk/aws-stepfunctions';
import { TryTask } from 'cdk-stepfunctions-patterns';
// ...
new sfn.StateMachine(this, 'TryCatchFinallyStepMachine', {
definition: new TryTask(this, "TryCatchFinalli", {
tryProcess: new sfn.Pass(this, 'A3').next(new sfn.Pass(this, 'B3')),
catchProcess: new sfn.Pass(this, 'catchHandler3'),
finallyProcess: new sfn.Pass(this, 'finallyHandler3')
})
})
Out of the box Step Functions retry implementation provides a way to configure backoff factor, but there is no built in way to introduce jitter. As covered in Exponential Backoff And Jitter and Wait and Retry with Jittered Back-off this retry technique can be very helpful in high-load scenarios.
RetryWithJitterTask
construct provides a custom implementation of retry with backoff and
jitter that you can use directly in your state machines.
import * as sfn from '@aws-cdk/aws-stepfunctions';
import { RetryWithJitterTask } from 'cdk-stepfunctions-patterns';
// ...
new sfn.StateMachine(this, 'RetryWithJitterStepMachine', {
definition: new RetryWithJitterTask(this, "AWithJitter", {
tryProcess: new sfn.Pass(this, 'A4').next(new sfn.Pass(this, 'B4')),
retryProps: { errors: ["States.ALL"], maxAttempts: 3 }
})
})
LambdaInvoke
construct from aws-stepfunctions-tasks
module is probably one of the most used ones. Still, handling of
AWS Lambda service exceptions
is often overlooked.
ResilientLambdaTask
is a drop-in replacement construct for LambdaInvoke
that adds retry for the most common
transient errors:
- Lambda.ServiceException
- Lambda.AWSLambdaException
- Lambda.SdkClientException
- Lambda.TooManyRequestsException
import * as lambda from '@aws-cdk/aws-lambda';
import { ResilientLambdaTask } from 'cdk-stepfunctions-patterns';
// ...
const lambdaFunction = new lambda.Function(this, 'LambdaFunction', {
// ... removed for clarity
});
const calculateJitterTask = new ResilientLambdaTask(this, "InvokeLambda", {
lambdaFunction: lambdaFunction
})
That would result in the following state definition:
"InvokeLambda": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "<ARN of lambda function>"
},
"Retry": [{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}]
}
It is often a challenge to enforce consistent transient error handling across all state machines of a large application. To help with that, cdk-stepfuctions-patterns provides a CDK aspect to verify that all Lambda invocations correctly handle transient errors from AWS Lambda service.
Use ResilienceLambdaChecker
aspect as shown below.
import * as cdk from '@aws-cdk/core';
import { ResilienceLambdaChecker } from 'cdk-stepfunctions-patterns'
const app = new cdk.App();
// ...
// validate compliance rules
app.node.applyAspect(new ResilienceLambdaChecker());
If there are some states in your application that do not retry transient errors or miss some recommended error codes, there will be warning during CDK synthesize stage:
PS C:\Dev\GitHub\cdk-stepfunctions-patterns> cdk synth --strict
[Warning at /StepFunctionsPatterns/A] No retry for AWS Lambda transient errors defined - consider using ResilientLambdaTask construct.
[Warning at /StepFunctionsPatterns/B] Missing retry for transient errors: Lambda.AWSLambdaException,Lambda.SdkClientException.