AWS Step Functions is a powerful orchestration service that lets you model even the most complex business workflows. It packs a great visualization tool (which you can now also use to design your workflows visually!) and can integrate with many AWS services directly, including Lambda, DynamoDB, and API Gateway. It's one of my favorite AWS services, and I often use it to model complex or business-critical workflows.

However, a common challenge newcomers face with Step Functions is: how do you go about testing them? In this post, let's talk about the challenges with testing Step Functions, the failure modes that you should test for, and my strategy for testing Step Functions.

In Step Functions, you model workflows as state machines, i.e. a series of steps or tasks that need to be performed. You can include branching logic, retry failed tasks, perform tasks in parallel, and even call out to another state machine and wait for its response. You define the state machine using a JSON-based language, but you can also design it visually using the new visual designer (announced on June 17th, 2021).

Task states are the meat of a state machine, as they are where the state machine calls out to another resource to perform some computation or data operation. Task states can integrate directly with a number of AWS services, such as Lambda functions, DynamoDB tables, SNS topics, and SQS queues.

From the testing perspective, we need to understand where things can go wrong and aim to cover as many of the possible failure modes as we can. Here are the failure modes that you should test for.

There can be mistakes in the state machine definition itself. For example, the "Next" state is misconfigured and points to the wrong state, or the TimeoutSeconds on a Task state is too short for a task that often runs for a few minutes, or you used the wrong input or output path for a Task state. Whatever the case, your state machine definition might be syntactically correct, but it would not perform as you'd expect.

The state machine requires IAM permissions to interact with other AWS services. If you have a Task state that executes a Lambda function, then the state machine's IAM role must have the lambda:InvokeFunction permission for this function. Similarly, if you have Task states that read data from a DynamoDB table, then the state machine's IAM role must have the necessary DynamoDB permissions too.

Step Functions supports a number of service integrations out of the box, including Lambda, SNS, SQS, and DynamoDB. When integrating with these services, sometimes you have to provide a Parameters object and include things like the ARN of an SNS topic or the URL of an SQS queue. Maybe the topic ARN is not configured correctly, or maybe you're sending the wrong message, or perhaps you forgot to set the message attribute the subscriber needs to filter the incoming messages. Ultimately, the message is not delivered to the right place, and the app fails.

Many Task states invoke a Lambda function to run some custom business logic, and you might have bugs in the Lambda function itself. This includes a whole class of potential failure modes: bugs in your business logic, misconfigured IAM permissions, or any number of programming errors that can fail the Lambda invocation (null reference exceptions, missing dependencies, etc.).

In short, there are a lot of ways your state machines can fail, and you should aim to catch as many of them as possible with automated tests. Many state machines in the wild are pretty complex; Coca-Cola's is one example. Step Functions are often used to model complex workflows like these, and complex workflows naturally have complex failure modes.
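Some of the definition-level mistakes described earlier can be caught before anything runs. Step Functions itself rejects a definition whose Next names a non-existent state when you deploy it, but a local check can surface such problems earlier (e.g. in CI) and can also flag softer issues, such as a Task state with no explicit TimeoutSeconds or an SNS publish that sets no message attributes. Below is a minimal sketch in Python, assuming the definition is available as Amazon States Language JSON. The `lint` function, the state names, and the ARNs are hypothetical placeholders, not part of any AWS SDK. The sketch only inspects top-level states (it does not descend into Parallel branches or Map iterators). It cannot detect a Next that points to the wrong *existing* state, a wrong input or output path, IAM misconfigurations, or bugs inside Lambda functions, because those only surface when the state machine actually runs.

```python
import json

# Hypothetical ASL definition used for illustration only; names and ARNs are placeholders.
DEFINITION = json.loads("""
{
  "StartAt": "ProcessOrder",
  "States": {
    "ProcessOrder": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {"FunctionName": "process-order", "Payload.$": "$"},
      "TimeoutSeconds": 300,
      "Next": "NotifyCustomer"
    },
    "NotifyCustomer": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:orders",
        "Message.$": "$.Payload"
      },
      "Next": "SendReceipt"
    },
    "Done": {"Type": "Succeed"}
  }
}
""")


def lint(definition: dict) -> list[str]:
    """Return problems found in an ASL definition (a static sketch, not a full validator)."""
    states = definition.get("States", {})
    problems = []

    if definition.get("StartAt") not in states:
        problems.append(f"StartAt points to unknown state {definition.get('StartAt')!r}")

    for name, state in states.items():
        # Collect every transition target: Next, Choice rules, Default, and Catch handlers.
        targets = []
        if "Next" in state:
            targets.append(state["Next"])
        targets += [rule["Next"] for rule in state.get("Choices", []) if "Next" in rule]
        if "Default" in state:
            targets.append(state["Default"])
        targets += [catcher["Next"] for catcher in state.get("Catch", []) if "Next" in catcher]
        for target in targets:
            if target not in states:
                problems.append(f"{name}: transitions to unknown state {target!r}")

        if state.get("Type") != "Task":
            continue

        # Flag Task states whose timeout was never set explicitly.
        if "TimeoutSeconds" not in state and "TimeoutSecondsPath" not in state:
            problems.append(f"{name}: no explicit TimeoutSeconds")

        # For SNS publish integrations, check the parameters subscribers typically depend on.
        if state.get("Resource", "").startswith("arn:aws:states:::sns:publish"):
            params = state.get("Parameters", {})
            if "TopicArn" not in params and "TopicArn.$" not in params:
                problems.append(f"{name}: sns:publish without TopicArn")
            if "MessageAttributes" not in params and "MessageAttributes.$" not in params:
                problems.append(f"{name}: sns:publish without MessageAttributes")

    return problems


if __name__ == "__main__":
    for problem in lint(DEFINITION):
        print(problem)
```

Running the script against the sample definition reports three problems, all for NotifyCustomer: the transition to the unknown SendReceipt state, the missing TimeoutSeconds, and the missing MessageAttributes. Whether a missing timeout or missing message attributes is actually a bug depends on your workflow, so treat these as warnings rather than hard failures.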