September 22, 2021

Thedevopsblog

DevOps, AWS, Azure, GCP, IaC

Integrating AWS Step Functions callbacks and external systems

This post is written by Zach Abrahamson, Sr. Cloud App Architect, Shared Delivery Teams.

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows so you can build and update apps quickly. Using Step Functions, you can design and run workflows that integrate services such as AWS Lambda and Amazon ECS into feature-rich applications.

Workflows are composed of a series of steps, with the output of one step acting as input into the next. Step Functions automatically triggers and tracks each step, and retries when there are errors, providing resilience to your workflows.

Sometimes customers want to take advantage of the intuitive nature of Step Functions while integrating with external systems either synchronously or asynchronously. The callback task integration pattern enables Step Functions workflows to send a token to an external system via Amazon SQS, AWS Lambda, Amazon ECS, and other AWS services.

If the target external system is also running on AWS, the SQS and Amazon SNS integrations enable forwarding of the token to that system for processing. However, these services may be in other AWS accounts or hosted on another platform. In this case, you can use a Lambda function to broker the call to that service.

This blog explores how to integrate Step Functions workflows with external systems using Lambda, DynamoDB, and API Gateway. It introduces a layer of abstraction using these services that keeps the external system decoupled from the Step Functions implementation.

The external system calls back to an API with a unique ID or tuple stored as a task token. This is referenced against a known value in the DynamoDB table and then starts the Step Functions workflow.

Overview

This architecture includes two Lambda functions. The first handles the ID generation and call to the external service. The second handles the callback to API Gateway, the DynamoDB token lookup, and the call to Step Functions API.

Solution architecture

The first function is defined as a workflow step that can receive the Step Functions task token and payload to be forwarded to the external service. Once this step finishes and the external service handles the request, the workflow branch pauses and waits for a callback.

The unique ID may be a value that’s business case-specific in the Step Functions payload such as an order ID or the task token. When the external service completes its task, it can call back to the API Gateway endpoint with this ID, the task status, and any output:

{
  "order_id": "a96cbeed-cbd7-4711-815c-0913113c6064",
  "task_type": "ORDER_SHIPPING_SERVICE",
  "task_status": "SUCCESS",
  "task_output": {
    "shipping_status": "PROCESSING",
    "tracking_number": "1ZA1B2C3D4"
  }
}

The handler uses this ID to look up the corresponding Step Functions task token in the DynamoDB table and call the Step Functions API SendTaskSuccess action.

The external system does not need to have any dependency on Step Functions or any AWS service API. It only handles a payload and calls a REST API with the result. If you migrate the backend service from Step Functions to a custom solution based on EKS, you can still use the same external service and endpoints to abstract the two systems. The external service does not need to change anything.

You can deploy the code by following the README instructions from this GitHub repo. This section provides a walkthrough of the deployed resources.

The Step Functions workflow

{
    "StartAt": "ProcessOrder",
    "States": {
        "GetOrderMetadata": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-2:111111111111:function:GetOrderMetadataHandler-HE5LHL6WQQ3Y",
            "Output": "process_order_output",
            "Next": "ShippingServiceCallback"
        },
        "ShippingServiceCallback": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "Parameters": {
                "FunctionName": "arn:aws:lambda:us-east-2:111111111111:function:SNSServiceCallbackHandler-2ZB9WVQQNWO2",
                "Payload": {
                    "token.$": "$$.Task.Token",
                    "input.$": "$",
                    "callback": "true"
                }
            },
            "Output": "shipping_service_output",
            "Next": "ProcessResults"
        },
        "ProcessShippingResults": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-2:111111111111:function:ProcessResultsHandler-EP1222ZM1KCR",
            "Output": "process_results_output",
            "End": true
        }
    }
}

This is a simplified example of an order service workflow. It starts with a ProcessOrder function, which reads order information from a repository and persists it into the event for subsequent steps to handle. This uses business logic to tie back to the workflow in the next step that calls the external system. The example assumes that this system already exists.

The second step is a callback step for an external service, which in this example is a shipping service. The shipping service returns a tracking number in shipping_service_output for downstream workflow steps. A Lambda function sends a message to an SNS topic for the external system to subscribe to. This Lambda function also persists this token in the DynamoDB table for processing the callback.

In this example, the final step prints the output of the shipping service that is persisted in the event after the callback.

Orders DynamoDB table

  OrdersTable:
    Type: "AWS::DynamoDB::Table"
    Properties:
      TableName: "Orders"
      AttributeDefinitions:
        - AttributeName: "order_id"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "order_id"
          KeyType: "HASH"

CallbackTasks DynamoDB Table

  CallbackTasksTable:
    Type: "AWS::DynamoDB::Table"
    Properties:
      TableName: "CallbackTasks"
      AttributeDefinitions:
        - AttributeName: "order_id"
          AttributeType: "S"
        - AttributeName: "task_type"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "order_id"
          KeyType: "HASH"
        - AttributeName: "task_type"
          KeyType: "RANGE"

This table persists the Step Functions task token with metadata about the task. It stores the business-specific logic as key attributes for the items. This helps translate an order ID and task type to a callback token to resume the Step Functions workflow.

API Gateway API

The API definition has a POST handler for the callback handler. This API specification defines types for the request that use enumerated types and required fields. This removes the need for the handler to do any basic data validation.

Lambda handler exceptions are used to determine the response code. For example, an exception with “Internal Error” in the message automatically returns the exception message in the body and an HTTP 500 status code.

This endpoint handles a request with the following payloads:

{
  "order_id": "a96cbeed-cbd7-4711-815c-0913113c6064",
  "task_type": "ORDER_SHIPPING_SERVICE",
  "task_status": "SUCCESS",
  "task_output": {
    "shipping_status": "PROCESSING",
    "tracking_number": "1ZA1B2C3D4"
  }
}

The GetOrderMetadataFunction Lambda function queries the OrderTable DynamoDB table and persists the order contents to the event. It is the first function called in the Step Functions workflow and uses the ‘order_id’ value in the workflow input.

The SNSCallbackFunction Lambda function is a generic callback task handler that sends a payload from the event to an SNS topic. It first brings the callback task token into the top level of the event. It uses environment variables in the Lambda function configuration to determine the payload and topic:

  • SNS_TOPIC_ARN: ARN of the SNS topic to publish to.
  • CALLBACK_TABLE: The DynamoDB table to use for callback token persistence.
  • TASK_TYPE: Task type to persist in DynamoDB for the eventual callback lookup.
  • PAYLOAD_EVENT_KEY: Key in the event to publish to the SNS topic.

The ExternalCallbackHandler Lambda function uses the ‘CALLBACK_TABLE’ environment variable to determine which DynamoDB table to use for token lookups. This function backs the externalCallback API Gateway endpoint and receives an ExternalCallbackTaskStatusRequest object as the event. The fields are parsed and used to persist the shipping service data into the Step Functions workflow event.

The ProcessShippingResultFunction Lambda function shows how the payload from the external service is available in the event for downstream steps. It persists the tracking number from the shipping service in the event to the Orders table.

Walking through the workflow

  1. There is an order in the Orders table with the following attributes:
    Edit item UI
  2. A new workflow is started with the following input:
    New workflow UI
  3. The workflow pauses at the Shipping Service Callback step with the persisted order contents in the event:
    Visual workflow UI
  4. The order message is published to the SNS topic:
    SNS notification
  5. The task is persisted in the CallbackTasks table with the task token:
    Edit item UI
  6. Call the externalCallback API Gateway endpoint with this example payload:
    Request body
  7. The workflow has completed:
    Completed workflow UI
  8. The tracking number has been persisted into the Orders table:
    Orders table UI

Conclusion

This article presents an architecture that enables customers to use Step Functions workflow callbacks without exposing the implementation and AWS service details to external systems.

The example shipping service subscribes to an SNS topic and calls an API Gateway endpoint with the shipping information to resume the workflow and persist the tracking information.

This pattern is useful for workflows that must interact synchronously with systems external to AWS or cannot communicate with AWS services.

For more serverless learning resources, visit Serverless Land.