September 23, 2021

Thedevopsblog

Modelling workflow input and output path processing with data flow simulator

AWS Step Functions recently introduced a new data flow simulator to model input and output path processing. This new feature makes it easier to evaluate JSON-based input and output data as it passes through a state, helping to build workflows faster.

Developers build Step Functions workflows to orchestrate multiple services into business-critical applications with minimal code. Each state within a workflow receives a JSON input and passes a JSON output to the next state. Customers wanted a better way to understand the manipulation of this JSON input and output as it passes through a state.

This blog post explains Step Functions input and output processing. It shows how to model data using the new data flow simulator and introduces best practices for working with paths in AWS Step Functions.

Overview

The new Step Functions Data Flow Simulator allows developers to simulate the order of data processing that occurs in a single Task state during execution. This helps developers understand how to filter and manipulate data as it flows from state to state. Developers can now specify a starting JSON input, and evaluate it through each of the processing path stages.

Amazon States Language (ASL) enables developers to filter and manipulate data at various stages of a workflow state’s execution using paths. A path is a string beginning with $ that lets you identify and filter subsets of JSON text. Learning how to apply these filters helps to build efficient workflows with minimal state transitions.

The following ASL fields can be applied to filter and control the flow of JSON from state to state:

InputPath
Selects which parts of the JSON input to pass to the Task state. Step Functions applies the InputPath field first and then the Parameters field.

Parameters
Creates a collection of key-value pairs that are passed as input to an AWS service integration, such as an AWS Lambda function. These values can be static, or dynamically selected from either the state input or the workflow context object.

ResultSelector

Provides a way to manipulate the state’s result before the ResultPath is applied. Similar to the Parameters field, it allows you to create a collection of key-value pairs. The output of ResultSelector replaces the state’s result and is passed to ResultPath.

ResultPath

Specifies where the task result is placed relative to the state input before the combined JSON is passed along to the OutputPath. Use the ResultPath to determine whether the output of a state is a copy of its input, the results it produces, or a combination of both.

OutputPath

Filters the JSON produced after the ResultPath is applied, limiting the information that’s passed to the state’s final output.
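The order in which these fields are applied can be sketched in a few lines of Python. This is a simplified local model, not the Step Functions implementation: it handles only simple dotted paths such as $.a.b, ignores the context object, intrinsic functions, and a null ResultPath, and the names run_task_state, build_payload, and fake_task are hypothetical helpers made up for this illustration.

```python
import copy


def get_path(doc, path):
    """Resolve a simple dotted path such as "$.a.b" ("$" returns the whole document)."""
    if path == "$":
        return doc
    node = doc
    for key in path[2:].split("."):
        node = node[key]
    return node


def set_path(doc, path, value):
    """Return a copy of doc with value stored at a simple dotted path."""
    if path == "$":
        return value  # the result replaces the input entirely
    out = copy.deepcopy(doc)
    node = out
    keys = path[2:].split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return out


def build_payload(template, source):
    """Keys ending in ".$" are resolved as paths against source; other values are static."""
    payload = {}
    for key, value in template.items():
        if key.endswith(".$"):
            payload[key[:-2]] = get_path(source, value)
        elif isinstance(value, dict):
            payload[key] = build_payload(value, source)
        else:
            payload[key] = value
    return payload


def run_task_state(state_input, task, input_path="$", parameters=None,
                   result_selector=None, result_path="$", output_path="$"):
    filtered = get_path(state_input, input_path)               # 1. InputPath
    task_input = filtered
    if parameters is not None:
        task_input = build_payload(parameters, filtered)       # 2. Parameters
    result = task(task_input)                                  # the task itself runs
    if result_selector is not None:
        result = build_payload(result_selector, result)        # 3. ResultSelector
    combined = set_path(state_input, result_path, result)      # 4. ResultPath (merges into the raw state input)
    return get_path(combined, output_path)                     # 5. OutputPath


# Hypothetical task and input, for illustration only.
def fake_task(payload):
    return {"status": "SHIPPED", "debug": {"echo": payload}}


state_input = {"order": {"id": "A-1", "items": 3}, "requestId": "r-42"}

output = run_task_state(
    state_input,
    fake_task,
    input_path="$.order",
    parameters={"orderId.$": "$.id", "priority": "standard"},
    result_selector={"status.$": "$.status"},
    result_path="$.shipping",
)
print(output)
```

In this sketch the ResultPath step merges the selected result into the original state input rather than into the InputPath-filtered input, which is why requestId survives to the output.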

The following diagram shows the order in which these paths and filters are applied as information moves through a task state:

Getting started

To access the simulator, choose Data Flow Simulator (1) from the navigation bar in the AWS Step Functions console.

The simulator starts at the state input stage. Here you can edit or paste your own JSON data into the state input field or use the pre-populated example to get started quickly. The input field automatically validates the JSON object, and highlights any syntax errors.

Choose one of the icons (2) at the top of the simulator to jump to a specific evaluation stage. Or choose the Next button (3) at the bottom of the simulator to step through each stage sequentially.

Evaluating JSON paths with the Data Flow Simulator

Use the simulator to evaluate JSON path values for InputPath, ResultPath, and OutputPath.

In the following example, I keep the sample state input unedited and choose the InputPath stage. I set the InputPath to $.library. The left panel shows the state input before the InputPath is applied. The panel on the right shows the state input after InputPath is applied. I can try different InputPath values to check immediately if they are valid and see what they evaluate to:

Best practices for building workflows with Data Flow Simulator

Developers can use the simulator to help test and debug their JSON processing paths and adhere to the following best practices while building their Step Functions workflows:

  1. Manage payload size with OutputPath and ResultPath
    AWS Step Functions supports payload sizes up to 256 KB. Often it is not necessary to pass the entire payload result of one task state to the next. Use the input and output processing paths along with the Data Flow Simulator to filter and transform task payloads. This helps to reduce payload size and reduces complexity and custom code in downstream services. The following example shows how OutputPath is applied to a task result to modify and reduce the payload size:

    { "OutputPath": "$.TaskResult.modifiedPayload" }

    Sometimes it is useful to increase the size of the payload instead. Use the ResultPath to store a Task result in a nested field rather than losing or overwriting the input. The Step Functions workflow can reference this field later and use it as the input for other states.
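A minimal sketch of that ResultPath behaviour, modelled locally in Python. The helper apply_result_path, the field name pricing, and the sample data are assumptions made up for this illustration; the helper only handles a single top-level field, roughly what "ResultPath": "$.pricing" would do.

```python
import copy


def apply_result_path(state_input, task_result, field):
    """Store task_result under a top-level field of a copy of state_input."""
    combined = copy.deepcopy(state_input)
    combined[field] = task_result
    return combined


state_input = {"customerId": "c-100", "cart": ["book", "pen"]}
task_result = {"total": 21.5, "currency": "USD"}

# With a ResultPath such as "$.pricing", input and result travel together.
with_result_path = apply_result_path(state_input, task_result, "pricing")

# With the default ResultPath ("$"), the result replaces the input.
replaced = task_result

print(with_result_path)
print(replaced)
```

Later states can then read both $.customerId and $.pricing.total from the same payload, at the cost of a larger payload.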

  2. Reduce custom code by using JSONPath expressions with the InputPath field
    Step Functions supports a number of JSONPath expressions to help transform and manipulate JSON data. Where possible, use these expressions to selectively filter and transform JSON input. The following input is followed by some example expressions:

    { "store": {
        "book": [ 
          { "category": "reference",
            "author": "Nigel Rees",
            "title": "Sayings of the Century",
            "price": 8.95
          },
          { "category": "fiction",
            "author": "Evelyn Waugh",
            "title": "Sword of Honour",
            "price": 12.99
          },
          { "category": "fiction",
            "author": "Herman Melville",
            "title": "Moby Dick",
            "isbn": "0-553-21311-3",
            "price": 8.99
          },
          { "category": "fiction",
            "author": "J. R. R. Tolkien",
            "title": "The Lord of the Rings",
            "isbn": "0-395-19395-8",
            "price": 22.99
          }
        ],
        "bicycle": {
          "color": "red",
          "price": 19.95
        }
      }
    }
    $..book[?(@.isbn)] // all books that have an isbn number
    $..book[2] // the third book
    $.store.book[*].author // the authors of all books in the store
    $..book[?(@.category=='fiction' && @.price < 10)] // all fiction books cheaper than 10
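To sanity-check what these four expressions should select, the same filters can be written out as plain Python over the sample data. This is only a hand-written equivalent for comparison, not a JSONPath evaluator, and the variable names are made up for this sketch. Note that a JSONPath engine may return $..book[2] wrapped in a list, whereas the Python below returns the object itself.

```python
store = {
    "book": [
        {"category": "reference", "author": "Nigel Rees",
         "title": "Sayings of the Century", "price": 8.95},
        {"category": "fiction", "author": "Evelyn Waugh",
         "title": "Sword of Honour", "price": 12.99},
        {"category": "fiction", "author": "Herman Melville",
         "title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99},
        {"category": "fiction", "author": "J. R. R. Tolkien",
         "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99},
    ],
    "bicycle": {"color": "red", "price": 19.95},
}
books = store["book"]

# $..book[?(@.isbn)]
with_isbn = [b for b in books if "isbn" in b]

# $..book[2]
third_book = books[2]

# $.store.book[*].author
authors = [b["author"] for b in books]

# $..book[?(@.category=='fiction' && @.price < 10)]
cheap_fiction = [b for b in books if b["category"] == "fiction" and b["price"] < 10]

print([b["title"] for b in with_isbn])
print(third_book["title"])
print(authors)
print([b["title"] for b in cheap_fiction])
```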
    
  3. Reduce state transitions with intrinsic functions
    Intrinsic functions are ASL constructs that help build payloads and convert payload data types without creating additional Task state transitions. They can be used in Task states within the ResultSelector field, or in a Pass state in either the Parameters or ResultSelector field. The Step Functions documentation shows examples of how to:

    • Construct strings from interpolated values
    • Convert a JSON object to a string
    • Convert arguments to an array
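The three operations listed above correspond to the States.Format, States.JsonToString, and States.Array intrinsic functions. The Python below is a rough local stand-in to show what each one produces. The helper names and sample values are hypothetical, and details such as escaping rules and the exact whitespace of the serialized string are assumptions that may differ from the service.

```python
import json


def states_format(template, *args):
    """Stand-in for States.Format: fill each {} placeholder in order."""
    return template.format(*args)


def states_json_to_string(value):
    """Stand-in for States.JsonToString: serialize a JSON value to a string."""
    return json.dumps(value, separators=(",", ":"))


def states_array(*args):
    """Stand-in for States.Array: collect the arguments into a list."""
    return list(args)


state_input = {"name": "Arnav", "order": {"id": "A-1", "items": 3}}

# In ASL the first call would look something like:
#   "greeting.$": "States.Format('Hello, my name is {}.', $.name)"
greeting = states_format("Hello, my name is {}.", state_input["name"])
order_as_string = states_json_to_string(state_input["order"])
as_array = states_array(state_input["name"], 3, True)

print(greeting)
print(order_as_string)
print(as_array)
```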
  4. Pass original input to error handlers with ResultPath and Catch
    Task, Map, and Parallel states can have a field named Catch. This field allows you to handle errors by transitioning to a specific workflow state. Use the ResultPath to add the error output to the original input, so that the error-handling state receives both:
    "Catch": [{ 
      "ErrorEquals": ["States.ALL"], 
      "Next": "HandleError", 
      "ResultPath": "$.error" 
    }]
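The effect of that Catch configuration can be modelled locally as follows. This is a sketch under stated assumptions: run_with_catch, failing_task, and the state name NextState are hypothetical, and the Error and Cause keys mirror the fields Step Functions places in its error output.

```python
import copy


def run_with_catch(state_input, task, error_field="error"):
    """Run task; on failure, route to HandleError with the original input
    plus the error details, mimicking "ResultPath": "$.error" in a Catch."""
    try:
        return "NextState", task(state_input)
    except Exception as exc:
        payload = copy.deepcopy(state_input)
        payload[error_field] = {"Error": type(exc).__name__, "Cause": str(exc)}
        return "HandleError", payload


# Hypothetical task that always fails, for illustration only.
def failing_task(payload):
    raise ValueError("inventory service unavailable")


state_input = {"orderId": "A-1", "items": 3}
next_state, handler_input = run_with_catch(state_input, failing_task)

print(next_state)
print(handler_input)
```

Without the ResultPath, the error output would replace the state input, and the error handler would no longer know which order failed.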

Conclusion

The Step Functions Data Flow Simulator is a new AWS Management Console feature. It allows developers to simulate the order of input and output data processing that occurs in a Task state during execution. Developers can use it to evaluate JSONPath expressions quickly for each path processing stage. This helps developers to build faster and apply best practices to their Step Functions workflows.

The data flow simulator is generally available in the following Regions: US East (Ohio and N. Virginia), EU West (Ireland), and Canada (Central). It will be generally available in all other commercial Regions where Step Functions is available in the coming days. For a complete list of Regions and service offerings, see AWS Regions.

For more information on building applications with Step Functions, visit serverlessland.com.