Data flows are scalable and resilient data pipelines that you can use to ingest, process, and move data from one or more sources to one or more destinations. Each data flow consists of components that transform data in the pipeline and enrich data processing with event strategies, strategies, and text analysis. The components run concurrently to handle data starting from the source and ending at the destination.
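As a conceptual illustration only, the concurrent execution of components can be sketched in Python. This is not Pega Platform code. The names `stage`, `run_pipeline`, and `_DONE` are hypothetical, and the sketch assumes each component runs in its own thread and hands records to the next component through a queue.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel that marks the end of the stream


def stage(transform, inbox, outbox):
    """Run one component: read records, transform them, pass them on."""
    while True:
        record = inbox.get()
        if record is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next component
            return
        result = transform(record)
        if result is not None:  # None means the record was filtered out
            outbox.put(result)


def run_pipeline(source, transforms):
    """Connect the components with queues and run each in its own thread."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        threading.Thread(target=stage, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for thread in threads:
        thread.start()

    # Feed the source records into the first queue, then signal the end.
    for record in source:
        queues[0].put(record)
    queues[0].put(_DONE)

    for thread in threads:
        thread.join()

    # Drain the last queue into the destination.
    destination = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            return destination
        destination.append(item)
```

For example, `run_pipeline([1, 2, 3, 4], [keep_even, times_ten])` would let the second component start on the first record while the first component is still reading the rest, where `keep_even` and `times_ten` are any single-record functions you supply.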
Data Flow definition
You can create a data flow instance from the Data Model category. Select its source, add components, and select its destination. The data flow components that you can use depend on the type of data flow that you want to build. A simple data flow can move data from a single source data set, apply a filter, and save the results in a single destination data set. More complex data flows can use other data flows as their source, compose the source data with secondary data sources, apply strategies for data processing, and open a case or trigger an activity.
A Data Flow that calls a Strategy to determine the next best action and writes results to the Strategy Result class
For more information, see Data flow patterns.
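The shape of such a data flow (source, filter, composition with a secondary source, destination) can be sketched in Python. This is a minimal illustration under stated assumptions, not how Pega Platform implements data flows. The function names, the sample records, and the `segments` lookup are all hypothetical.

```python
def filter_records(records, predicate):
    """Filter component: keep only the records that satisfy the predicate."""
    for record in records:
        if predicate(record):
            yield record


def compose(records, secondary, key):
    """Compose component: enrich each record from a secondary source."""
    for record in records:
        extra = secondary.get(record[key], {})
        yield {**record, **extra}


def run_data_flow(source, secondary):
    """Chain the components from the source to the destination."""
    adults = filter_records(source, lambda r: r["age"] >= 18)
    enriched = compose(adults, secondary, key="id")
    return list(enriched)  # the list stands in for the destination data set


# Hypothetical primary source and secondary data source.
customers = [
    {"id": 1, "age": 17},
    {"id": 2, "age": 34},
    {"id": 3, "age": 52},
]
segments = {2: {"segment": "gold"}, 3: {"segment": "silver"}}

destination = run_data_flow(customers, segments)
```

Because each component is a generator, records move through the chain one at a time instead of being materialized between steps, which loosely mirrors how a data flow moves records from the source to the destination.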
To run a data flow on all service nodes, you can configure the Data Flow service from the Services landing page. Add more nodes to the Data Flow service to scale data processing. A data flow that is run through the Data Flows landing page uses the checked-in instance of the data flow and the referenced rules.
Configuring the Data Flow Service
You can run a data flow on the current node and in the context of the operator by clicking Actions > Run in the rule form. The operator context includes checked-out rules. You can run the data flow from the rule form to test your local changes to the data flow.
For more information, see Configuring the Data Flow service.
Data Flow management
On the Data Flows landing page, you can run and manage batch, real-time, single case, and external data flows.
An active Data Flow on the Data Flows landing page
For more information, see Data Flows landing page.
Advanced features of data flows
You can build advanced use cases, use data flows programmatically, and train your adaptive models by running data flows.
For more information, see Configuring advanced features of data flows.
API methods for running Data Flows
In addition to running and managing data flows through the standard UI, you can configure certain operations to run automatically through API methods. If you know how to create and configure Pega Platform activities, you can run data flows programmatically by using the DataFlow-Execute method. For example, you can configure an activity to start a data flow at a specified time.