The Data Flow service enables running data flow work items on decision data nodes. To enable the Data Flow service, you must add at least one decision data node on the Data Flow tab of the Services landing page. You can add more decision data nodes to scale data processing. Each new data flow work item is processed on all nodes that are listed on this tab.
Test runs of data flows are always processed on the local Pega Platform node, so you do not configure the Data Flow service for test runs.
Add a node.
Specify the number of Pega Platform threads that are assigned to running data flows, and the batch scalability factor that allows idle threads to be used for running data flows.
For example, when the source of a data flow is divided into five partitions, the data flow run is divided into five assignments that can be processed simultaneously on separate threads if there are enough threads.
The number of available threads is calculated by multiplying the per-node thread count by the number of nodes. With two nodes and five threads per node, 10 threads are available: the data flow run uses five threads and five threads remain idle. If you set the batch scalability factor to two, all 10 threads are used to process the five assignments.
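The following Java snippet is a minimal sketch of this arithmetic only; it does not use any Pega API, and the variable names (threadCountPerNode, scalabilityFactor, and so on) are assumptions used to reproduce the two-node, five-thread example above.

import static java.lang.Math.min;

public class DataFlowThreadMath {
    public static void main(String[] args) {
        int threadCountPerNode = 5;  // threads configured for the Data Flow service on each node
        int nodeCount = 2;           // decision data nodes listed on the Data Flow tab
        int partitions = 5;          // source partitions, which become assignments
        int scalabilityFactor = 2;   // batch scalability factor

        int availableThreads = threadCountPerNode * nodeCount;                        // 5 * 2 = 10
        int usedWithoutFactor = min(partitions, availableThreads);                    // 5 threads used
        int usedWithFactor = min(partitions * scalabilityFactor, availableThreads);   // 10 threads used

        System.out.println("Available threads: " + availableThreads);
        System.out.println("Used without scalability factor: " + usedWithoutFactor
                + " (idle: " + (availableThreads - usedWithoutFactor) + ")");
        System.out.println("Used with scalability factor " + scalabilityFactor + ": " + usedWithFactor);
    }
}

Running the sketch prints 10 available threads, 5 used (5 idle) without the scalability factor, and all 10 used once the factor is set to 2, matching the example above.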
Enter the number of threads.
The number of threads for running data flows is the same across all decision data nodes that are configured for the Data Flow service.
Enter the batch scalability factor.
If you decommission a node that has active data flow runs, the status of that node changes to LEAVING, and it is not decommissioned until all active data flow runs are finished.
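To illustrate this draining behavior, here is a purely conceptual Java sketch, not a Pega API; the NodeStatus values, the activeRuns counter, and the decommission method are assumptions that only model the LEAVING state described above.

import java.util.concurrent.atomic.AtomicInteger;

public class NodeDrainSketch {
    // Hypothetical statuses that model the documented behavior.
    enum NodeStatus { NORMAL, LEAVING, DECOMMISSIONED }

    private volatile NodeStatus status = NodeStatus.NORMAL;
    private final AtomicInteger activeRuns = new AtomicInteger(3); // pretend 3 runs are in flight

    void decommission() throws InterruptedException {
        status = NodeStatus.LEAVING;      // the node is marked LEAVING immediately...
        while (activeRuns.get() > 0) {    // ...but is not removed until active runs finish
            Thread.sleep(50);
        }
        status = NodeStatus.DECOMMISSIONED;
    }

    public static void main(String[] args) throws InterruptedException {
        NodeDrainSketch node = new NodeDrainSketch();
        // Simulate the active data flow runs finishing on another thread.
        new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                try { Thread.sleep(100); } catch (InterruptedException e) { return; }
                node.activeRuns.decrementAndGet();
            }
        }).start();
        node.decommission();
        System.out.println("Final status: " + node.status); // DECOMMISSIONED
    }
}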