You can create real-time runs for data flows whose primary input is a data set that can be streamed in real time, for example, a data set of type Stream or Facebook. Data flow runs that you start from the Data Flows landing page process data in the context of the access group. They always use the checked-in instance of the Data Flow rule and of the rules that the Data Flow rule references.
Record failure
Fail the run after more than x failed records – Terminate the processing of the data flow and mark the run as failed when the threshold for the total number of allowed failed records is reached or exceeded. If some records fail but the threshold is not reached, the data flow run finishes with errors. The default value is 1000 failed records.
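The record failure threshold can be modeled as a small Python function. This is a hypothetical sketch, not a Pega API: `RunStatus`, `run_with_failure_threshold`, and the per-record `process` callback are illustrative names, and the sketch assumes that "reaching the threshold" means the failed-record count goes above `max_failed`, which matches the "more than x" wording of the setting.

```python
from enum import Enum
from typing import Callable, Iterable


class RunStatus(Enum):
    # Hypothetical status names used only for this sketch.
    FINISHED = "finished"
    FINISHED_WITH_ERRORS = "finished with errors"
    FAILED = "failed"


def run_with_failure_threshold(
    records: Iterable[object],
    process: Callable[[object], None],
    max_failed: int = 1000,
) -> tuple[RunStatus, int]:
    """Process records, tolerating up to max_failed failures (hypothetical model)."""
    failed = 0
    for record in records:
        try:
            process(record)
        except Exception:
            failed += 1
            # Terminate the whole run as soon as the count goes past the limit.
            if failed > max_failed:
                return RunStatus.FAILED, failed
    # The run reached the end of its input; report whether any record failed.
    if failed:
        return RunStatus.FINISHED_WITH_ERRORS, failed
    return RunStatus.FINISHED, failed
```

With a `process` callback that rejects odd numbers, `range(10)` produces five failures: `max_failed=3` fails the run at the fourth failure, while `max_failed=5` lets it finish with errors.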
Node failure
Resume on other nodes from the last snapshot – For resumable data flow runs, transfer the processing to the remaining active Data Flow service nodes. Processing resumes from the last record ID that was processed before the most recent snapshot of the data flow run was saved. With this setting enabled, some records can be processed more than once, because records that were processed after that snapshot but before the node failure are processed again.
Restart the partitions on other nodes – For non-resumable data flow runs, transfer the processing to the remaining active Data Flow service nodes. Processing restarts from the first record in the data partition. With this setting enabled, some records can be processed more than once, because records that were processed before the node failure are processed again.
Fail the entire run – Terminate the data flow run and mark it as failed when a Data Flow service node fails. This setting provides backward compatibility with previous Pega Platform versions.
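To illustrate why both recovery options can process a record more than once, the following hypothetical Python simulation models a single partition of ordered records. It assumes that a snapshot records the position of the last processed record and that the failed node had already processed a prefix of the partition; `simulate` and its parameters are illustrative names, not Pega APIs, and the "Fail the entire run" option is not modeled.

```python
from collections import Counter


def simulate(
    records: list[int], fail_at: int, last_snapshot_index: int, strategy: str
) -> Counter:
    """Count how often each record ID is processed when a node fails.

    records: ordered record IDs in one partition.
    fail_at: the node fails after processing records[:fail_at].
    last_snapshot_index: index of the last record covered by the latest
        snapshot (-1 if no snapshot was saved yet).
    strategy: "resume" (resume from the last snapshot) or
        "restart" (restart the partition from its first record).
    """
    counts: Counter = Counter()
    # Work completed by the node before it failed.
    for record_id in records[:fail_at]:
        counts[record_id] += 1
    # Another active node takes over the partition.
    if strategy == "resume":
        start = last_snapshot_index + 1  # continue after the snapshotted record
    elif strategy == "restart":
        start = 0  # begin again with the first record in the partition
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    for record_id in records[start:]:
        counts[record_id] += 1
    return counts
```

For records 1 to 10, a failure after record 7, and a snapshot covering record 5, resuming reprocesses only records 6 and 7, while restarting the partition reprocesses records 1 through 7. In both cases every record is processed at least once.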
Snapshot management
Create a snapshot every x seconds – For resumable data flow runs, specify the interval, in seconds, at which a snapshot of the data flow run state is created. The default value is 5 seconds.
Enter the number of events to process between consecutive event strategy store operations.
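The two snapshot management settings combine a time-based trigger and an event-count trigger. The following Python class is a hypothetical sketch of that policy, not Pega's implementation: `CheckpointPolicy`, `on_event`, and the `"store"`/`"snapshot"` action labels are illustrative names. It uses the documented 5-second default for the snapshot interval, requires the event count explicitly because no default is stated here, and accepts an injectable clock so it can be tested deterministically.

```python
import time
from typing import Callable


class CheckpointPolicy:
    """Decide when run state should be snapshotted and event strategy state stored.

    Hypothetical model: a snapshot is due every `snapshot_interval_s` seconds,
    and a store operation is due every `events_per_store` events.
    """

    def __init__(
        self,
        events_per_store: int,
        snapshot_interval_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if events_per_store <= 0 or snapshot_interval_s <= 0:
            raise ValueError("event count and interval must be positive")
        self.events_per_store = events_per_store
        self.snapshot_interval_s = snapshot_interval_s
        self._clock = clock
        self._last_snapshot = clock()
        self._events_since_store = 0

    def on_event(self) -> list[str]:
        """Register one processed event and return the actions that are now due."""
        actions = []
        self._events_since_store += 1
        if self._events_since_store >= self.events_per_store:
            actions.append("store")
            self._events_since_store = 0
        now = self._clock()
        if now - self._last_snapshot >= self.snapshot_interval_s:
            actions.append("snapshot")
            self._last_snapshot = now
        return actions
```

For simplicity, this sketch checks the snapshot timer only when an event arrives; a real engine would more likely use a background timer so that snapshots are taken even when the stream is idle.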