- Data flow source
- In a resumable data flow run, the source of the referenced data flow is a Stream, Kafka, or Database Table data set. Data flows that use any other type of data set as the source can run only as non-resumable data flow runs.
- Data flow resumption
- Resumable runs can be paused and resumed. If a node fails, the active data partitions are transferred to the remaining functional nodes, and processing resumes from the last correctly processed record ID that was captured in a snapshot. For non-resumable runs, no snapshots are taken because the order of the incoming records cannot be guaranteed. Therefore, the starting point for non-resumable data flow runs is the first record in each partition.
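
The difference between the two starting points can be illustrated with a minimal Python sketch. This is not Pega code: `restart_position` and its parameters are hypothetical names chosen for illustration, and the sketch assumes that a resumed run continues with the record that follows the last record ID captured in a snapshot.

```python
from typing import Optional, Sequence


def restart_position(
    record_ids: Sequence[int],
    resumable: bool,
    last_snapshot_id: Optional[int],
) -> int:
    """Return the index in record_ids at which processing starts again.

    Hypothetical helper for illustration only, not part of any Pega API.
    """
    if not resumable or last_snapshot_id is None:
        # No usable snapshot: start over from the first record in the partition.
        return 0
    # Assumption: continue with the record after the last snapshotted record ID.
    return record_ids.index(last_snapshot_id) + 1


partition = [101, 102, 103, 104, 105, 106]

# Scenario: a node fails after record 105 was processed,
# and the last snapshot captured record ID 103.
resumed = partition[restart_position(partition, True, 103):]
restarted = partition[restart_position(partition, False, None):]

print(resumed)    # [104, 105, 106]
print(restarted)  # [101, 102, 103, 104, 105, 106]
```

In this scenario the resumable run processes records 104 and 105 a second time, while the non-resumable run processes records 101 through 105 again.
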
You can configure the following resilience settings:
- Record failure:
- Fail the run after more than x failed records – Terminate the processing of the data flow and mark the run as failed when the total number of failed records reaches or exceeds the threshold. If records fail but the threshold is not reached, the data flow run finishes with errors. The default value is 1000 failed records.
- Node failure:
- Resume on other nodes from the last snapshot – For resumable data flow runs, transfer the processing to the remaining active Data Flow service nodes. The starting point is based on the last record ID that was processed before the snapshot of the data flow run was saved. With this setting enabled, each record can be processed more than once.
- Restart the partitions on other nodes – For non-resumable data flow runs, transfer the processing to the remaining active Data Flow service nodes. The starting point is based on the first record in the data partition. With this setting enabled, each record can be processed more than once.
- Skip partitions on the failed node – For batch mode data flow runs, do not analyze the data that resides on the failed Data Flow service node. The run completes without processing all records, but each record that is successfully processed in this data flow run is processed only once.
- Fail the entire run – Terminate the data flow run and mark it as failed when a Data Flow service node fails. This setting provides backward compatibility with previous Pega Platform versions.
- Snapshot management:
- Create a snapshot every x seconds – For resumable data flow runs, specify the interval, in seconds, between snapshots of the data flow run state. The default value is 5 seconds.
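
The settings above can be modeled in a short Python sketch to make their effects easier to compare. This is not a Pega API: `ResilienceSettings`, `NodeFailureAction`, `run_status`, `may_reprocess_records`, and the status strings are all hypothetical names. The sketch assumes that the run fails when the failed-record count reaches or exceeds the threshold, as the setting description states, and that a run with no failed records completes normally.

```python
from dataclasses import dataclass
from enum import Enum


class NodeFailureAction(Enum):
    RESUME_FROM_SNAPSHOT = "Resume on other nodes from the last snapshot"
    RESTART_PARTITIONS = "Restart the partitions on other nodes"
    SKIP_PARTITIONS = "Skip partitions on the failed node"
    FAIL_RUN = "Fail the entire run"


@dataclass
class ResilienceSettings:
    """Hypothetical container for the settings listed above."""
    node_failure_action: NodeFailureAction
    max_failed_records: int = 1000       # default stated in the text
    snapshot_interval_seconds: int = 5   # default stated in the text


def run_status(failed_records: int, settings: ResilienceSettings) -> str:
    """Classify a run by its number of failed records (status names are made up)."""
    # Assumption: "reached or exceeded" means a greater-than-or-equal comparison.
    if failed_records >= settings.max_failed_records:
        return "FAILED"
    if failed_records > 0:
        return "COMPLETED_WITH_ERRORS"
    # Assumption: a run without failed records completes normally.
    return "COMPLETED"


def may_reprocess_records(action: NodeFailureAction) -> bool:
    """True for the node failure actions under which a record can be processed twice."""
    return action in (
        NodeFailureAction.RESUME_FROM_SNAPSHOT,
        NodeFailureAction.RESTART_PARTITIONS,
    )


settings = ResilienceSettings(NodeFailureAction.RESUME_FROM_SNAPSHOT)
print(run_status(12, settings))                          # COMPLETED_WITH_ERRORS
print(run_status(1000, settings))                        # FAILED
print(may_reprocess_records(settings.node_failure_action))  # True
```

Only the two actions that move work to other nodes allow a record to be processed more than once; skipping partitions trades completeness for processing each record at most once.
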