Creating a real-time run for data flows
You can create real-time runs for data flows whose primary input is a data set that can be streamed in real time, for example, a data set of type Stream or Facebook. Data flow runs that you start from the Data Flows landing page process data in the access group context. They always use the checked-in instance of the Data Flow rule and of the rules that the Data Flow rule references.
Note: If a read operation fails due to service unavailability, for example, because of a network issue, the system retries the failed operation up to five times. You can change the number of retries by editing the dataflow/batch/source/maxRetries Dynamic System Setting. For more information, see Editing a Dynamic System Setting.
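The retry behavior described in the note can be pictured with a minimal sketch. This is not Pega Platform code: `ServiceUnavailableError`, `read_with_retries`, and `delay_seconds` are hypothetical names used only for illustration, and `max_retries` merely mirrors the default of five retries that the note mentions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailableError(Exception):
    """Hypothetical error for a read that failed because a service was unavailable."""


def read_with_retries(read: Callable[[], T], max_retries: int = 5,
                      delay_seconds: float = 0.0) -> T:
    """Call read() once, then retry up to max_retries times on failure."""
    retries = 0
    while True:
        try:
            return read()
        except ServiceUnavailableError:
            if retries >= max_retries:
                raise  # retries exhausted, surface the failure
            retries += 1
            time.sleep(delay_seconds)


# Usage: a source that fails twice (for example, a network issue) and then recovers.
attempts = {"count": 0}


def flaky_read() -> str:
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ServiceUnavailableError("network issue")
    return "record-1"


result = read_with_retries(flaky_read)
print(result, attempts["count"])  # record-1 3
```

In this sketch, a source that never recovers is called six times in total: the initial attempt plus five retries.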
- In the header of Dev Studio, click Configure > Decisioning > Decisions > Data Flows > Real-time processing.
- On the Real-time processing tab, click New.
- Associate a Data Flow rule with the data flow run:
  - In the Applies to field, press the Down Arrow key and select the class that the Data Flow rule that you want to run applies to.
  - In the Access group field, press the Down Arrow key and select an access group context for the data flow run.
  - In the Data flow field, press the Down Arrow key and select a Data Flow rule that you want to run. The available rules are limited by the selection of the Applies To class. Additionally, you can only select Data Flow rules whose source is a data set of type Stream.
  - In the Service instance name field, select Real Time.
 
- Optional: To keep the run active and restarted automatically after every modification, select the Manage the run and include it in the application check box, and then associate the run with a ruleset.
  If you move the ruleset between environments, the application moves the run with the ruleset to the new environment and keeps the run active.
- Optional: Specify any activities that you want to run before the data flow starts or after the data flow run has completed.
  - Expand the Advanced section.
  - In the Additional processing section, perform the following actions:
    - Specify a preprocessing activity that you want to run before running the data flow.
    - Specify a postprocessing activity that you want to run after running the data flow.
 
 
- Optional: Specify the data flow run resilience settings for resumable or non-resumable data flow runs in the Resilience section:
  - Data flow source: In a resumable data flow run, the source of the referenced Data Flow rule is a Stream, Kafka, or Database Table data set. The remaining data sets can be part of non-resumable data flow runs only.
  - Data flow resumption: Resumable runs can be paused and resumed. If a node fails, the active data partitions are transferred to the remaining functional nodes and resumed from the last correctly processed record ID that was captured as a snapshot. For non-resumable runs, no snapshots are taken because the order of the incoming records cannot be ensured. Therefore, the starting point for non-resumable data flow runs is the first record in each partition.
  You can configure the following resilience settings:
  - Record failure:
    - Fail the run after more than x failed records – Terminate the processing of the data flow and mark it as failed after the threshold for the allowed total number of failed records is reached or exceeded. If the threshold is not reached or exceeded, the data flow run finishes with errors. The default value is 1000 failed records.
  - Node failure:
    - Resume on other nodes from the last snapshot – For resumable data flow runs, transfer the processing to the remaining active Data Flow service nodes. The starting point is based on the last processed record ID before the snapshot of the data flow run was saved. With this setting enabled, each record can be processed more than once.
    - Restart the partitions on other nodes – For non-resumable data flow runs, transfer the processing to the remaining active Data Flow service nodes. The starting point is based on the first record in the data partition. With this setting enabled, each record can be processed more than once.
    - Fail the entire run – Terminate the data flow run and mark it as failed when a Data Flow service node fails. This setting provides backward compatibility with previous Pega Platform versions.
  - Snapshot management:
    - Create a snapshot every x seconds – For resumable data flow runs, specify the elapsed time for creating snapshots of the data flow run state. The default value is 5 seconds.
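The resilience settings above can be summarized in a short sketch. It is only an illustration of the described behavior, not how Pega Platform implements it: the `Partition` class, the `restart_index` and `run_status` functions, and the status strings are all hypothetical. The threshold comparison follows the "reached or exceeded" wording of the Record failure setting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    """Hypothetical model of one data partition of a data flow run."""
    records: List[str]
    snapshot_index: Optional[int] = None  # last record index captured in a snapshot


def restart_index(partition: Partition, resumable: bool) -> int:
    """Return the index of the first record to process after a node failure."""
    if resumable and partition.snapshot_index is not None:
        # Resumable run: continue after the last record captured in a snapshot.
        return partition.snapshot_index + 1
    # Non-resumable run (or no snapshot yet): start from the first record.
    return 0


def run_status(failed_records: int, threshold: int = 1000) -> str:
    """Classify a run by its failed-record count (status names are illustrative)."""
    if failed_records >= threshold:
        return "Failed"
    if failed_records > 0:
        return "Completed with errors"
    return "Completed"


# Usage: the snapshot captured record index 4, then the node failed at a later record.
partition = Partition(records=list("abcdefgh"), snapshot_index=4)
print(restart_index(partition, resumable=True))   # 5
print(restart_index(partition, resumable=False))  # 0
print(run_status(3))     # Completed with errors
print(run_status(1000))  # Failed
```

Any records that were processed after the last snapshot but before the failure are processed again after the restart, which is why both node failure options note that a record can be processed more than once.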
 
 
- Optional: For Data Flow rules that reference an Event Strategy rule, configure the state management settings.
  - Expand the Event strategy section.
  - Optional: Modify the Event emitting option. By default, when the data flow run stops, all the incomplete Tumbling windows in the Event Strategy rule emit the events that they have collected.
  - In the State management section, specify the persistence type:
    - Memory - This persistence type keeps the event strategy state in running memory and writes the output to a destination when the data flow finishes running. The data is processed faster, but it can be lost if a system failure occurs.
    - Database - This persistence type periodically replicates the state of an event strategy to the Cassandra database that is located in the Decision Data Store and stores it in the form of key values. When you select this type of data persistence, if a system failure occurs, you can fully restore the state of the event strategy and continue processing data.
  - In the Target cache size field, specify the maximum size of the cache for the state management data. The default value is 10 megabytes.
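The default Event emitting behavior can be illustrated with a small sketch of a tumbling window. The `TumblingWindow` class below is hypothetical and count-based for simplicity; it is not the Event Strategy rule implementation, and the `emit_incomplete` flag is an assumed stand-in for the Event emitting option.

```python
from typing import List


class TumblingWindow:
    """Hypothetical count-based tumbling window that emits a batch every `size` events."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._buffer: List[str] = []
        self.emitted: List[List[str]] = []

    def add(self, event: str) -> None:
        """Collect an event and emit the window as soon as it is complete."""
        self._buffer.append(event)
        if len(self._buffer) == self.size:
            self._emit()

    def stop(self, emit_incomplete: bool = True) -> None:
        """On stop, either emit the incomplete window or discard its events."""
        if emit_incomplete and self._buffer:
            self._emit()
        else:
            self._buffer = []

    def _emit(self) -> None:
        self.emitted.append(self._buffer)
        self._buffer = []


# Usage: five events arrive in a window of size three, then the run stops.
window = TumblingWindow(size=3)
for event in ["e1", "e2", "e3", "e4", "e5"]:
    window.add(event)
window.stop()
print(window.emitted)  # [['e1', 'e2', 'e3'], ['e4', 'e5']]
```

With `emit_incomplete=False`, the sketch drops the two events collected in the unfinished window instead of emitting them.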
 
- Click Done.
  Data flow nodes are required to start the run. If the service contains no nodes in the cluster, a message is displayed with a link to the Services landing page, where you can add nodes.
- Optional: To analyze the lifecycle of the run and troubleshoot potential issues, on the Run details tab of the data flow run, click View Lifecycle Events.
  In the window that opens, each event has a list of details, for example, the reason, which you can analyze to better understand the event or debug an issue. You can export the events to a single file from the Actions list. For more information, see Event details in data flow runs on Pega Community.
  Note: By default, events from the last 10 days are displayed. You can change this value by editing the dataflow/run/lifecycleEventsRetentionDays Dynamic System Setting.