Creating a data flow

Create a data flow to process and move data between data sources. Customize your data flow by adding data flow shapes and by referencing other business rules to perform more complex data operations. For example, a simple data flow can move data from a single data set, apply a filter, and save the results in a different data set. More complex data flows can be sourced by other data flows, apply strategies for data processing, and open a case or trigger an activity as the final outcome of the data flow.
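
As a rough mental model only (this is not Pega rule configuration, and the record fields and helper names below are hypothetical), a simple data flow behaves like the following Python-style pipeline:

    # Conceptual sketch of a simple data flow: read records from a source
    # data set, apply a filter, and save the survivors to a destination.
    source_records = [
        {"CustomerID": "C-1", "Status": "Active"},
        {"CustomerID": "C-2", "Status": "Inactive"},
    ]

    def keep(record):
        # Filter shape: only records that satisfy the condition continue
        # along the data flow path.
        return record["Status"] == "Active"

    destination_records = [r for r in source_records if keep(r)]
    # destination_records now holds only the filtered data, which is
    # analogous to saving the results in a different data set.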

  1. In the header of Dev Studio, click Create > Data Model > Data Flow.

  2. In the Create Data Flow tab, create the rule that stores the data flow:

    1. On the Create form, enter values in the fields to define the context of the flow.

    2. In the Label field, describe the purpose of the data flow.

    3. Optional:

      To change the default identifier for the data flow, click Edit, enter a meaningful name, and then click OK.

    4. In the Apply to field, press the Down arrow key, and then select the class that defines the scope of the flow.

      The class controls which rules the data flow can use. It also controls which rules can call the data flow.

    5. In the Add to ruleset field, select the name and version of a ruleset that stores the data flow.

    6. Click Create and open.

  3. In the Edit Data flow tab, double-click the Source shape.

  4. In the Source configurations window, in the Source list, define a primary data source for the data flow by selecting one of the following options:

    • To receive data from an activity or from a data flow with a destination that refers to your data flow, select Abstract.
    • To receive data from a different data flow, select Data flow. Ensure that the data flow that you select has an abstract destination defined.
    • To receive data from a data set, select Data set. If you select a streaming data set, such as Kafka, Kinesis, or Stream, in the Read options section, define a read option for the data flow:
      • To read both real-time records and data records that are stored before the start of the data flow, select Read existing and new records.
      • To read only real-time records, select Only read new records.

      For more information, see Data Set rule form - Completing Data Set tab. For a rough analogy of these two read options in terms of a streaming consumer, see the sketch after this procedure.

    • To retrieve and sort information from the PegaRULES database, an external database, or an Elasticsearch index, select Report definition.

    Secondary sources appear in the Data Flow tab when you start combining and merging data. Secondary sources can originate from a data set, data flow, or report definition.

  5. In the Source configurations window, click Submit.

  6. Optional:

    To facilitate data processing, transform data that comes from the data source by performing one or more of the related procedures listed after these steps, such as filtering incoming data, combining data from two sources, converting the class of the incoming data pages, or merging data.

  7. Optional:

    To apply advanced data processing on data that comes from the data source, call other rule types from the data flow by performing one or more of the related procedures listed after these steps, such as applying complex data transformations, applying complex event processing, adding strategies, or applying text analysis.

  8. In the Edit Data flow tab, double-click the Destination shape.

  9. In the Destination configurations window, in the Destination list, define the output point of the data flow by selecting one of the following options:

    • If you want other data flows to use your data flow as their source, select Abstract.
    • If you want an activity to use the output data from your data flow, select Activity.
    • If you want to start a case as the result of a completed data flow, select Case. The created case contains the output data from your data flow.
    • If you want to send output data to a different data flow, select Data flow. Ensure that the data flow that you select has an abstract source defined.
    • To save the output data into a data set, select Data set.
      Do not save data into Monte Carlo, Stream, or social media data sets.

      For more information, see Data Set rule form - Completing Data Set tab.

  10. In the Destination configurations window, click Submit.

  11. In the Edit data flow tab, click Save.
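
The two read options for streaming sources in step 4 describe where the data flow starts reading the stream. As an analogy only (this uses the open-source kafka-python client, not anything that Pega runs internally, and the topic name and broker address are hypothetical), they roughly correspond to a consumer's starting offset:

    # Analogy: where a Kafka consumer starts reading a topic.
    from kafka import KafkaConsumer  # kafka-python client

    # "Read existing and new records" ~ start from the earliest stored offset,
    # so records produced before the run started are also read.
    consumer_all = KafkaConsumer(
        "customer-events",                  # hypothetical topic
        bootstrap_servers="localhost:9092", # hypothetical broker
        auto_offset_reset="earliest",
    )

    # "Only read new records" ~ start from the latest offset, so only records
    # produced after the consumer starts are read.
    consumer_new = KafkaConsumer(
        "customer-events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="latest",
    )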

  • Filtering incoming data

    Filter incoming data to reduce the number of records that your data flow needs to process. Specify filter conditions to make sure that you get the data that is applicable to your use case. Reducing the number of records that your data flow needs to process decreases processing time and hardware utilization. A conceptual sketch of such a filter condition appears after this list.

  • Combining data from two sources

    Combine data from two sources into a page or page list to have all the necessary data in one record. To combine data, you need to identify a property that the two sources have in common. The data from the secondary source is appended to the incoming data record as an embedded data page. When you use multiple Compose shapes, the incoming data is appended with multiple embedded data pages. A conceptual sketch of this operation appears after this list.

  • Converting the class of the incoming data pages

    Change the class of the incoming data pages to another class when you need to make the data available elsewhere. For example, you might want to store data in a data set that is in a different class than your data flow and that uses different property names than the source data set. You might also want to propagate only a part of the incoming data to a branched destination, such as strategy results (without customer data) to the Interaction History data set.

  • Merging data

    Combine data from the primary and secondary paths into a single track to merge an incomplete record with a data record that comes from the secondary data source. After you merge data from two paths, the output record keeps only the unique data from both paths. The Merge shape outputs one or more records for every incoming data record, depending on the number of records that match the merge condition.

  • Applying complex data transformations

    Reference Data Transform rules to apply complex data transformations on the top-level data page to modify the incoming data record. For example, when you have a flat data record that contains the Account_ID and Customer_ID properties, you can apply a data transform to construct an Account record that contains the Customer record as an embedded page. A conceptual sketch of this reshaping appears after this list.

  • Applying complex event processing

    Reference Event Strategy rules to apply complex event processing in your data flow. Build data flows to handle data records from real-time data sources. For example, you can use complex event processing to analyze and identify patterns in call detail records (CDRs) or banking transactions.

  • Adding strategies to data flows

    Reference Strategy rules to apply predictive analytics, adaptive analytics, and other business rules when processing data in your data flow. Build data flows that can leverage strategies to identify the optimal action to take with customers to satisfy their expectations while also meeting business objectives. For example, based on the purchase history, you can prepare a sales offer that each individual customer is likely to accept.

  • Applying text analysis on the data records

    Reference Text Analyzer rules to apply text analysis in your data flow. Build data flows that can analyze text data to derive business information from it. For example, you can analyze the text that is posted on social media platforms such as Facebook and YouTube.

  • Branching a data flow

    Create multiple branches in a data flow to establish independent paths for processing data in your application. By splitting your data flow into multiple paths, you can decrease the number of Data Flow rules that are required to process data from a single source.

  • Configuring a data flow to update a single property only

    You can update a single property as a result of a data flow run. By using the Cassandra architecture in the Decision Data Store, you can update or append values for individual properties instead of updating the full data record each time that a single property value changes. This solution can improve system performance by decreasing the system resources that are required to update your data records.

  • Types of data flows

    Data flows are scalable data pipelines that you can build to sequence and combine data based on various data sources. Each data flow consists of components that transform data and enrich data processing with business rules.

  • Changing the number of retries for SAVE operations in batch and real-time data flow runs

    Control how many times batch and real-time data flow runs retry SAVE operations on records. With automatic retries, when a SAVE operation fails, the run can still complete successfully if the resources that were initially unavailable become operational. The run fails only when all the retries are unsuccessful. A conceptual sketch of this retry policy appears after this list.

  • Adding pre- and post-activities to data flows

    You can specify the activities that are executed before and after a data flow run. Use them to prepare your data flow run and perform certain actions when the run ends. Pre-activities run before assignments are created. Post-activities start at the end of the data flow regardless of whether the run finishes, fails, or stops. Both pre- and post-activities run only once and are associated with the data flow run.

  • Recording scorecard explanations through data flows

    Store a scorecard explanation for each calculation as part of strategy results by enabling scorecard explanations in a data flow. Scorecard explanations improve the transparency of your decisions and facilitate monitoring scorecards for compliance and regulatory purposes.
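
The sketches below are conceptual illustrations only; they are not Pega rule configuration, and every record field and helper name in them is hypothetical. First, the filter condition described in Filtering incoming data behaves like a predicate that each incoming record must satisfy:

    # Hypothetical filter condition: keep only customers who opted in.
    def filter_condition(record):
        return record.get("MarketingOptIn") is True

    incoming = [
        {"CustomerID": "C-1", "MarketingOptIn": True},
        {"CustomerID": "C-2", "MarketingOptIn": False},
    ]
    filtered = [r for r in incoming if filter_condition(r)]
    # Fewer records continue down the data flow, which is what reduces
    # processing time and hardware utilization.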
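
The Compose shape described in Combining data from two sources works, conceptually, like a lookup on a matching property, with the secondary record attached to the primary record as an embedded page:

    # Primary source: customer records. Secondary source: address records.
    customers = [{"CustomerID": "C-1", "Name": "Ann"}]
    addresses = {"C-1": {"City": "Boston", "ZipCode": "02110"}}

    def compose(record):
        # Match on CustomerID and append the secondary data to the
        # incoming record as an embedded page.
        record["Address"] = addresses.get(record["CustomerID"], {})
        return record

    combined = [compose(r) for r in customers]
    # Each additional Compose shape would append another embedded page.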
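
The example in Applying complex data transformations, where a flat record with Account_ID and Customer_ID becomes an Account record with an embedded Customer page, corresponds to a reshaping step along these lines:

    def to_account(flat_record):
        # Build a nested Account record from a flat record that carries
        # both Account_ID and Customer_ID.
        return {
            "Account_ID": flat_record["Account_ID"],
            "Customer": {"Customer_ID": flat_record["Customer_ID"]},
        }

    account = to_account({"Account_ID": "A-100", "Customer_ID": "C-1"})
    # account == {"Account_ID": "A-100", "Customer": {"Customer_ID": "C-1"}}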
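
Finally, the retry behavior described in Changing the number of retries for SAVE operations amounts to reattempting a failed save a fixed number of times and failing the run only when every attempt fails. A minimal sketch of that policy (Pega applies it internally; the save function and retry count here are placeholders):

    def save_with_retries(record, save, max_retries=3):
        # Attempt the SAVE operation up to max_retries + 1 times in total;
        # succeed as soon as one attempt works, and raise only when all fail.
        last_error = None
        for attempt in range(max_retries + 1):
            try:
                save(record)
                return True
            except Exception as error:  # e.g. destination temporarily unavailable
                last_error = error
        raise last_error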
