Through an external data flow (EDF), you can sequence and combine data based on an HDFS data set and write the results to a destination. The sequence is established through a set of instructions and execution points from source to destination. Between the source and destination of an external data flow, you can apply predictive model execution, merge, convert, and filter instructions.
You can use the following shapes to define the pattern of the external data flow:
The Source shape is the standard entry point of a data flow and defines the data that the data flow reads. For EDF rules, the entry point is based on a data set that is defined in the data flow class.
As the source of an EDF, you can select only HDFS data sets that use CSV or Parquet files for data storage.
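The Source shape itself is configured in the rule form rather than in code, but the read that it performs is conceptually similar to the following PySpark sketch. The HDFS paths and options shown here are assumptions for illustration, not values from Pega Platform.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edf-source-sketch").getOrCreate()

# Read a CSV-backed HDFS data set (hypothetical path; header option assumed).
customers = spark.read.option("header", "true").csv(
    "hdfs://namenode:8020/data/customers.csv")

# Alternatively, read a Parquet-backed HDFS data set (hypothetical path).
orders = spark.read.parquet("hdfs://namenode:8020/data/orders.parquet")
```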
With the Merge shape, you can combine the data in the primary and secondary data paths into a single track, provided that both paths resolve to the same class. For EDF, the Merge shape has two inputs and one output. In this shape, you define a single join condition that is based on two properties, each defined in the same class as its input path.
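Purely as a conceptual analogue, a single join condition that combines two input paths of the same class corresponds roughly to the PySpark join below. The DataFrames, paths, and the CustomerID join key are hypothetical, not Pega identifiers.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edf-merge-sketch").getOrCreate()

primary = spark.read.parquet("hdfs://namenode:8020/data/primary.parquet")
secondary = spark.read.parquet("hdfs://namenode:8020/data/secondary.parquet")

# Two inputs, one output: join the primary and secondary paths on a single
# condition. Both paths resolve to the same class, so the join property has
# the same name on both sides (hypothetical column name).
merged = primary.join(secondary, on="CustomerID", how="inner")
```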
In cases of data mismatch, you can select the source that takes precedence:
The Predictive model shape references the predictive model rule that you want to apply to the data, and defines mappings between the outputs of the model and Pega Platform properties. The properties must be defined in the same class as the input data for the Predictive model shape. This inheritance constraint does not apply to the predictive model rule itself.
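In Pega Platform, the shape references the predictive model rule directly; as a loose conceptual analogue only, applying a scoring function and mapping its output onto a property of the input class might look like the following PySpark sketch. The score function, column names, and the ChurnPropensity output property are all hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("edf-model-sketch").getOrCreate()
customers = spark.read.parquet("hdfs://namenode:8020/data/customers.parquet")

# Hypothetical scoring function standing in for the predictive model rule.
def score(age, income):
    return 0.7 if (income or 0.0) > 50000 and (age or 0) > 30 else 0.2

score_udf = udf(score, DoubleType())

# Map the model output to a property defined in the same class as the input.
scored = customers.withColumn(
    "ChurnPropensity", score_udf(customers["Age"], customers["Income"]))
```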
Through the Convert shape, you can convert data from one class into another class. The mapping of properties between the source and target classes can be handled automatically, in which case properties with identical names are copied to the target class. You can also map properties to the target class manually. If you use both automatic and manual mapping, the manual mapping takes precedence.
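In the Convert shape this mapping is configured declaratively in the rule form; as a rough analogue, copying same-named columns automatically and letting a manual override win could be sketched in PySpark as follows. The target columns and the manual mapping are assumed values.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("edf-convert-sketch").getOrCreate()
source = spark.read.parquet("hdfs://namenode:8020/data/source.parquet")

# Properties of the hypothetical target class.
target_columns = {"CustomerID", "FullName", "Email"}

# Manual mapping: source column -> target column; takes precedence.
manual_mapping = {"PrimaryEmail": "Email"}

# Auto-mapping: copy properties whose names match a target column, unless a
# manual mapping already supplies that target column.
auto_mapping = {c: c for c in source.columns
                if c in target_columns and c not in manual_mapping.values()}

mapping = {**auto_mapping, **manual_mapping}
converted = source.select([col(src).alias(dst) for src, dst in mapping.items()])
```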
The Filter shape defines filter conditions and applies them to each element of the input flow. The output flow consists only of the elements that satisfy the filter conditions. Each condition is built from the following objects:
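The conditions themselves are defined in the rule form; the element-by-element filtering that results is comparable to the following PySpark sketch, in which the property names, comparators, and values are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("edf-filter-sketch").getOrCreate()
customers = spark.read.parquet("hdfs://namenode:8020/data/customers.parquet")

# Keep only the elements that satisfy every filter condition
# (hypothetical property names and values).
filtered = customers.filter((col("Age") >= 18) & (col("Country") == "US"))
```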
The Destination shape specifies where the data that results from running an external data flow is written. You can configure the destination type and reference the destination object. An external data flow can have multiple destinations.
As the destination of an EDF, you can select only HDFS data sets that use CSV or Parquet files for data storage.
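As with the source, the write that a Destination shape performs corresponds conceptually to a PySpark write to HDFS. The paths, formats, and write mode below are assumed for illustration only.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edf-destination-sketch").getOrCreate()
results = spark.read.parquet("hdfs://namenode:8020/data/scored.parquet")

# Write the results to a Parquet-backed HDFS destination (hypothetical path).
results.write.mode("overwrite").parquet("hdfs://namenode:8020/out/scored.parquet")

# An EDF can have multiple destinations; for example, also write a CSV copy.
results.write.mode("overwrite").option("header", "true").csv(
    "hdfs://namenode:8020/out/scored_csv")
```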