Data flow tab on the External Data Flow form
Through an external data flow (EDF), you can sequence and combine data based on an HDFS data set and write the results to a destination. The sequence is established through a set of instructions and execution points from source to destination. Between the source and destination of an external data flow, you can apply predictive model execution, merge, convert, and filter instructions.
External data flow shapes
You can use the following shapes to define the pattern of the external data flow:
Source is the standard entry point of a data flow. A source defines data that you read through the data flow. For EDF rules, the entry point is based on the data defined in a data set in the data flow class.
With the Merge shape, you can combine data from the primary and secondary data paths into a single track. Both paths must result in the same class. For EDF, the Merge shape has two inputs and one output. In this shape, you can define a single join condition based on two properties (each defined on the same class as the input paths).
In cases of data mismatch, you can select the source that takes precedence:
- Primary path - If properties have the same name but different values, the property value from the primary source takes precedence.
- Secondary path - If properties have the same name but different values, the property value from the secondary source takes precedence.
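The precedence rule can be illustrated with a minimal Python sketch. This is not Pega Platform code: the function name merge_paths, the dictionary-based records, and the inner-join behavior (unmatched records are dropped) are assumptions made only for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_paths(primary: List[Record], secondary: List[Record],
                primary_key: str, secondary_key: str,
                precedence: str = "primary") -> List[Record]:
    """Join two paths on a single condition and resolve name clashes.

    Hypothetical helper: assumes an inner join, which the source text
    does not specify.
    """
    if precedence not in ("primary", "secondary"):
        raise ValueError("precedence must be 'primary' or 'secondary'")

    # Index the secondary path by its join property for fast lookup.
    index: Dict[Any, List[Record]] = {}
    for rec in secondary:
        index.setdefault(rec[secondary_key], []).append(rec)

    merged: List[Record] = []
    for p in primary:
        for s in index.get(p[primary_key], []):
            # The record unpacked last wins when property names collide.
            if precedence == "primary":
                merged.append({**s, **p})
            else:
                merged.append({**p, **s})
    return merged


primary_path = [{"CustomerID": 1, "Name": "Ann", "Tier": "Gold"}]
secondary_path = [{"CustomerID": 1, "Tier": "Silver", "Region": "EU"}]

by_primary = merge_paths(primary_path, secondary_path,
                         "CustomerID", "CustomerID", "primary")
by_secondary = merge_paths(primary_path, secondary_path,
                           "CustomerID", "CustomerID", "secondary")
```

In this example both paths carry a Tier property with different values, so the selected precedence decides which value ends up in the merged record.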
The Predictive model shape references the predictive model rule that you want to apply to the data. In this shape, you can reference a predictive model rule and define mappings between the predictive model output and the Pega Platform properties. The properties must be defined in the same class as the input data for the Predictive model shape. The inheritance constraint is not applicable to the predictive model rule.
Through the Convert shape, you can convert data from one class into another class. The mapping of properties between source and target can be handled automatically, in which case properties with identical names are copied to the target class. You can also manually assign properties to the target class. If both auto-mapping and manual mapping are used, the manual mapping takes precedence.
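The interaction between auto-mapping and manual mapping can be sketched in Python as follows. The function convert, the set of target property names, and the target-to-source layout of the manual mapping are hypothetical and serve only to illustrate the precedence rule; they do not reflect how the Pega Platform implements the shape.

```python
from typing import Any, Dict, Optional, Set

Record = Dict[str, Any]


def convert(record: Record, target_properties: Set[str],
            manual_mapping: Optional[Dict[str, str]] = None) -> Record:
    """Convert a source record into a record of the target class.

    manual_mapping maps a target property name to a source property name
    (an assumed layout, chosen for this illustration).
    """
    manual_mapping = manual_mapping or {}

    # Auto-mapping: copy every property whose name also exists on the target.
    result = {name: value for name, value in record.items()
              if name in target_properties}

    # Manual mapping is applied last, so it overrides auto-mapped values.
    for target_name, source_name in manual_mapping.items():
        result[target_name] = record[source_name]
    return result


source_record = {"Name": "Ann", "Age": 30, "NickName": "Annie"}
auto_only = convert(source_record, {"Name", "Age"})
with_manual = convert(source_record, {"Name", "Age"}, {"Name": "NickName"})
```

Here Name is first auto-mapped from the identically named source property and then overwritten by the manual assignment from NickName.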
The filter shape defines the filter conditions and applies them to each element of the input flow. The output flow consists of only the elements that satisfy the filter conditions. Each condition is built from the following objects:
- Arguments - Can be either properties defined in the same class as the input data or constants (for example, strings or numbers).
- Operators - Specify how filter criteria relate to one another. You can use the following filter operators:
  - equals "="
  - not equal to "!="
  - greater than ">"
  - greater than or equal to ">="
  - less than "<"
  - less than or equal to "<="
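As a rough illustration of how conditions built from arguments and operators select elements, consider the following Python sketch. The names Prop and apply_filter are hypothetical, and the sketch assumes that an element must satisfy all conditions to appear in the output.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# The six filter operators listed above, mapped to Python comparisons.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Prop:
    """Hypothetical marker for an argument that refers to a property."""
    name: str


def resolve(argument: Any, record: Record) -> Any:
    # A property argument is looked up on the record; anything else
    # (string, number) is treated as a constant.
    return record[argument.name] if isinstance(argument, Prop) else argument


Condition = Tuple[Any, str, Any]  # (left argument, operator, right argument)


def apply_filter(records: List[Record],
                 conditions: List[Condition]) -> List[Record]:
    """Keep only records that satisfy every condition (assumed AND logic)."""
    return [
        rec for rec in records
        if all(OPERATORS[op](resolve(left, rec), resolve(right, rec))
               for left, op, right in conditions)
    ]


customers = [
    {"Age": 25, "Country": "NL"},
    {"Age": 17, "Country": "NL"},
    {"Age": 40, "Country": "US"},
]
adults_in_nl = apply_filter(
    customers,
    [(Prop("Age"), ">=", 18), (Prop("Country"), "=", "NL")],
)
```

Each condition compares a property with a constant here, but because both sides are resolved the same way, a condition can also compare two properties of the same record.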
The Destination shape specifies the destination for the data retrieved as a result of running an external data flow. You can configure the destination type and refer to the destination object. An external data flow can have multiple destinations.
- Creating external data flows
External Data Flow (EDF) is a rule for defining the flow of data on the graphical canvas and executing that flow on an external system. With EDF, you can run predictive analytics models in a Hadoop environment and use its infrastructure to process large numbers of records, which limits the data transfer between Hadoop and the Pega Platform.