Creating external data flows
External Data Flow (EDF) is a rule for defining the flow of data on the graphical canvas and executing that flow on an external system. With EDF, you can run predictive analytics models in a Hadoop environment and use its infrastructure to process large numbers of records, which limits the data transfer between Hadoop and Pega Platform.
- External Data Flow rules - Completing the Create, Save As, or Specialization form
- Data flow tab on the External Data Flow form
Through an external data flow (EDF), you can sequence and combine data based on an HDFS data set and write the results to a destination. The sequence is established through a set of instructions and execution points from source to destination. Between the source and destination of an external data flow, you can apply predictive model execution, merge, convert, and filter instructions.
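The source-to-destination sequence described above can be pictured with a small, purely conceptual sketch. It is not Pega Platform or Hadoop code: the helper names (`convert_step`, `filter_step`, `merge_step`, `model_step`, `run_flow`), the sample records, and the toy scoring function are all assumptions made for illustration. In a real external data flow, the source would be an HDFS data set and the shapes would be configured on the canvas.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Step = Callable[[Iterable[Record]], Iterator[Record]]


def convert_step(mapper: Callable[[Record], Record]) -> Step:
    """Convert: reshape each record (here, cast string fields to numbers)."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        return (mapper(r) for r in records)
    return step


def filter_step(predicate: Callable[[Record], bool]) -> Step:
    """Filter: keep only the records that satisfy the predicate."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if predicate(r))
    return step


def merge_step(lookup: Dict[object, Record], key: str) -> Step:
    """Merge: combine each record with the matching record of a second source."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            yield {**r, **lookup.get(r[key], {})}
    return step


def model_step(score: Callable[[Record], float], output_field: str) -> Step:
    """Predictive model execution: attach a score to each record."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            yield {**r, output_field: score(r)}
    return step


def run_flow(source: Iterable[Record], steps: List[Step],
             destination: List[Record]) -> List[Record]:
    """Chain the steps in order from source to destination."""
    records: Iterable[Record] = source
    for step in steps:
        records = step(records)
    destination.extend(records)
    return destination


# Hypothetical sample data standing in for an HDFS data set.
source = [
    {"id": 1, "age": "34", "balance": "1200"},
    {"id": 2, "age": "17", "balance": "50"},
    {"id": 3, "age": "52", "balance": "8000"},
]
segments = {1: {"segment": "retail"}, 3: {"segment": "premium"}}

steps = [
    convert_step(lambda r: {**r, "age": int(r["age"]),
                            "balance": float(r["balance"])}),
    filter_step(lambda r: r["age"] >= 18),
    merge_step(segments, "id"),
    # Toy stand-in for a predictive model.
    model_step(lambda r: 1.0 if r["balance"] > 5000 else 0.0, "score"),
]

results = run_flow(source, steps, [])
for record in results:
    print(record)
```

Each step consumes the output of the previous one, so the whole sequence runs next to the data and only the final records reach the destination.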
- Configuring YARN settings
Configure the YARN Resource Manager settings to enable running external data flows (EDFs) on a Hadoop record. When an external data flow is started from Pega Platform, it triggers a YARN application directly on the Hadoop record for data processing.
- Configuring run-time settings
You can apply additional JAR file resources to the Hadoop record as part of running an external data flow. When you reference a JAR file resource in the Runtime configuration section, the JAR file is sent to the working directory of the Hadoop record as part of the class path each time you run an external data flow. After an external data flow finishes, the referenced resources are removed from the Hadoop record.
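The resource lifecycle described above (copy in before the run, expose on the class path, remove afterward) can be sketched on a local file system. This is only an analogy under stated assumptions: `staged_resources`, the directory layout, and the `custom-udf.jar` file name are hypothetical, and the sketch does not show how Pega Platform or Hadoop actually transfers the files.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterable, Iterator, List


@contextmanager
def staged_resources(jar_paths: Iterable[Path], working_dir: Path) -> Iterator[str]:
    """Copy JAR files into working_dir for one run, then remove them again."""
    staged: List[Path] = []
    try:
        for jar in jar_paths:
            target = Path(working_dir) / Path(jar).name
            shutil.copy(jar, target)
            staged.append(target)
        # Expose the staged files as a class path string for the run.
        yield os.pathsep.join(str(p) for p in staged)
    finally:
        # Clean up after the run, whether it succeeded or failed.
        for p in staged:
            p.unlink(missing_ok=True)


# Hypothetical local directories standing in for the real environment.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "src"
    work_dir = Path(tmp) / "work"
    src_dir.mkdir()
    work_dir.mkdir()

    jar = src_dir / "custom-udf.jar"
    jar.write_bytes(b"placeholder")  # not a real JAR, just a stand-in file

    with staged_resources([jar], work_dir) as class_path:
        present_during_run = (work_dir / "custom-udf.jar").exists()
        class_path_during_run = class_path

    present_after_run = (work_dir / "custom-udf.jar").exists()

print(present_during_run, present_after_run)
```

Using a context manager keeps the cleanup tied to the end of the run, which mirrors the behavior that resources are staged again for every run instead of persisting between runs.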