Creating an external data flow run
You can specify where to run external data flows, and manage and monitor those runs, on the External processing tab of the Data Flows landing page. External data flows run in an external environment (data set) that is referenced by a Hadoop record on the Pega Platform.
Before you can create an external data flow run, you must:
- Create a Hadoop record that references the external data set on which you want to run the data flow.
- Create an external data flow rule that you want to run on an external data set.
To specify where to run an external data flow:
In the header of Dev Studio, click.
On the form that opens, provide details about where to run the external data flow:
- Applies to – The class on which the external data flow is defined.
- Access group – An instance of the Data-Admin-Operator-AccessGroup rule.
- External data flow – The name of the external data flow rule that you want to use for external processing.
- Hadoop – The Data-Admin-Hadoop record instance where you want to run the data flow. This field is auto-populated with the Hadoop record that is configured as the source for the selected external data flow rule. You can configure multiple instances of a Hadoop record that point to the same external data set but have different run-time settings.
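Taken together, the fields above make up a run definition. The following sketch is purely illustrative: the field and record names mirror the form, but the dictionary layout and example values are assumptions, not a Pega API.

```python
# Hypothetical sketch of the information captured by the run form.
# Field names mirror the form; the structure and values are illustrative only.
external_run_definition = {
    "applies_to": "Org-App-Data-Customer",   # class on which the EDF is defined (example name)
    "access_group": "App:Administrators",    # Data-Admin-Operator-AccessGroup instance (example)
    "external_data_flow": "ScoreCustomers",  # external data flow rule to run (example)
    "hadoop_record": "HadoopDev",            # Data-Admin-Hadoop instance; auto-populated
                                             # from the source of the selected EDF rule
}

# Several Hadoop records may point at the same external data set but carry
# different run-time settings; each run references exactly one of them.
```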
Click Create. The run object is created and listed on the External processing tab.
In the External Data Flow Run window that is displayed, click Start to run the external data flow. In this window, you can view the details of the run. Depending on the current status of the external data flow, you can also stop or restart it from this window or from the External processing tab of the Data Flows landing page.
On the External processing tab, click a run object to monitor its status on the External Data Flow Run window.
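The actions available for a run depend on its current status, as described above. The following is a rough sketch of that idea; the status names and the transition table are assumptions for illustration, not the platform's actual state model.

```python
# Hypothetical run lifecycle: which actions are available in which status.
# Status names here are illustrative assumptions, not documented Pega values.
ALLOWED_ACTIONS = {
    "New":         {"Start"},
    "In Progress": {"Stop"},
    "Stopped":     {"Restart"},
    "Failed":      {"Restart"},
    "Completed":   set(),
}

def available_actions(status):
    """Return the actions a user could take for a run in the given status."""
    return ALLOWED_ACTIONS.get(status, set())
```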
- Managing external data flow runs
You can manage existing external data flow runs on the External processing tab of the Data Flows landing page. For each run, you can view its ID, the external data flow rule, the start and end times, the current execution stage, and the status. You can also start, stop, or restart an external data flow run, depending on its current status.
- External Data Flow Run window
You can monitor and manage each external data flow run from the External Data Flow Run window. This window gives you detailed information about each stage that the external data flow advances through to completion.
- Data Flows landing page
This landing page provides facilities for managing data flows in your application. Data flows let you sequence and combine data from various sources and write the results to a destination. Data flow runs that are initiated through this landing page run in the access group context and always use the checked-in instances of the Data Flow rule and the rules that it references.
- JCA Resource Adapter form – Completing the Connection tab
Complete the Connection tab to identify the resource adapter's Connection Factory and to provide information about how the resource adapter connects to the back-end enterprise information system (EIS).
- Creating external data flows
External Data Flow (EDF) is a rule for defining the flow of data on the graphical canvas and executing that flow on an external system. With EDF, you can run predictive analytics models in a Hadoop environment, using its infrastructure to process large numbers of records while limiting data transfer between Hadoop and the Pega Platform.
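The data-transfer argument above can be illustrated with a toy sketch (hypothetical, not Pega or Hadoop code): when scoring happens where the data lives, only the results cross the network instead of every record.

```python
# Toy illustration of push-down processing (assumed data and threshold).
records = [{"id": i, "value": i * 0.1} for i in range(100_000)]

def score(record):
    # Stand-in for a predictive model decision (assumed threshold).
    return record["value"] > 500.0

# Push-down style: evaluate on the external system, ship only matching ids.
result_ids = [r["id"] for r in records if score(r)]
# Only the matching identifiers are transferred, not 100,000 full records.
```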