
Importing and exporting data using the File data set

During business-as-usual operations, organizations often need to transfer data to and from Pega Customer Decision Hub™ on Pega Cloud. Pega Platform™ provides the ability to import and export data by using the File data set functionality.

Importing customer data into Pega Customer Decision Hub on Pega Cloud

To make the best, most relevant next-best-action decisions, Pega Customer Decision Hub needs up-to-date data about your customers: their demographics, product holdings, activity data, and other information that might be stored in a third-party system. Because this information is updated in real time, you must ensure that the updates propagate to Pega Customer Decision Hub. To do this, set up an import process using Pega Platform's data import capabilities.

Understanding the import process

You can configure a File data set to support an end-to-end import process for any CSV or JSON files that contain customer data.

The process starts when data files are uploaded into a Pega Cloud repository and a file listener triggers the data import. To implement a pre-processing check, you must create a custom activity that validates the uploaded files against a manifest file. You can configure the manifest file to contain the file path and file-related metadata, such as information about the files to process, the total record count, and the number of records in each file.

If the pre-processing check passes, the staging data flow runs and imports the actual data. After the data flow run, you can configure a post-processing check that validates the imported data against the manifest file. If the validation passes, the data import is considered successful and the original files are cleaned from the Pega Cloud repository.
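The stages above can be sketched in Python; every function here is an illustrative stub standing in for Pega rules (activities, data flows, file listeners), not a real Pega API:

```python
# Illustrative sketch of the import stages; all names are hypothetical stubs.
def pre_process_check(manifest, uploaded):
    # For example, verify that every file listed in the manifest was uploaded.
    return set(manifest) <= set(uploaded)

def run_staging_data_flow(uploaded):
    # The staging data flow would read the files and write the records;
    # here we simply count the files as a stand-in.
    return sum(1 for _ in uploaded)

def post_process_check(expected, processed):
    # Compare the processed count with the count declared in the manifest.
    return processed == expected

def run_import(manifest, uploaded):
    if not pre_process_check(manifest, uploaded):
        raise RuntimeError("pre-processing check failed")
    processed = run_staging_data_flow(uploaded)
    if not post_process_check(len(manifest), processed):
        raise RuntimeError("post-processing check failed")
    # On success, the files would then be cleaned from the repository.
    return processed

print(run_import(["a.csv", "b.csv"], ["a.csv", "b.csv"]))  # → 2
```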

For a sample end-to-end implementation of the process, see Configuring the data import process. The following diagram shows an overview of the process stages, including the use of a manifest file to validate the imported data.

Data import workflow

Configuring the data import process

To import customer data into your Pega Customer Decision Hub application, you must create the following Pega Platform artifacts:

  • A repository. For more information, see Creating a repository.
  • A File data set that you configure to read CSV or JSON files.
  • Optional: A manifest file that lists the files to import and file-related metadata.
  • A file listener that periodically checks for a file pattern, and then triggers the data import when a matching file arrives.
  • A data flow with the File data set as the source, which writes data to the specified destination data set.

Steps

  1. Create a File data set with a repository to process CSV or JSON files.
    For more information, see Creating a File data set record for files on repositories.
    In the File configuration section, select how you want to define the files to read or write:

    • To define a single file or a range of files, select Use a file path.
      Sample file path configuration
      To match multiple files in a folder, use an asterisk (*) as a wildcard character. For example, the /dt1/Customers*.gzip file path matches all files whose names start with the prefix Customers.
    • If the files that you upload are encrypted, you can use the Custom stream processing option to decrypt them, or to encrypt them when saving to a repository.
      For more information, see Requirements for custom stream processing in File data sets.
    • For multiple files that you want to list in a manifest file, select Use a manifest file.
      Sample manifest file configuration
      You can use a manifest file to verify whether Pega Platform imported all the files successfully. The manifest file is backed by a data model that is part of the Data-Ingestion-Manifest class.

      By default, a manifest file uses the following .xml format:

      <manifest>
        <files>
          <file>
            <name>file0001.csv</name>
          </file>
          <file>
            <name>file0002.csv</name>
          </file>
        </files>
      </manifest>

      To enable your application to use a different manifest file format, extend the Data-Ingestion-Manifest class. 

      Extending the manifest file
      For an example of an extended manifest file, see Sample manifest file.
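A pre-processing activity needs to read the manifest before the import runs. As a sketch, the default manifest format can be parsed with any standard XML library; this Python example is illustrative and is not a Pega API:

```python
import xml.etree.ElementTree as ET

def manifest_file_names(manifest_xml):
    """Return the file names listed in a default-format manifest."""
    root = ET.fromstring(manifest_xml)
    return [f.findtext("name") for f in root.find("files")]

manifest = """<manifest>
  <files>
    <file><name>file0001.csv</name></file>
    <file><name>file0002.csv</name></file>
  </files>
</manifest>"""

print(manifest_file_names(manifest))  # → ['file0001.csv', 'file0002.csv']
```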
  2. Create a data flow with the File data set that you configured in step 1 as a source and a destination data set.
    For more information, see Creating a data flow.

  3. Create a file listener that listens for the arrival of a manifest file in the repository, and then triggers the data import.

    Creating a file listener

    For more information, see Creating a file listener.

  4. Configure a Service File rule to process the manifest file before the actual data import.

    You can use this step to verify that all the files listed in the manifest exist, and that each file's size in the Pega Cloud repository matches the size recorded in the manifest file. If there is a mismatch, not all the files were transferred to the Pega Cloud repository, and the data import process fails.
    For more information about configuring Service File rules, see Service File rules.
    You can use repository APIs for all file-related operations. For more information, see Repository APIs.

    Configuring the service file to process the manifest file

    Result: If the data import process is successful, the data flow runs, reads the records from the File data set, and writes them to the destination data set.
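The existence and size checks in this step can be sketched outside Pega as follows; the function and directory names are hypothetical, and a real implementation would use the repository APIs rather than the local file system:

```python
import os
import tempfile

def files_match_manifest(expected_sizes, upload_dir):
    """Return True only if every file listed in the manifest exists in
    upload_dir with exactly the size that the manifest declares."""
    for name, size in expected_sizes.items():
        path = os.path.join(upload_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) != size:
            return False
    return True

# Demo with a temporary directory standing in for the repository.
with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "file0001.csv"), "wb") as f:
        f.write(b"x" * 10)
    print(files_match_manifest({"file0001.csv": 10}, repo))  # → True
    print(files_match_manifest({"file0002.csv": 10}, repo))  # → False
```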

  5. Optional: Configure a post-processing activity to verify that the number of records that the data flow processed matches the record count that the manifest file declares.
    If the numbers do not match, the import process failed.

  6. Optional: Configure a cleanup activity that performs a truncate operation in the File data set to remove all the files in the repository.
    Configure the cleanup step with the following parameters, as shown in the figure:
    Method: DataSet-Execute
    Data set: your File data set
    Operation: Truncate

    Configuring a cleanup activity to remove files from the repository
    For more information, see Creating an activity.

Sample manifest file

A manifest file can contain the file path and file-related metadata, such as information about the files to process, the total record count, and the number of records in each file, as in the following example:

<manifest>
  <totalRecordCount>30000</totalRecordCount>
  <files>
    <file>
      <name>file0001.csv.gzip</name>
      <size>100MB</size>
      <recordCount>10000</recordCount>
    </file>
    <file>
      <name>file0002.csv.gzip</name>
      <size>100MB</size>
      <recordCount>10000</recordCount>
    </file>
    <file>
      <name>file0003.csv.gzip</name>
      <size>100MB</size>
      <recordCount>10000</recordCount>
    </file>
  </files>
</manifest>
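A post-processing check can confirm that the per-file record counts in such a manifest add up to the declared total. The following Python sketch is illustrative, not a Pega API:

```python
import xml.etree.ElementTree as ET

def counts_consistent(manifest_xml):
    """Check that totalRecordCount equals the sum of per-file recordCount values."""
    root = ET.fromstring(manifest_xml)
    total = int(root.findtext("totalRecordCount"))
    per_file = sum(int(f.findtext("recordCount")) for f in root.iter("file"))
    return total == per_file

extended_manifest = """<manifest>
  <totalRecordCount>30000</totalRecordCount>
  <files>
    <file><name>file0001.csv.gzip</name><recordCount>10000</recordCount></file>
    <file><name>file0002.csv.gzip</name><recordCount>10000</recordCount></file>
    <file><name>file0003.csv.gzip</name><recordCount>10000</recordCount></file>
  </files>
</manifest>"""

print(counts_consistent(extended_manifest))  # → True
```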

Exporting customer data out of Pega Customer Decision Hub

Organizations frequently need to export data, such as transactional data, analytical and monitoring data, or interaction history results, out of Pega Customer Decision Hub for further processing. For example, the results of next-best-action decisions are recorded in Pega Customer Decision Hub as interaction history for each customer. External reporting tools that create insight beyond the scope of the Pega application need access to this interaction history data. You can use the File data set functionality to export such data out of Pega Customer Decision Hub for use with an external business intelligence reporting tool.

Understanding the export process

You can define a data flow that exports interaction history records into a File data set. Running the data flow exports the data out of Pega Customer Decision Hub and into a specified repository.

Configuring the export process

To export the data associated with daily interactions into a Pega Cloud Services repository, and then use this data for additional analysis, configure a File data set and a data flow that uses this File data set as the destination.

Steps

  1. Create a File data set with a file path that leads to the location in which you want to export the interaction files.
    Sample File data set configuration
    To match multiple files in a folder, use an asterisk (*) as a wildcard character. You can also use date/time patterns in file names for use cases in which the file names must include a date/time.
    For more information, see Creating a File data set record for files on repositories.
  2. Create a data flow with the source data set that you want to export to a file and the File data set that you configured in step 1 as the destination.
    Sample data flow configuration
    For more information, see Creating a data flow.
  3. Run the data flow.
    Result: The data flow run exports the files to the repository. The file names include a date/time as configured in the data set. Additionally, a .meta file is created that contains the metadata of each file saved to the repository, as in the following example:

<?xml version='1.0' encoding='UTF-8'?>
<file>
  <name>records_2020-02-21_002003_00000.json</name>
  <size>28773955</size>
  <recordCount>11000</recordCount>
</file>
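A downstream consumer can read the .meta file to verify the export before processing it. For example, with Python's standard XML library (an illustrative sketch, not part of the export feature):

```python
import xml.etree.ElementTree as ET

meta = """<?xml version='1.0' encoding='UTF-8'?>
<file>
<name>records_2020-02-21_002003_00000.json</name>
<size>28773955</size>
<recordCount>11000</recordCount>
</file>"""

# Parse as bytes so the encoding declaration is applied.
root = ET.fromstring(meta.encode("utf-8"))
print(root.findtext("name"))              # → records_2020-02-21_002003_00000.json
print(int(root.findtext("recordCount")))  # → 11000
```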
