Tutorial: Configuring a remote repository as a source for a File data set

To use data stored in cloud-based storage systems such as JFrog Artifactory, Amazon S3, or Microsoft Azure in your Pega Platform™ applications, configure a remote repository as a data source for a File data set. With this functionality, you can set up automated data transfer to Pega Platform in the cloud through parallel loads from remotely stored CSV and JSON files, instead of creating and maintaining relational databases to transfer the files.

You can then convert and use the transferred data in various scenarios, such as in a marketing strategy or for customized correspondence.

After you create a File data set with a remote repository, you add a reference to the File data set in a data flow. See Referencing remote repository data in a data flow.

Use case

The marketing team of the Xauto company wants to use their client data, which is stored in Amazon S3, in their decision strategy on Pega Platform in the cloud.

Creating a File data set with a remote repository

To use data that is stored in a remote repository, such as Amazon S3, create a File data set that references that directory.

  1. In Dev Studio, click Create > Data Model > Data Set.
  2. In the Data Set Record Configuration section, enter the data set parameters:
    1. In the Label field, enter a name for the new record, for example:
      fileGZip
    2. In the Type field, select File.
    3. In the Context section, select the Apply to class and the ruleset version.
    4. Confirm the settings by clicking Create and open.
  3. In the New tab, in the Data source section, click Files on repositories.
  4. In the Connection section, click the Open icon to the right of the Repository configuration field and configure the remote repository:
    1. In the Create Repository tab, enter a description in the Short description field, for example:
      Xauto customer data directory (Amazon S3)
    2. Enter the Repository name, for example:
      s3
    3. In the Edit Repository tab, click Select and select S3 as your repository type.
  5. In the Configuration section, enter the parameters for your Amazon S3 repository:
    1. In the Bucket field, enter the S3 bucket location where artifacts are stored, for example:
      customer-data-doc
    2. In the Authentication profile field, select or create an authentication profile to connect to the repository.

      For more information, see Creating an authentication profile.

    3. To use an AWS Key Management Service (KMS) keystore for storing keys and certificates, select the Server side data encryption with KMS managed keys check box and enter the KMS key ID.

      For more information, see Keystores.

    4. In the Root path field, enter the location of the root folder in S3, for example:
      root
  6. Verify the credentials by clicking Test connectivity.
  7. Click Save.
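It can help to keep in mind how the Bucket, Root path, and File path values fit together. The following Python sketch composes a full object location from the example values in this tutorial, under the assumption that the file path is resolved relative to the root folder inside the bucket:

```python
import posixpath

bucket = "customer-data-doc"  # Bucket field
root_path = "root"            # Root path field
# File path field (entered later, in the File configuration section)
file_path = "/data/customer-data/-Xauto_customers.csv.gz"

# Assumption: the file path is resolved relative to the root folder
key = posixpath.join(root_path, file_path.lstrip("/"))
uri = "s3://" + bucket + "/" + key
print(uri)  # s3://customer-data-doc/root/data/customer-data/-Xauto_customers.csv.gz
```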
  8. In the File configuration section, in the File path field, enter the source file directory, for example:
    /data/customer-data/-Xauto_customers.csv.gz

    To match multiple files in a folder, use an asterisk (*), for example:

    /data/customer-data/-Xauto_custom-*.csv.gz

    Additional details about the selected file are displayed in the Parser configuration section.
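The asterisk works as a glob-style wildcard. A quick way to sanity-check a pattern before saving the data set is to test it against the file names you expect; this Python sketch uses fnmatch and hypothetical object keys (Pega's own matching may differ in edge cases):

```python
from fnmatch import fnmatch

# Hypothetical file names in the source directory
keys = [
    "/data/customer-data/-Xauto_custom-2018-01.csv.gz",
    "/data/customer-data/-Xauto_custom-2018-02.csv.gz",
    "/data/customer-data/unrelated-report.csv.gz",
]

pattern = "/data/customer-data/-Xauto_custom-*.csv.gz"
matched = [key for key in keys if fnmatch(key, pattern)]
print(matched)  # only the two -Xauto_custom-* files
```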

  9. Optional: To preview the file, click Preview file.
  10. Optional: To update the settings for the selected file, in the Parser configuration section, enter new parameter values as in the following example:
    Time properties in the selected file can be in a different time zone than the time zone that is used by Pega Platform. To avoid confusion, specify the time zone in the time properties of the file, and use the appropriate pattern in the settings.
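For example, a timestamp written with an explicit UTC offset can be parsed unambiguously when the pattern includes that offset. A minimal Python sketch with an illustrative value:

```python
from datetime import datetime, timezone

# Illustrative timestamp from a source file, with an explicit UTC offset
raw = "2018-06-22 14:30:00 +0200"
pattern = "%Y-%m-%d %H:%M:%S %z"  # %z captures the offset

parsed = datetime.strptime(raw, pattern)
in_utc = parsed.astimezone(timezone.utc)
print(in_utc.isoformat())  # 2018-06-22T12:30:00+00:00
```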
  11. For CSV files, in the Mapping tab, modify the number of mapped columns:
    • To add a new column, click Add mapping.
    • To remove a column and the associated property mapping, click the Delete mapping icon for the applicable row.
  12. For CSV files, in the Mapping tab, check the mapping between the fields in the CSV file and the corresponding properties in Pega Platform:
    • To map an existing property to a CSV file column, in the Property column, press the Down Arrow key and select the applicable item from the list.
    • For CSV files with a header row, to automatically create properties that are not in Pega Platform and to map them to CSV file columns, click Create missing properties. Confirm the additional mapping by clicking Create.
    • To manually create properties that are not in Pega Platform and to map them to CSV file columns, in the Property column, enter a property name that matches the Column entry, click the Open icon, and configure the new property. For more information, see Creating a property.
      For CSV files with a header, the Column entry in a new mapping instance must match the column name in the file.
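The header-based mapping that Create missing properties performs can be sketched as: read the header row of the gzipped CSV and pair each column with a property of the same name. The following Python sketch is only an illustration with hypothetical sample data, not Pega's implementation:

```python
import csv
import gzip
import io

# Hypothetical gzipped CSV content with a header row
payload = gzip.compress(b"CustomerID,FirstName,LastName\n1,Ada,Lovelace\n")

with gzip.open(io.BytesIO(payload), mode="rt", newline="") as source:
    reader = csv.DictReader(source)
    rows = list(reader)
    columns = reader.fieldnames

# Each header column maps to a property of the same name
mapping = {column: column for column in columns}
print(mapping)               # {'CustomerID': 'CustomerID', ...}
print(rows[0]["FirstName"])  # Ada
```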
  13. Confirm the data set settings by clicking Save.

Next steps

Add a reference to your File data set in the Source shape of a data flow. See Referencing remote repository data in a data flow.

 

Published June 22, 2018 — Updated September 28, 2018

