Importing large amounts of data by using the data import File Listener
When you need to import large volumes of data (millions of rows), use the data import File Listener instead of the data import wizard. The data import File Listener uses multithreading for faster throughput, while the wizard uses single-thread processing.
Pega Sales Automation™ includes a File Listener for the following entities:
- Lead (individual and business)
- Opportunity (individual and business)
- Customer activity
- Account (beginning with Pega Platform 8.2)
For best performance when using the data import File Listener, keep the following recommendations in mind:
- Before importing all of your records, import a small sample first and fix any issues.
- Do not exceed 1 million records in a single upload file.
- The recommended batch size for upload is 1000 records. Configure it in App Studio > Settings > Application settings.
- To improve performance and to disable creating audit history, use Add Only mode for the initial data import.
- To ensure maximum parallel processing, provide as many input files as the File Listener has threads, because each thread processes one file at a time. Configure this in the File Listener properties in Dev Studio, in the section.
- As of Pega Platform 8.3, in a PostgreSQL single-tenant system, unique ID generation is highly performant during a high-volume import. Work item IDs are generated in batches, and you can set the batch size with the idGenerator/defaultBatchSize dynamic system setting. For more information, see Increased performance for work ID generation and <link to dss help topic>.
- In non-PostgreSQL multi-tenant systems, the high-volume import process should not generate unique IDs. Include pyID for work object records in the .csv import file to skip calling the GenerateID activity and save that time. After the contacts import is complete, update the unique ID stored in the data-uniqueID database table: set its value to the last imported pyID record in the contact table.
- Database indexes improve query performance; however, updating a large database table triggers reindexing, which can degrade performance. Remove non-essential indexes during the import phase, and restore them after the import is complete.
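The file-count and pyID recommendations above can be sketched as a small pre-processing script that splits one large .csv into one file per listener thread and, optionally, pre-assigns sequential pyID values. This is a minimal illustration under assumptions, not part of the product: the file names, the `pyID` column position, and the `S-` ID format are placeholders that you must adapt to your own templates and ID scheme.

```python
import csv
import os

def split_for_listener(src_path, out_dir, num_threads, start_py_id=None):
    """Split a large import .csv into one file per File Listener thread.

    Optionally appends a pyID column with sequential values so the
    import can skip calling the GenerateID activity (see the
    recommendations above). The "S-" prefix is an assumed ID format.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    if start_py_id is not None:
        header = header + ["pyID"]
        rows = [row + [f"S-{start_py_id + i}"] for i, row in enumerate(rows)]

    # One output file per thread, so every thread has work to do.
    chunk = -(-len(rows) // num_threads)  # ceiling division
    paths = []
    for n in range(num_threads):
        part = rows[n * chunk:(n + 1) * chunk]
        if not part:
            break
        path = os.path.join(out_dir, f"import_part{n + 1}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
        paths.append(path)
    return paths
```

If you pre-assign pyID values this way, remember to update the data-uniqueID database table to the last assigned ID after the import completes, as described above.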
The data import File Listener uses the same underlying APIs as the data import wizard to process files located in predetermined folders on the server. Importing data by using the data import File Listener requires templates; use the data import wizard to make any template changes before you use the File Listener. For more information, see Preparing data and Pega Sales Automation sample data templates.
This task applies to both on-premises and cloud environments.
To import data by using the data import File Listener, complete the following steps:
- For on-premises configuration, perform the following steps:
- In the navigation panel of App Studio, click Settings > Application Settings.
- In the File Listener Configurations section, enter the base folder in the File Listener source location and the email address to which you want to send notifications.
The following figure shows an example of the File Listener Configurations section.
- Optional: To improve performance and to disable creating audit history, in the section, select the Initial Data Migration check box.
Optional: If you disabled creating audit history in step 1c, after the import is complete, clear the check box to generate audit records.
- Click .
- Optional: If you want to modify the default template and purpose configuration, in the navigation panel of Dev Studio, click App, and then search for and open the ResourceSettings data transform.
The following figure shows an example of the Resource settings transform.
Resource settings example configuration
By default, the data import File Listener is configured with SA_<name of objects> as a template and Add or update as a purpose.
- For Pega Cloud configuration, perform the following steps:
It is recommended to use an SFTP server implementation.
- In the header of Dev Studio, search for and select the storage/class/defaultstore:/type dynamic system setting (DSS).
- In the Value field, enter
- Click .
- In the header of Dev Studio, search for and select the FileListenerSourceLocation dynamic system setting.
- In the Value field, enter the base folder in the File Listener source location.
- Click .
- In the navigation panel of Dev Studio, click Records > Integrations-Resources > File Listener, and then open a listener that you want to run.
- In the section, clear the Block startup check box.
- In the section, enter the source file format. Only the .csv and .txt formats are supported.
- In the section, set the number of threads per node to the number of CPUs on that node. Matching the thread count to the CPU count improves the performance of the initial load.
- Click .
- In the header of Dev Studio, click the menu, and then click Admin Studio.
- In the navigation panel of Admin Studio, click Resources > Listeners, and then open a listener that you configured.
- In the section, enter your user name and password.
What to do next
After an entire file is processed, output files are created in the source file location that you specified in App Studio. The output file lists failed records and information about each error. The data import results summary is emailed to the notification email addresses that are listed as part of the File Listener configuration process.
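A quick post-import check can scan those output files and count failed records per file. The following is a hypothetical sketch: the `*output*.csv` file-name pattern and the one-row-per-failed-record layout are assumptions, so inspect the actual output files in your environment and adjust the pattern and parsing accordingly.

```python
import csv
import glob
import os

def summarize_failures(source_dir, pattern="*output*.csv"):
    """Count failed records per output file in the listener source folder.

    Assumes each data row in an output file describes one failed
    record, with the first row as a header. Both the file-name
    pattern and the layout are assumptions to adapt.
    """
    summary = {}
    for path in glob.glob(os.path.join(source_dir, pattern)):
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        summary[os.path.basename(path)] = max(len(rows) - 1, 0)
    return summary
```

Running this against the source file location gives a per-file failure count that you can compare with the emailed results summary.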