Best practices for processing files using the file listener

Improve the resiliency of your file listeners by using file processing best practices.

Prevent processing failures caused by partially written files

When writing a very large file into the source location folder, use a file name or extension that will not be matched by the file listener’s source name mask. After the file has been fully written to the folder, rename the file so that it can be seen and processed by the listener. Renaming the file prevents the file listener from attempting to process a file that has only been partially written to the source location.

Renaming the file is not required when using a cloud-based repository such as Amazon S3 or Microsoft Azure, because the object store guarantees that the file is not visible, and cannot be read, until the upload completes.
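The write-then-rename pattern described above can be sketched in a few lines. This is a minimal illustration, not part of the platform; the function name and the `.part` extension are hypothetical, and the sketch assumes the listener's source name mask does not match `*.part` files:

```python
import os

def write_then_rename(data: bytes, target_path: str) -> None:
    """Write to a temporary name the listener's mask will not match,
    then rename into place so the listener only sees a complete file."""
    tmp_path = target_path + ".part"   # hidden from the source name mask
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())           # make sure the bytes are on disk
    os.replace(tmp_path, target_path)  # atomic on the same file system
```

The rename is atomic only when the temporary name and the target are on the same file system, which is why the temporary file is written into the source location folder itself rather than into a separate directory.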

Prevent redundant file processing in multi-node environments

When identical file listeners are running on multiple nodes within a cluster, lock the listener’s temporary file to prevent redundant processing of the same file. As an additional safeguard, it is recommended that you ignore duplicate file names if the source files always have unique names.

Configure temporary file locking and duplicate file name handling on the Process tab of the file listener record. For more information, see Configuring file listener processing.
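Outside of the file listener configuration itself, the underlying idea of temporary-file locking can be sketched with an exclusive lock file: creation either succeeds atomically or fails, so only one node wins. This is a hypothetical illustration of the general technique, not the platform's implementation:

```python
import os

def try_acquire_lock(lock_path: str) -> bool:
    """Atomically create a lock file; exactly one caller succeeds.
    The winner processes the source file, the others skip it."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False
```

The `O_CREAT | O_EXCL` combination is what makes the check-and-create step atomic; testing for the file's existence first and then creating it would leave a race window between the two calls.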

Automate error recovery for intermittent processing failures

Not all file processing failures originate from bad source file data. Some processing failures are the result of temporary circumstances, such as an unavailable shared resource. In these cases, the file listener can attempt to reprocess the file data one or more times after detecting a processing failure.

Configure automated error recovery options on the Error tab of the file listener record. For more information, see Configuring file listener error processing.
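The reprocessing behavior described above amounts to a retry loop with a delay between attempts. As a hedged sketch of that idea (the function and its parameters are hypothetical, not the listener's actual error-recovery settings):

```python
import time

def process_with_retries(process, attempts=3, initial_delay=1.0):
    """Retry a processing step that may fail transiently, for example
    when a shared resource is temporarily unavailable."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return process()
        except Exception:
            if attempt == attempts:
                raise          # exhausted: surface the failure
            time.sleep(delay)  # wait before trying again
            delay *= 2         # back off between attempts
```

Backing off between attempts gives a temporarily unavailable resource time to recover; failures that persist after the final attempt are genuine errors and should be surfaced rather than retried forever.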

Process records from very large delimited files in small batches

When working with very large files that contain hundreds or thousands of records, use record-at-a-time processing in combination with checkpoint processing to process the file data in small batches, committing the results to the Pega database before the next batch begins. This allows file processing to resume from the point where the file listener left off after an unexpected node shutdown, rather than starting over at the beginning of the file.

Configure record-at-a-time processing on the Method tab of the service file rule. For more information, see Service File form - Completing the Method tab.

Configure checkpoint processing on the Request tab of the service file rule. For more information, see Service File form - Completing the Request tab.
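The batch-and-checkpoint idea above can be sketched generically: commit after every batch, record how far processing has advanced, and skip already-committed records on restart. The function and its parameters are hypothetical illustrations, not the service file rule's API:

```python
def process_in_batches(records, commit, batch_size=100, checkpoint=0):
    """Process records in small batches, committing each batch and
    advancing a checkpoint so work can resume after a crash."""
    batch = []
    for index, record in enumerate(records):
        if index < checkpoint:
            continue  # already committed before the restart
        batch.append(record)
        if len(batch) == batch_size:
            commit(batch)              # persist this batch's results
            checkpoint = index + 1     # resume point after a failure
            batch = []
    if batch:
        commit(batch)                  # final partial batch
        checkpoint += len(batch)
    return checkpoint
```

If the node shuts down mid-file, calling the function again with the last committed checkpoint reprocesses only the records that were never committed, which is exactly the resume behavior checkpoint processing provides.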

Reduce risk of data loss by streamlining file processing activities

Eliminate synchronous dependencies on external resources from the activity rules that process file data so that processing completes within a reasonable time frame and does not delay the process of shutting down a file listener or cluster node. When a file listener receives a shutdown request from the node manager, it has a limited window of time to shut down gracefully before the node is forcibly removed from the cluster. If the processing activity is still running when the node JVM is shut down, the end result might be data loss or data corruption.
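One common way to remove a synchronous dependency is to hand the external call off to a background worker so the processing activity returns quickly. As a generic sketch of that decoupling (the helper and its sentinel convention are hypothetical, not a platform feature):

```python
import queue
import threading

def start_background_worker(work_queue, handle):
    """Drain queued work on a daemon thread so the caller can enqueue
    an item and return instead of blocking on an external resource."""
    def drain():
        while True:
            item = work_queue.get()
            if item is None:        # sentinel: stop the worker
                break
            handle(item)            # the slow external call happens here
            work_queue.task_done()
    worker = threading.Thread(target=drain, daemon=True)
    worker.start()
    return worker
```

The file-processing path only pays the cost of an enqueue, so it finishes well within the shutdown window; the queued work can then be drained, or persisted and replayed, independently of the listener's lifecycle.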
