Support Article

Data Flow Run performance degrades

SA-84582

Summary

Pega Platform 7.3.1 is hosted on WebSphere and installed on SQL Server 2016 SP2 database. The user has configured a data flow which takes a long time to complete.
A primary database table dataset, of the data flow with 100K records, composes a Join against four secondary database table datasets each of which contain around 10-13 million records. The data flow run takes more than 20 hours to complete.

Error Messages

Not Applicable

Steps to Reproduce

Create a data flow with a primary database table dataset containing 100K records.
Use the Compose operation in the data flow to perform a Join for four secondary database table datasets each of which have around 10-13 million records. The data flow Run takes a long time to complete in the Pega instance hosted on WebSphere and SQL Server 2016.

Root Cause

A defect or configuration issue in the operating environment.
The Partition key was not configured in the primary dataset. Hence, the Pega application was processing one record at a time. The database query generated by the Compose operation to fetch 250 records (default batch size) from secondary datasets has a lengthy IN filter clause (multiple Bind variables). Therefore, the database Query Optimizer ignored the Indexing.

Resolution

Perform the following local-change:

Configure a Partition key in the primary dataset.

For best performance, use a Partitioning key property with not too many and not too few distinct values.

Partition Key allows the data flow's Batch processing to be distributed vertically and horizontally within a multi-threaded or a multi-node system.
Verify the data flow Run by lowering the batch size to values such as 200, 100, 50, 25, and use a batch size that provides optimum performance.

Some of the databases ignore the Indexing when a long list of Bind variables is passed in the query.

Reducing the batch size helps in such use cases. The batch size in Pega Decisioning is set using the below Dynamic System Settings (DSS) record:

Owning Ruleset: Pega-DecisionEngine
Purpose: structureddata/defaultBatchSize

Tags:

Pega Platform 7.3.1

Pega Platform

Decision Management

Published March 19, 2020 - Updated October 8, 2020

Have a question? Get answers now.

Visit the Collaboration Center to ask questions, engage in discussions, share ideas, and help others.

Visit the Collaboration Center

Data Flow Run performance degrades

Summary

Error Messages

Steps to Reproduce

Root Cause

Resolution

Tags:

Have a question? Get answers now.

The Power of Pega Resources

Experience the benefits of Pega Community when you log in.

Data Flow Run performance degrades

Summary

Error Messages

Steps to Reproduce

Root Cause

Resolution

Tags:

Have a question? Get answers now.

The Power of Pega Resources

Experience the benefits of Pega Community when you log in.

We'd prefer it if you saw us at our best.