Skip to main content

This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

Data Flow Run performance degrades

SA-84582

Summary



Pega Platform 7.3.1 is hosted on WebSphere and installed on SQL Server 2016 SP2 database. The user has configured a data flow which takes a long time to complete. 
A primary database table dataset, of the data flow with 100K records, composes a Join against four secondary database table datasets each of which contain around 10-13 million records. The data flow run takes more than 20 hours to complete.



Error Messages



Not Applicable


Steps to Reproduce

  1. Create a data flow with a primary database table dataset containing 100K records.
  2. Use the Compose operation in the data flow to perform a Join for four secondary database table datasets each of which have around 10-13 million records. The data flow Run takes a long time to complete in the Pega instance hosted on WebSphere and SQL Server 2016.


Root Cause



A defect or configuration issue in the operating environment.
The Partition key was not configured in the primary dataset. Hence, the Pega application was processing one record at a time. The database query generated by the Compose operation to fetch 250 records (default batch size) from secondary datasets has a lengthy IN filter clause (multiple Bind variables). Therefore, the database Query Optimizer ignored the Indexing.



Resolution



Perform the following local-change:
  1. Configure a Partition key in the primary dataset.

    For best performance, use a Partitioning key property with not too many and not too few distinct values.

    Partition Key allows the data flow's Batch processing to be distributed vertically and horizontally within a multi-threaded or a multi-node system.

     
  2. Verify the data flow Run by lowering the batch size to values such as 200, 100, 50, 25, and use a batch size that provides optimum performance.

    Some of the databases ignore the Indexing when a long list of Bind variables is passed in the query.

    Reducing the batch size helps in such use cases. The batch size in Pega Decisioning is set using the below Dynamic System Settings (DSS) record:


    Owning Ruleset: Pega-DecisionEngine
    Purpose: structureddata/defaultBatchSize
Suggest Edit

Published March 19, 2020 - Updated October 8, 2020

Did you find this content helpful? Yes No

Have a question? Get answers now.

Visit the Collaboration Center to ask questions, engage in discussions, share ideas, and help others.

We'd prefer it if you saw us at our best.

Pega Community has detected you are using a browser which may prevent you from experiencing the site as intended. To improve your experience, please update your browser.

Close Deprecation Notice
Contact us