RuleUsageSnapshot agent deadlocking on pr4_log table deletes
In a clustered environment, multiple JVMs deadlock while processing RuleUsageSnapshot.
2017-03-17 00:00:30,491 [.PegaWorkManager : 1] [ STANDARD] [ PegaRULES:07.10] (l.access.ConnectionManagerImpl) ERROR - Not returning connection 3 for database "pegadata" to the pool as it previously encountered the following error
User ID: System
Last SQL: delete from DATA.pr4_log_rule_usage_details where pxObjClass = 'Log-RuleUsage-Details' and pxSystemNode = ? and pyLabel = ?
com.ibm.db2.jcc.am.SqlTransactionRollbackException: The current transaction has been rolled back because of a deadlock or timeout. Reason code "68".. SQLCODE=-911, SQLSTATE=40001, DRIVER=4.19.66
Steps to Reproduce
Root Cause
A defect or configuration issue in the operating environment.
Under normal operation, RuleUsageSnapshot data is collected once a day as part of the PegaRULES agent. This agent is normally scheduled to run on all nodes in the cluster. When the collection runs on multiple nodes at around the same time, their deletes against the pr4_log_rule_usage_details table can contend with each other and deadlock.
Perform the following local-change:
In the Agent Schedule for the PegaRULES agent on each node, offset the start time so that the nodes do not run the collection at the same time. The data, which is deleted by System Node ID, can then be purged and re-added without contention.
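As a planning aid, the offsets can be worked out before they are entered into each node's Agent Schedule. The following is a minimal sketch in Python, not part of Pega: the function name, the node names, the 00:00 base time, and the 10-minute gap are all hypothetical and should be replaced with values that suit the cluster. It only computes a list of start times; the schedules themselves still have to be updated by hand.

```python
from datetime import datetime, timedelta


def staggered_start_times(node_ids, first_start="00:00", offset_minutes=10):
    """Return a {node_id: "HH:MM"} map with one distinct start time per node.

    Hypothetical helper for planning only; it does not call any Pega API.
    """
    if offset_minutes <= 0:
        raise ValueError("offset_minutes must be positive")
    # All offsets must fit inside one day, otherwise start times would repeat.
    if len(node_ids) * offset_minutes > 24 * 60:
        raise ValueError("too many nodes for this offset within 24 hours")

    base = datetime.strptime(first_start, "%H:%M")
    schedule = {}
    # Sort the node IDs so the assignment is stable from run to run.
    for index, node_id in enumerate(sorted(node_ids)):
        start = base + timedelta(minutes=index * offset_minutes)
        schedule[node_id] = start.strftime("%H:%M")
    return schedule


if __name__ == "__main__":
    # Node names below are placeholders, not real System Node IDs.
    for node, start in staggered_start_times(["nodeA", "nodeB", "nodeC"]).items():
        print(node, start)
```

The gap should be longer than the time the snapshot normally takes on one node, so that one node's purge and re-add finishes before the next node starts.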
If this change is not sufficient and deadlocks still occur, consider partitioning the data on the PXSYSTEMNODE column. Consult DBA staff before implementing a change at this level.