Support Article

OutOfMemory exceptions in batch node

SA-9416

Summary

One batch node is experiencing OutOfMemory (OOM) exceptions and has crashed multiple times today.

Error Messages

The logs show database connection errors as well as CPU starvation warnings.

Recent entries from the Pega log and the WebSphere Application Server log:
2015-04-03 14:09:29,731 [j2ee14_ws,maxpri=10]] [ STANDARD] [ ] (.access.DatabaseConnectionImpl) ERROR - Couldn't obtain a connection. Refresh the DataSource, and try again
2015-04-03 14:09:32,999 [j2ee14_ws,maxpri=10]] [ STANDARD] [ ] (riv.factory.ObjectArrayFactory) INFO - Factory-Internal pool expansion for ObjectArray[1] from 20 up to 40.
2015-04-03 14:09:33,002 [j2ee14_ws,maxpri=10]] [ STANDARD] [ -:03.01] (.access.DatabaseConnectionImpl) ERROR - Couldn't obtain a connection. Refresh the DataSource, and try again

[3/31/15 13:11:12:878 CDT] 00000216 ApplicationMo W DCSV0004W: DCS Stack DefaultCoreGroup at Member [----]: Did not receive adequate CPU time slice. Last known CPU usage time at 13:10:10:470 CDT. Inactivity duration was 32 seconds.
[3/31/15 13:11:12:882 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 32 seconds.
[3/31/15 13:11:55:255 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 12 seconds.
[3/31/15 13:12:30:550 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 5 seconds.
[3/31/15 13:13:10:583 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 10 seconds.
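To check whether a node shows the same pattern, a small scan of the log text can count the connection failures and extract the scheduling delays reported by HMGR0152W. The sketch below is a hypothetical helper: the class `LogScan` and its methods are illustrative and not part of Pega or WebSphere. It assumes the log lines have already been read into a list and matches only the message text quoted above. It requires Java 9 or later for `List.of`.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: summarizes the two symptoms above from raw log lines.
 * Only the message text quoted in this article is matched; other formats are ignored.
 */
public class LogScan {

    // Pega log message written when a database connection cannot be obtained.
    private static final String CONNECTION_ERROR = "Couldn't obtain a connection";

    // WebSphere HMGR0152W message; group 1 captures the delay in seconds.
    private static final Pattern STARVATION = Pattern.compile(
            "HMGR0152W: CPU Starvation detected\\. Current thread scheduling delay is (\\d+) seconds");

    /** Counts and maximum delay gathered in one pass over the lines. */
    public static final class Result {
        public final int connectionErrors;
        public final int starvationWarnings;
        public final int maxStarvationSeconds;

        Result(int connectionErrors, int starvationWarnings, int maxStarvationSeconds) {
            this.connectionErrors = connectionErrors;
            this.starvationWarnings = starvationWarnings;
            this.maxStarvationSeconds = maxStarvationSeconds;
        }
    }

    public static Result scan(List<String> lines) {
        int connectionErrors = 0;
        int starvationWarnings = 0;
        int maxDelay = 0;
        for (String line : lines) {
            if (line.contains(CONNECTION_ERROR)) {
                connectionErrors++;
            }
            Matcher m = STARVATION.matcher(line);
            if (m.find()) {
                starvationWarnings++;
                maxDelay = Math.max(maxDelay, Integer.parseInt(m.group(1)));
            }
        }
        return new Result(connectionErrors, starvationWarnings, maxDelay);
    }

    public static void main(String[] args) {
        // Sample lines copied from the excerpts above (prefixes shortened).
        List<String> sample = List.of(
                "(.access.DatabaseConnectionImpl) ERROR - Couldn't obtain a connection. Refresh the DataSource, and try again",
                "CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 32 seconds.",
                "CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 12 seconds.");
        Result r = scan(sample);
        System.out.println(r.connectionErrors + " connection errors, "
                + r.starvationWarnings + " starvation warnings, max delay "
                + r.maxStarvationSeconds + " s");
    }
}
```

Applied to the full excerpt above, the largest reported scheduling delay is 32 seconds.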

Steps to Reproduce

Unknown.

Root Cause

The root cause of this problem was deferred operations filling the heap. The case structure contains a case with over 10,000 covered objects, and a batch process was attempting to update this case and all of its covered objects. The update is performed as a deferred operation, which requires all covered objects to be loaded into the clipboard. That single deferred operation filled the heap and caused the OOM condition.

This was diagnosed by analyzing a heap dump: at the time of the OOM condition, a single DeferredOperationsImpl object was holding over 3 GB of the 4 GB heap.

Resolution

This issue is resolved through the following local change: redesign the batch process so that it does not attempt to update all covered objects in a case at once, or redesign the case structure so that no single cover object has thousands of covered objects.
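The first option amounts to bounding how many updates are pending at any one time. The sketch below models this with a hypothetical `ChunkedUpdate` class that keeps every deferred update referenced until `commit()` runs. It is not a Pega API, and the 10,000-item count and chunk size of 500 are illustrative assumptions. How a real batch process commits between chunks, and whether the case design allows it, depends on the application.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a unit of work that holds every pending update in memory
 * until it is committed. Not a Pega API; it only shows how chunk size bounds the peak.
 */
public class ChunkedUpdate {

    private final List<String> pending = new ArrayList<>();
    private int peakPending = 0;
    private int commits = 0;

    /** Queue one update; the object stays referenced until commit() runs. */
    public void defer(String coveredObjectId) {
        pending.add(coveredObjectId);
        peakPending = Math.max(peakPending, pending.size());
    }

    /** Write out pending updates (omitted in this model) and release their references. */
    public void commit() {
        if (!pending.isEmpty()) {
            commits++;
            pending.clear();
        }
    }

    public int getPeakPending() {
        return peakPending;
    }

    public int getCommits() {
        return commits;
    }

    /** Update all ids, committing after every chunkSize items; chunkSize <= 0 means one commit at the end. */
    public static ChunkedUpdate run(List<String> ids, int chunkSize) {
        ChunkedUpdate work = new ChunkedUpdate();
        for (String id : ids) {
            work.defer(id);
            if (chunkSize > 0 && work.pending.size() >= chunkSize) {
                work.commit();
            }
        }
        work.commit(); // flush the final partial chunk, if any
        return work;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            ids.add("COVERED-" + i); // hypothetical covered-object ids
        }
        ChunkedUpdate allAtOnce = run(ids, 0);
        ChunkedUpdate chunked = run(ids, 500);
        System.out.println("all at once: peak " + allAtOnce.getPeakPending() + ", commits " + allAtOnce.getCommits());
        System.out.println("chunked:     peak " + chunked.getPeakPending() + ", commits " + chunked.getCommits());
    }
}
```

With a single commit at the end, all 10,000 updates are held at once; committing every 500 keeps at most 500 pending, at the cost of 20 commits instead of one.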

To temporarily work around this issue, a JVM with a very large heap can be provisioned to handle the problematic object and allow processing to complete.
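When provisioning a larger JVM, the maximum heap is normally raised with the standard `-Xmx` option (on WebSphere, through the server's JVM settings). The sketch below uses only `java.lang.management` to report the heap limit and current usage of the JVM it runs in, which is one way to confirm that a new limit took effect. The class name `HeapCheck` is illustrative; checking a running Pega node this way would require executing the same calls inside that JVM or reading them over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Reports current heap usage against the maximum heap this JVM will use. */
public class HeapCheck {

    /** Fraction of the maximum heap in use, or -1 if the maximum is undefined. */
    public static double usedFraction(long used, long max) {
        return max > 0 ? (double) used / max : -1.0;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        // getMax() returns -1 when no maximum is defined.
        System.out.println("max heap  (MB): " + heap.getMax() / mb);
        System.out.println("used heap (MB): " + heap.getUsed() / mb);
        double f = usedFraction(heap.getUsed(), heap.getMax());
        if (f >= 0) {
            System.out.printf("used fraction: %.1f%%%n", f * 100);
        }
    }
}
```

Applied to the heap dump figures, `usedFraction(3L << 30, 4L << 30)` returns 0.75, so the single DeferredOperationsImpl object accounted for at least three quarters of the heap.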

Tags:

Pega Platform

Pega Platform 6.2 SP2

Case Management

Published June 12, 2015 - Updated October 8, 2020
