This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

OutOfMemory exceptions in batch node

SA-9416

Summary



One batch node is experiencing OutOfMemory (OOM) exceptions and has crashed multiple times today. 

Error Messages



There are database connection errors as well as CPU starvation warnings.

Recent info from Pega logs:
2015-04-03 14:09:29,731 [j2ee14_ws,maxpri=10]] [ STANDARD] [ ] (.access.DatabaseConnectionImpl) ERROR - Couldn't obtain a connection. Refresh the DataSource, and try again
2015-04-03 14:09:32,999 [j2ee14_ws,maxpri=10]] [ STANDARD] [ ] (riv.factory.ObjectArrayFactory) INFO - Factory-Internal pool expansion for ObjectArray[1] from 20 up to 40.
2015-04-03 14:09:33,002 [j2ee14_ws,maxpri=10]] [ STANDARD] [ -:03.01] (.access.DatabaseConnectionImpl) ERROR - Couldn't obtain a connection. Refresh the DataSource, and try again

[3/31/15 13:11:12:878 CDT] 00000216 ApplicationMo W DCSV0004W: DCS Stack DefaultCoreGroup at Member [----]: Did not receive adequate CPU time slice. Last known CPU usage time at 13:10:10:470 CDT. Inactivity duration was 32 seconds.

[3/31/15 13:11:12:882 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 32 seconds.
[3/31/15 13:11:55:255 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 12 seconds.
[3/31/15 13:12:30:550 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 5 seconds.
[3/31/15 13:13:10:583 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 10 seconds.
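
When triaging warnings like the ones above, it can help to pull the scheduling delays out of the log so the trend is easy to see. The following is a minimal sketch, not a Pega or WebSphere utility: the class and method names are hypothetical, and the pattern assumes the HMGR0152W message text appears exactly as in the excerpt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Pega or WebSphere.
final class StarvationDelaySketch {

    // Assumes the HMGR0152W message wording matches the log excerpt in this article.
    private static final Pattern DELAY = Pattern.compile(
            "HMGR0152W: CPU Starvation detected\\. "
            + "Current thread scheduling delay is (\\d+) seconds\\.");

    // Returns the reported delay, in seconds, for every HMGR0152W line found.
    static List<Integer> delaysInSeconds(List<String> logLines) {
        List<Integer> delays = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = DELAY.matcher(line);
            if (m.find()) {
                delays.add(Integer.parseInt(m.group(1)));
            }
        }
        return delays;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "[3/31/15 13:11:12:882 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 32 seconds.",
                "[3/31/15 13:11:55:255 CDT] 000000c0 CoordinatorCo W HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 12 seconds.");
        System.out.println(delaysInSeconds(lines)); // prints [32, 12]
    }
}
```

Long scheduling delays of this kind are consistent with a JVM spending most of its time in garbage collection as the heap fills, which matches the root cause described below.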

Steps to Reproduce



Unknown.

Root Cause



The root cause of this problem is that deferred operations were filling the heap. This occurred because the case structure includes a case with over 10,000 covered objects, and a batch process was attempting to update this case and all of its covered objects. The update is performed as a single deferred operation, which requires all covered objects to be on the clipboard at the same time. That one deferred operation filled the heap and caused the OOM condition.

This was diagnosed by analyzing a heap dump: a single DeferredOperationsImpl object was retaining over 3 GB of the 4 GB heap at the time of the OOM condition.
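
The heap dump figures allow a rough, order-of-magnitude estimate of the memory held per covered object, which in turn suggests how many objects a single operation could safely touch. The sketch below is illustrative only. It treats "over 3 GB" and "over 10,000 covered objects" as exact values, assumes the retained memory is spread evenly across the covered objects, and uses a hypothetical 512 MB budget per batch; none of these assumptions come from the heap dump itself.

```java
// Back-of-the-envelope estimate; all names and the heap budget are hypothetical.
final class HeapFootprintSketch {

    // Average bytes retained per object, using integer division.
    static long bytesPerObject(long retainedBytes, long objectCount) {
        if (objectCount <= 0) {
            throw new IllegalArgumentException("objectCount must be positive");
        }
        return retainedBytes / objectCount;
    }

    // Largest number of objects whose combined footprint fits in the budget.
    static long maxObjectsPerBatch(long heapBudgetBytes, long bytesPerObject) {
        if (bytesPerObject <= 0) {
            throw new IllegalArgumentException("bytesPerObject must be positive");
        }
        return heapBudgetBytes / bytesPerObject;
    }

    public static void main(String[] args) {
        long retained = 3L * 1024 * 1024 * 1024; // "over 3 GB" from the heap dump, treated as exact
        long covered = 10_000;                   // "over 10,000" covered objects, treated as exact
        long budget = 512L * 1024 * 1024;        // assumed per-batch heap budget, not from the article

        long perObject = bytesPerObject(retained, covered);
        System.out.println("Approx. bytes per covered object: " + perObject);
        System.out.println("Approx. objects per batch within budget: "
                + maxObjectsPerBatch(budget, perObject));
    }
}
```

Under these assumptions each covered object accounts for roughly 300 KB, so a batch would need to be limited to a few thousand objects at most to stay within the assumed budget. Actual per-object size should be confirmed from the heap dump before choosing a limit.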


Resolution



This issue is resolved through the following local change: redesign the batch process so that it does not attempt to update all covered objects in a case at once, or redesign the case structure so that a single cover object does not have thousands of covered objects.
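
The first option, updating covered objects in smaller groups, can be sketched in plain Java as follows. This is a generic illustration of chunked processing, not Pega API code: the class, the method names, and the updateAndCommit callback are hypothetical stand-ins for whatever the batch process uses to save and commit a group of covered objects. How this maps onto the platform's own rules is outside the scope of this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic chunking sketch; names are hypothetical and not part of any Pega API.
final class ChunkedUpdateSketch {

    // Splits the list into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    // Applies updateAndCommit to one chunk at a time, so only that chunk's
    // objects need to be held in memory between commits.
    // Returns the number of chunks processed.
    static <T> int updateInChunks(List<T> coveredKeys, int chunkSize,
                                  Consumer<List<T>> updateAndCommit) {
        int committed = 0;
        for (List<T> chunk : partition(coveredKeys, chunkSize)) {
            updateAndCommit.accept(chunk); // hypothetical: load, update, save, commit
            committed++;
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_500; i++) {
            keys.add("COVERED-" + i); // placeholder keys, not real case IDs
        }
        int commits = updateInChunks(keys, 1_000,
                chunk -> System.out.println("Committing " + chunk.size() + " objects"));
        System.out.println("Total commits: " + commits);
    }
}
```

The design choice here is to commit after each group rather than once at the end, so that no single deferred operation has to hold every covered object. The chunk size of 1,000 is an arbitrary example value and would need to be tuned against the available heap.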

As a temporary workaround, a JVM with a heap large enough to hold the problematic object could be provisioned so that processing can complete.

Published June 12, 2015 - Updated October 8, 2020
