Production system receives OOM errors
Administrators observe OOM errors on four production server nodes, all hosted on the same machine.
The following is observed in native_stderr.log:
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2014/12/24 11:33:08 - please wait.
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2014/12/24 11:33:08 - please wait.
JVMDUMP007I JVM Requesting Heap dump using '/dumps/was85LV3_K10/heapdump.20141224.113308.7114.0001.txt'
JVMDUMP010I Heap dump written to /dumps/was85LV3_K10/heapdump.20141224.113308.7114.0001.txt
JVMDUMP007I JVM Requesting Heap dump using '/dumps/was85LV3_K10/heapdump.20141224.113308.7114.0002.txt'
JVMDUMP010I Heap dump written to /dumps/was85LV3_K10/heapdump.20141224.113308.7114.0002.txt
JVMDUMP032I JVM requested Java dump using '/dumps/was85LV3_K10/javacore.20141224.113308.7114.0003.txt' in response to an event
JVMDUMP010I Java dump written to /dumps/was85LV3_K10/javacore.20141224.113308.7114.0003.txt
JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
JVMDUMP032I JVM requested Java dump using '/dumps/was85LV3_K10/javacore.20141224.113308.7114.0004.txt' in response to an event
Normally, system information such as “Free memory” is written to the PegaRULES log every 10 minutes.
2014-12-24 11:28:48,573 [RULESLV03,maxpri=10]] [ STANDARD] [ ] ( internal.async.Agent) INFO - System date: Wed Dec 24 11:28:48 GMT 2014 Total memory: 2,150,957,056 Free memory: 954,846,072 Requestor Count: 26 Shared Pages memory usage: 0%
2014-12-24 11:32:45,811 [ WebContainer : 209] [ STANDARD] [ LTSB-GPMT:03.01] (PMT_Work_RiskReviewCase.Action) INFO customer-pega.intranet.group|127.0.0.1 4106321 - GPMT: Success - Case creation. Feed: . Parent case ID (feed): GPMT-PR-241214-14. Child case ID: GPMT-CC-241214-43
2014-12-24 11:57:11,173 [RULESLV03,maxpri=10]] [ STANDARD] [ ] (.access.DatabaseConnectionImpl) ERROR File.GWCSFircoSoftInput|initialize your_operatorid - Couldn't obtain a connection. Refresh the DataSource, and try again
2014-12-24 11:57:13,652 [RULESLV03,maxpri=10]] [ STANDARD] [ ] ( internal.async.Agent) INFO - System date: Wed Dec 24 11:57:13 GMT 2014 Total memory: 1,691,353,088 Free memory: 1,023,355,912 Requestor Count: 27 Shared Pages memory usage: 0%
2014-12-24 11:57:16,107 [ WebContainer : 197] [ ] [ ] (ngineinterface.service.HttpAPI) ERROR customer-pega.intranet.group|127.0.0.1 - 127.0.0.1: com.pega.pegarules.pub.context.RequestorLockException
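However, in the extract above (PegaRULES_lv03_c1.log), more than 28 minutes pass between two consecutive system-information entries (11:28:48 and 11:57:13) instead of the usual 10. This could indicate that the JVM suspended the thread that writes these entries while it was producing the heap dump.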
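A minimal Python sketch of this check: it scans a PegaRULES log for system-information entries (lines containing "System date:") and reports each pair of consecutive entries that are further apart than the expected 10-minute interval. The regular expression assumes the timestamp layout shown in the extract; the names sysinfo_gaps and SYSINFO_ENTRY are hypothetical.

```python
import re
from datetime import datetime, timedelta

# A system-information entry starts with "yyyy-MM-dd HH:mm:ss,SSS" and
# contains "System date:" (layout assumed from the log extract).
SYSINFO_ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*System date:"
)


def sysinfo_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (previous, current, gap) for each pair of consecutive
    system-information entries that are more than `threshold` apart."""
    stamps = []
    for line in lines:
        m = SYSINFO_ENTRY.match(line)
        if m:
            # Milliseconds are dropped; second precision is enough here.
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    return [(a, b, b - a) for a, b in zip(stamps, stamps[1:]) if b - a > threshold]


# Two entries from PegaRULES_lv03_c1.log, abridged after "System date:".
log = [
    "2014-12-24 11:28:48,573 [RULESLV03,maxpri=10]] [ STANDARD] [ ] ( internal.async.Agent) INFO - System date: Wed Dec 24 11:28:48 GMT 2014",
    "2014-12-24 11:57:13,652 [RULESLV03,maxpri=10]] [ STANDARD] [ ] ( internal.async.Agent) INFO - System date: Wed Dec 24 11:57:13 GMT 2014",
]
for previous, current, gap in sysinfo_gaps(log):
    print(f"{previous} -> {current}: {gap}")
# prints: 2014-12-24 11:28:48 -> 2014-12-24 11:57:13: 0:28:25
```

To scan a whole file, pass an open file object instead of the list. Because entries are rarely exactly 10 minutes apart, a slightly larger threshold (for example 11 minutes) may reduce noise.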
The PegaRULES log files that were sent contain no OutOfMemoryError at the time of the original event: the OOM was reported at 11:30 (the dump event in native_stderr.log is timestamped 11:33:08), but the error does not appear in the PegaRULES log until 12:25/12:26.
There appear to be about 300 occurrences of the OOM in each of the two log files.
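The occurrence count and the time of the first logged OOM can be checked with a short script such as this sketch. It counts lines containing OutOfMemoryError (which also matches the java/lang/OutOfMemoryError form used in native_stderr.log) and returns the timestamp of the first matching line that begins with a PegaRULES timestamp. Because it counts lines, an error logged together with a stack trace that repeats the class name is counted more than once. The names oom_summary and ENTRY_TIMESTAMP are hypothetical.

```python
import re

# PegaRULES log entries start with "yyyy-MM-dd HH:mm:ss,SSS"; continuation
# lines such as stack-trace frames do not.
ENTRY_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def oom_summary(lines, needle="OutOfMemoryError"):
    """Count lines containing `needle` and return (count, first_timestamp).

    first_timestamp comes from the first matching line that begins with a
    log timestamp, or is None if no such line exists.
    """
    count = 0
    first_timestamp = None
    for line in lines:
        if needle not in line:
            continue
        count += 1
        if first_timestamp is None:
            m = ENTRY_TIMESTAMP.match(line)
            if m:
                first_timestamp = m.group(1)
    return count, first_timestamp


# Usage: run once per log file (only PegaRULES_lv03_c1.log is named above).
# with open("PegaRULES_lv03_c1.log", encoding="utf-8", errors="replace") as log:
#     print(oom_summary(log))
```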
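The errors appear to occur when the JVM tries to allocate memory in the native heap rather than in the Java heap; the system-information entries above show plenty of free Java heap (for example, 954,846,072 bytes free of 2,150,957,056 at 11:28:48). See the IBM support article “java.lang.OutOfMemoryError while creating new threads”. Also, increase the process limit (the maximum number of user processes, for example as reported by ulimit -u) for the user that runs the server. Finally, verify that the AES server is up and running.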
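The heap figures in the system-information entries can be turned into a usage percentage with the sketch below, assuming that “Total memory” and “Free memory” are Java heap sizes reported by the JVM. For the 11:28:48 entry, 2,150,957,056 - 954,846,072 = 1,196,110,984 bytes are in use (about 55.6%); for the 11:57:13 entry, 667,997,176 bytes (about 39.5%). Neither reading was taken at the moment of failure, so this is consistent with, but does not prove, a native-memory cause. The names heap_usage and HEAP_FIGURES are hypothetical.

```python
import re

# Matches the heap figures in a PegaRULES system-information entry,
# e.g. "Total memory: 2,150,957,056 Free memory: 954,846,072".
HEAP_FIGURES = re.compile(r"Total memory:\s*([\d,]+)\s+Free memory:\s*([\d,]+)")


def heap_usage(line):
    """Return (used_bytes, used_fraction) for one entry, or None if the
    line carries no heap figures. Assumes both values are Java heap sizes."""
    m = HEAP_FIGURES.search(line)
    if m is None:
        return None
    total, free = (int(group.replace(",", "")) for group in m.groups())
    used = total - free
    return used, used / total


# Figures abridged from the two system-information entries in the extract.
entries = [
    "11:28:48 Total memory: 2,150,957,056 Free memory: 954,846,072",
    "11:57:13 Total memory: 1,691,353,088 Free memory: 1,023,355,912",
]
for entry in entries:
    used, fraction = heap_usage(entry)
    print(f"{entry[:8]}: {used:,} bytes used ({fraction:.1%})")
# prints:
# 11:28:48: 1,196,110,984 bytes used (55.6%)
# 11:57:13: 667,997,176 bytes used (39.5%)
```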