Support Article

JVM goes unresponsive

SA-1258

Summary



Root cause analysis of the failure of one of the two production servers.


Error Messages




[8/11/14 8:40:27:498 HST] 0000011e ThreadMonitor W   WSVR0605W: Thread "WebContainer : 13" (00000223) has been active for 689619 milliseconds and may be hung.  There is/are 2 thread(s) in total in the server that may be hung.


Steps to Reproduce



NA


Root Cause



Many of the WebContainer threads are blocked or hung on the following stack –
 
[8/11/14 8:40:27:498 HST] 0000011e ThreadMonitor W   WSVR0605W: Thread "WebContainer : 13" (00000223) has been active for 689619 milliseconds and may be hung.  There is/are 2 thread(s) in total in the server that may be hung.
      at java.lang.Object.wait(Native Method)
      at java.lang.Object.wait(Object.java:167)
      at com.pega.apache.log4j.AsyncAppender.append(AsyncAppender.java:193)
      at com.pega.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
      at com.pega.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
      at com.pega.apache.log4j.Category.callAppenders(Category.java:206)
 
 
[8/11/14 8:43:27:751 HST] 00000003 ThreadMonitor W   WSVR0605W: Thread "WebContainer : 11" (0000017a) has been active for 675009 milliseconds and may be hung.  There is/are 3 thread(s) in total in the server that may be hung.
      at com.pega.apache.log4j.Category.callAppenders(Category.java:204)
      at com.pega.apache.log4j.Category.forcedLog(Category.java:391)
      at com.pega.apache.log4j.Category.alert(Category.java:728)
      at com.pega.pegarules.priv.LogHelper.doAlert(LogHelper.java:1183)
      at com.pega.pegarules.priv.LogHelper.alert(LogHelper.java:906)
      at com.pega.pegarules.priv.LogHelper.alert(LogHelper.java:892)

To summarize, the JVM logs indicate one or more of the following:
 
  1. The AES server is not returning any data (but the socket is alive).
 
-and / or-
 
  2. There is a network issue between the AES server and the monitored node.
 
-and possibly-
 
  3. The monitored node is caught in a ‘positive feedback’ loop: AES is responding slowly, which causes the SOAPAppenders to wait for a long time; because the appender code is synchronized, other threads then take longer to complete their tasks, which may trigger further ALERTs, which in turn add more calls to the Async/SOAP appender queue (a minimal sketch of this blocking pattern follows below).
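
The stack traces above show application threads parked in Object.wait() inside AsyncAppender.append(). The following is a minimal, hypothetical Java sketch – not Pega's code; the class and method names are illustrative only – of how a bounded, synchronized event buffer with a blocking policy produces exactly that pattern: when the single dispatcher thread that forwards events to AES stalls, every producer thread that calls append() ends up waiting for buffer space.
 
// Minimal sketch (not Pega's AsyncAppender): a bounded, synchronized event buffer.
// When 'blocking' is true and the dispatcher stalls, producer threads park in
// Object.wait() inside append() -- the pattern seen in the stack traces above.
import java.util.ArrayDeque;
import java.util.Deque;

public class BoundedAppenderSketch {
    private final Deque<String> buffer = new ArrayDeque<>();
    private final int bufferSize;
    private final boolean blocking;

    public BoundedAppenderSketch(int bufferSize, boolean blocking) {
        this.bufferSize = bufferSize;
        this.blocking = blocking;
    }

    // Called by application (WebContainer) threads.
    public synchronized void append(String event) throws InterruptedException {
        while (buffer.size() >= bufferSize) {
            if (!blocking) {
                return;          // non-blocking: discard the event and move on
            }
            wait();              // blocking: wait until the dispatcher drains the buffer
        }
        buffer.addLast(event);
        notifyAll();             // wake the dispatcher
    }

    // Called by the single dispatcher thread that forwards events to AES.
    public synchronized String take() throws InterruptedException {
        while (buffer.isEmpty()) {
            wait();
        }
        String event = buffer.removeFirst();
        notifyAll();             // wake any producers waiting for space
        return event;
    }
}
 
If the dispatcher (take() caller) is stuck on a slow or unresponsive AES endpoint, the buffer stays full and every thread that logs an ALERT blocks in append(), which matches the hung WebContainer threads reported by the ThreadMonitor.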


Resolution



The following next step was recommended to resolve the issue –
 
Modify the following appenders in prlogging.xml on the monitored nodes –
 
<appender name="ASYNC" class="com.pega.apache.log4j.AsyncAppender">
<param name="BufferSize" value="1280"/>
<param name="Blocking" value="false"/>
[…]
</appender>
 
<appender name="ALERT-ASYNC" class="com.pega.apache.log4j.AsyncAppender">
<param name="BufferSize" value="1280"/>
<param name="Blocking" value="false"/>
[…]
</appender>
 
The above change should protect the monitored nodes should the AES server cause the appender threads to back up in the same way again in the future. In the snippet above, the default buffer size (DEFAULT_BUFFER_SIZE) of 128 events has been increased by a factor of 10, and Blocking has been set to false so that, once the buffer is full, new events are discarded rather than forcing the calling threads to wait.
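 
As a quick sanity check after the change, the appender settings can be read back at runtime. This is a hedged sketch only: it assumes the Pega-repackaged classes (com.pega.apache.log4j.Logger, com.pega.apache.log4j.AsyncAppender) mirror the Apache log4j 1.2 API (getRootLogger(), getAppender(), getBufferSize(), getBlocking()) and that the ASYNC appender is attached to the root logger as configured in prlogging.xml; adjust the logger and appender names to match your configuration.
 
// Hedged sketch: read back the appender settings after editing prlogging.xml.
import com.pega.apache.log4j.AsyncAppender;
import com.pega.apache.log4j.Logger;

public class AppenderCheck {
    public static void main(String[] args) {
        // Look up the appender by the name used in prlogging.xml.
        AsyncAppender async =
                (AsyncAppender) Logger.getRootLogger().getAppender("ASYNC");
        if (async == null) {
            System.out.println("ASYNC appender not found on the root logger");
            return;
        }
        System.out.println("BufferSize = " + async.getBufferSize()); // expect 1280
        System.out.println("Blocking   = " + async.getBlocking());   // expect false
    }
}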
 

Published January 31, 2016 - Updated October 8, 2020
