Support Article

Restart of all PRPC nodes fails on initial attempt

SA-31368

Summary



PRPC initialization sporadically fails when multiple nodes are restarted.


Error Messages



From App Server STDOUT log:
JBAS013412: Timeout after [300] seconds waiting for service container stability. Operation will roll back.

From PegaRULES log:
2016-11-15 20:55:56,352 [server.com] [ STANDARD] [ ] [ ] ( internal.mgmt.PRNodeImpl) INFO - Starts joining cluster
2016-11-15 21:01:21,366 [server.com] [ STANDARD] [ ] [ ] ( internal.mgmt.PREnvironment) ERROR - java.lang.IllegalStateException: Node failed to start!
2016-11-15 21:01:21,371 [server.com] [ STANDARD] [ ] [ ] ( etier.impl.EngineStartup) ERROR - PegaRULES initialization failed. Server: server.com
com.pega.pegarules.pub.context.InitializationFailedError: PRNodeImpl init failed
at com.pega.pegarules.session.internal.mgmt.PREnvironment.getThreadAndInitialize(PREnvironment.java:388)
at com.pega.pegarules.session.internal.PRSessionProviderImpl.getThreadAndInitialize(PRSessionProviderImpl.java:1998)
Caused by: java.lang.IllegalStateException: Node failed to start!
at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:125)


 

Steps to Reproduce



Not Applicable 


Root Cause



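A configuration setting in the operating environment, the amount of time that the JBoss Controller Boot Thread waits for application server startup, was set too low (300 seconds). In the PegaRULES log above, about 325 seconds pass between "Starts joining cluster" and the "Node failed to start!" error, which is longer than that limit. When the limit is exceeded, the startup operation rolls back and the servers shut themselves back down.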

Resolution



Make the following change to the operating environment: increase the value of the "jboss.as.management.blocking.timeout" system property in the JBoss configuration file (standalone-*.xml or domain.xml, depending on the clustering setup). The initial value was 300 seconds; the recommended value is 600 seconds.
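
As an illustration of the change described above, the property can be declared in a system-properties block of the JBoss configuration file. This is a minimal sketch that assumes a standalone-*.xml file with no existing system-properties element; in that file the element normally sits directly after the closing extensions tag. If the block already exists, add only the property line. Exact placement in domain.xml depends on the domain layout and is not shown here.

```xml
<!-- Assumed placement: directly after </extensions> in standalone-*.xml -->
<system-properties>
    <!-- Value is in seconds; 600 is the increase recommended in this article (was 300) -->
    <property name="jboss.as.management.blocking.timeout" value="600"/>
</system-properties>
```

Restart the application server after saving the file so that the new value is read at boot.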

 

Published December 13, 2016 - Updated December 31, 2016
