This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

Unable to initialize server

SA-10507

SUMMARY

One node in a three-node cluster sometimes fails to start and logs the following error.

ERROR MESSAGES

ERROR|cdc1vpc4lpr92|2015-04-24 10:32:42,601|com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineStartup|PegaRULES initialization failed. Server:
||
com.pega.pegarules.pub.context.InitializationFailedError: PRNodeImpl init failed
        at com.pega.pegarules.session.internal.mgmt.PREnvironment.getThreadAndInitialize(PREnvironment.java:386)
        at com.pega.pegarules.session.internal.PRSessionProviderImpl.getThreadAndInitialize(PRSessionProviderImpl.java:1905)
        at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineStartup.initEngine(EngineStartup.java:657)
        at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineImpl._initEngine_privact(EngineImpl.java:165)
        at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineImpl.doStartup(EngineImpl.java:138)
        at com.pega.pegarules.session.internal.engineinterface.etier.ejb.EngineBean.doStartup(EngineBean.java:121)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at com.pega.pegarules.internal.bootstrap.PRBootstrap.invokeMethod(PRBootstrap.java:338)
        at com.pega.pegarules.internal.bootstrap.PRBootstrap.invokeMethodPropagatingThrowable(PRBootstrap.java:379)
        at com.pega.pegarules.boot.internal.extbridge.AppServerBridgeToPega.invokeMethodPropagatingThrowable(AppServerBridgeToPega.java:216)
        at com.pega.pegarules.boot.internal.extbridge.AppServerBridgeToPega.invokeMethodPropagatingException(AppServerBridgeToPega.java:238)
        at com.pega.pegarules.internal.etier.ejb.EngineBeanBoot.doStartup(EngineBeanBoot.java:130)
        at com.pega.pegarules.internal.etier.interfaces.EJSLocalStatelessEngineBMT_f2439d86.doStartup(Unknown Source)
        at com.pega.pegarules.web.servlet.WebAppLifeCycleListener._contextInitialized_privact(WebAppLifeCycleListener.java:280)
        at com.pega.pegarules.web.servlet.WebAppLifeCycleListener.contextInitialized(WebAppLifeCycleListener.java:187)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at com.pega.pegarules.internal.bootstrap.PRBootstrap.invokeMethod(PRBootstrap.java:338)
        at com.pega.pegarules.internal.bootstrap.PRBootstrap.invokeMethodPropagatingThrowable(PRBootstrap.java:379)
        at com.pega.pegarules.boot.internal.extbridge.AppServerBridgeToPega.invokeMethodPropagatingThrowable(AppServerBridgeToPega.java:216)
        at com.pega.pegarules.boot.internal.extbridge.AppServerBridgeToPega.invokeMethod(AppServerBridgeToPega.java:265)
        at com.pega.pegarules.internal.web.servlet.WebAppLifeCycleListenerBoot.contextInitialized(WebAppLifeCycleListenerBoot.java:83)
        at com.ibm.ws.webcontainer.webapp.WebApp.notifyServletContextCreated(WebApp.java:1678)
        at com.ibm.ws.webcontainer.webapp.WebAppImpl.initialize(WebAppImpl.java:414)
        at com.ibm.ws.webcontainer.webapp.WebGroupImpl.addWebApplication(WebGroupImpl.java:88)
        at com.ibm.ws.webcontainer.VirtualHostImpl.addWebApplication(VirtualHostImpl.java:169)
        at com.ibm.ws.webcontainer.WSWebContainer.addWebApp(WSWebContainer.java:749)
        at com.ibm.ws.webcontainer.WSWebContainer.addWebApplication(WSWebContainer.java:634)
        at com.ibm.ws.webcontainer.component.WebContainerImpl.install(WebContainerImpl.java:426)
        at com.ibm.ws.webcontainer.component.WebContainerImpl.start(WebContainerImpl.java:718)
        at com.ibm.ws.runtime.component.ApplicationMgrImpl.start(ApplicationMgrImpl.java:1173)
        at com.ibm.ws.runtime.component.DeployedApplicationImpl.fireDeployedObjectStart(DeployedApplicationImpl.java:1370)
        at com.ibm.ws.runtime.component.DeployedModuleImpl.start(DeployedModuleImpl.java:639)
        at com.ibm.ws.runtime.component.DeployedApplicationImpl.start(DeployedApplicationImpl.java:968)
        at com.ibm.ws.runtime.component.ApplicationMgrImpl.startApplication(ApplicationMgrImpl.java:772)
        at com.ibm.ws.runtime.component.ApplicationMgrImpl.start(ApplicationMgrImpl.java:2175)
        at com.ibm.ws.runtime.component.CompositionUnitMgrImpl.start(CompositionUnitMgrImpl.java:445)
        at com.ibm.ws.runtime.component.CompositionUnitImpl.start(CompositionUnitImpl.java:123)
        at com.ibm.ws.runtime.component.CompositionUnitMgrImpl.start(CompositionUnitMgrImpl.java:388)
        at com.ibm.ws.runtime.component.CompositionUnitMgrImpl.access$500(CompositionUnitMgrImpl.java:116)
        at com.ibm.ws.runtime.component.CompositionUnitMgrImpl$CUInitializer.run(CompositionUnitMgrImpl.java:994)
        at com.ibm.wsspi.runtime.component.WsComponentImpl$_AsynchInitializer.run(WsComponentImpl.java:496)
        at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1815)
Caused by:
com.pega.pegarules.pub.PRRuntimeException: Unable to restart the cluster - Another node is starting and held the lock too long
        at com.pega.pegarules.session.internal.mgmt.PRNodeImpl.checkClusterConsistency(PRNodeImpl.java:2380)
        at com.pega.pegarules.session.internal.mgmt.PREnvironment.getThreadAndInitialize(PREnvironment.java:374)

STEPS TO REPRODUCE

1) Start the server.

ROOT CAUSE

The root cause of this problem is a defect in Pegasystems’ code/rules.

When the cluster starts, none of the nodes has yet registered its cluster address in pr_sys_statusnodes. There is code that guards against this race condition, which includes locking on “identification/cluster/name” in PRNodeImpl.checkClusterConsistency() while Hazelcast is restarted. The lock timeout is determined by the waitTime (set to 2000 ms) and the number of attempts (set to 30), for a total of 2000 ms × 30 = 60 seconds. When several nodes contend for this lock at startup, some of them can time out before acquiring it and fail with the error shown above.
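
The timeout arithmetic and the resulting failure mode can be illustrated with a minimal sketch. This is not the Pega implementation: the class BoundedLockRetry, its method names, and the use of a local java.util.concurrent ReentrantLock in place of the cluster-wide lock are all assumptions made for illustration. Only the waitTime (2000 ms) and attempt count (30) come from the article.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a bounded lock-retry loop; not Pega code.
class BoundedLockRetry {
    // Values quoted in the article; the constant names are assumptions.
    static final long WAIT_TIME_MS = 2000;
    static final int MAX_ATTEMPTS = 30;

    // Worst-case time a node waits before giving up on the lock.
    static long totalTimeoutMs(long waitTimeMs, int attempts) {
        return waitTimeMs * attempts;
    }

    // Tries the lock up to 'attempts' times, waiting at most waitTimeMs per try.
    // Returns true if acquired (the caller must unlock), false if every try timed out.
    static boolean acquireWithRetry(ReentrantLock lock, long waitTimeMs, int attempts) {
        try {
            for (int i = 0; i < attempts; i++) {
                if (lock.tryLock(waitTimeMs, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return false;
    }

    // Simulates a first "node" holding the lock for holdMs while a second node
    // retries. Returns whether the second node acquired the lock in time.
    static boolean simulateContendedStartup(long holdMs, long waitTimeMs, int attempts) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        Thread firstNode = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                Thread.sleep(holdMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        firstNode.start();
        try {
            held.await(); // ensure the first node owns the lock before retrying
            boolean acquired = acquireWithRetry(lock, waitTimeMs, attempts);
            if (acquired) {
                lock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            firstNode.interrupt(); // stop the simulated holder early
        }
    }

    public static void main(String[] args) {
        // 2000 ms x 30 attempts = 60000 ms, the one-minute budget from the article.
        System.out.println(totalTimeoutMs(WAIT_TIME_MS, MAX_ATTEMPTS));
        // Scaled-down timings: a 30 ms budget against a 500 ms hold times out.
        System.out.println(simulateContendedStartup(500, 10, 3));
        // A 250 ms budget against a 20 ms hold succeeds.
        System.out.println(simulateContendedStartup(20, 50, 5));
    }
}
```

In this sketch a node whose retry budget runs out before the holder releases the lock simply gets false back. In the reported incident the corresponding outcome is the PRRuntimeException shown in the stack trace.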

RESOLUTION

This issue is resolved by hotfix item HFIX-22087: “Race condition during Hazelcast cluster consistency check”.

Published June 12, 2015 - Updated October 8, 2020
