This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

Hazelcast Operation Timeout Exception

SA-21698

Summary



In a multi-node environment, some nodes fail to start. The issue is not specific to any particular node. Restarting an affected node fails with the Hazelcast error shown under Error Messages. The error reports that a particular node or server cannot be reached, even though that node or server is up and running.


Error Messages



[12/15/15 15:46:47:249 CST] 00000082 SystemOut O 2015-12-15 15:46:47,247 [ apsrs2714] [ STANDARD] [ ] ( internal.mgmt.PREnvironment) ERROR - com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! BasicInvocationFuture{invocation=BasicInvocation{ serviceName='hz:impl:mapService', op=PutOperation{/pega/system/mgmt/nodeidUUID}, partitionId=84, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[<ip>]:8060, backupsExpected=0, backupsCompleted=0}, response=null, done=false} No response has been received! backups-expected:0 backups-completed: 0
[12/15/15 15:46:47:325 CST] 00000082 SystemOut O 2015-12-15 15:46:47,253 [ apsrs2714] [ STANDARD] [ ] ( etier.impl.EngineStartup) ERROR - PegaRULES initialization failed. Server: apsrs2714
com.pega.pegarules.pub.context.InitializationFailedError: PRNodeImpl init failed
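
When triaging this error, it can help to pull the target address and the retry parameters out of the message so that they can be compared across the failing nodes. The following is a minimal sketch in Python, not a Pega or Hazelcast utility. The function name parse_invocation and the LOG_EXCERPT constant are hypothetical, and the regular expressions assume the field layout seen in the message above.

```python
import re

# Excerpt of the invocation details copied from the error message above.
LOG_EXCERPT = (
    "BasicInvocation{ serviceName='hz:impl:mapService', "
    "op=PutOperation{/pega/system/mgmt/nodeidUUID}, partitionId=84, "
    "replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, "
    "callTimeout=60000, target=Address[<ip>]:8060, backupsExpected=0, "
    "backupsCompleted=0}"
)


def parse_invocation(text):
    """Return the numeric key=value fields and the target address in text.

    Hypothetical helper: it assumes the key=value layout shown in the
    sample message and is not part of any Pega or Hazelcast API.
    """
    # Collect every field whose value is a plain integer, e.g. tryCount=250.
    fields = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)\b", text)}
    # The target is printed as Address[host]:port, so handle it separately.
    target = re.search(r"target=Address\[([^\]]*)\]:(\d+)", text)
    if target:
        fields["target_host"] = target.group(1)
        fields["target_port"] = int(target.group(2))
    return fields


invocation = parse_invocation(LOG_EXCERPT)
print(invocation["target_host"], invocation["target_port"])
print(invocation["tryCount"], invocation["tryPauseMillis"], invocation["callTimeout"])
```

Running the helper against the message from each failing node shows whether they are all waiting on the same target address and partition.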


Steps to Reproduce



Restart all of the nodes at once, with Hazelcast enabled, through a cluster-level startup.


Root Cause



A defect in Pegasystems' code. When there are multiple nodes across multiple clusters and a cluster-level startup brings up all of the nodes at once, a race condition sometimes occurs: some nodes fail to establish a connection with other nodes in the cluster and ultimately shut down after a default number of attempts.

Resolution



Perform the following local change to avoid the race condition.

Add the following prconfig settings as Dynamic System Settings (DSS):

prconfig /cluster/consistency/lockattemptdelayms/default value=5000
prconfig /cluster/consistency/maxlockattempts/default value=150

Shut down all of the nodes and truncate the PR_SYS_STATUSUNODES database table.

Bring up all of the nodes by starting the clusters one at a time.
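
To get a rough sense of what the two settings above allow, the arithmetic can be sketched in Python. This is only an illustration under a stated assumption: that a node waits the full lockattemptdelayms between attempts and gives up after maxlockattempts attempts. The article does not document the exact retry semantics, and the function name max_lock_wait_ms is hypothetical.

```python
# Values taken from the two prconfig settings in the resolution.
LOCK_ATTEMPT_DELAY_MS = 5000   # cluster/consistency/lockattemptdelayms
MAX_LOCK_ATTEMPTS = 150        # cluster/consistency/maxlockattempts


def max_lock_wait_ms(delay_ms, max_attempts):
    """Upper bound on retry time, ASSUMING one fixed delay per attempt.

    Hypothetical helper for illustration; the real engine may space
    its attempts differently.
    """
    return delay_ms * max_attempts


total_ms = max_lock_wait_ms(LOCK_ATTEMPT_DELAY_MS, MAX_LOCK_ATTEMPTS)
print(f"{total_ms} ms, about {total_ms / 60000} minutes")
```

Under that assumption the settings give a node up to 750000 ms (12.5 minutes) of retries before it gives up, which leaves more time for the other nodes to finish starting.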

Published April 7, 2016 - Updated October 8, 2020