Support Article
Hazelcast causes start-up slowness or hang
SA-23147
Summary
The issue was seen on Pega 7.1.9 running on a WAS/DB2 stack. The Java Virtual Machines (JVMs) of a two-node cluster are recycled every day as part of scheduled maintenance downtime. Intermittently, application startup hangs and the process (PID) has to be killed after many hours. A subsequent recycle sometimes succeeds.
Error Messages
Scenario 1
[**Date Time Timezone**] 00000045 ThreadMonitor W WSVR0605W: Thread "XXX" (00000132) has been active for 696299 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
at java.util.HashSet.iterator(HashSet.java:182)
at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1076)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1075)
at com.pega.pegarules.cluster.internal.PRClusterHazelcastImpl.queryState(PRClusterHazelcastImpl.java:158)
.
.
.
Scenario 2
**Date Time**,865 [ XXX ] [ STANDARD] [ ] ( internal.mgmt.PRNodeImpl) INFO - Checking Cluster consistency
**Date Time**,076 [zInstance_1_.event-5] [ STANDARD] [ ] (nternal.PRClusterHazelcastImpl) INFO - Cluster membership changed new member joined:
**Date Time**,076 [zInstance_1_.event-5] [ STANDARD] [ ] (nternal.PRClusterHazelcastImpl) INFO - Member 'd7db3e55-91ca-4f5a-b9f9-784b8382b6cd/<IP>:5701' <== this node
.
.
<Hazelcast processing time exceeds 60 minutes>
**Date Time**,488 [zInstance_1_.event-5] [ STANDARD] [ ] (nternal.PRClusterHazelcastImpl) INFO - Cluster membership changed; member exited:
.
.
.
Steps to Reproduce
There is no specific use case to replicate this issue.
Root Cause
A defect in Pegasystems’ code or rules.
The issue occurs when a node checks the current state of the cluster by obtaining a list of all cluster members, which includes the node requesting the list. Only the first member in the list is fetched, and its state is returned. If that member is the node performing the check, it is skipped and the remaining members are never checked, which causes a hang when setting runState.
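The "first member only" behavior described above can be illustrated with a minimal Java sketch. This is not Pega's actual code. The class name, method names, node IDs, and state strings are all hypothetical and chosen only to show the pattern. The first method gives up when the first member is the requesting node. The second skips the requesting node and continues with the remaining members. The stack trace in Scenario 1 shows iteration over a HashSet, whose ordering is not guaranteed; that may be why the hang is intermittent, although the article does not state this.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; names and states are assumptions, not Pega APIs.
public class ClusterStateSketch {

    // Problem pattern: only the first member is examined. If it is the
    // requesting node, it is skipped and no other member is ever checked.
    static Optional<String> queryStateFirstOnly(Map<String, String> members, String self) {
        for (Map.Entry<String, String> member : members.entrySet()) {
            if (member.getKey().equals(self)) {
                return Optional.empty(); // skipped; remaining members are ignored
            }
            return Optional.of(member.getValue());
        }
        return Optional.empty();
    }

    // Alternative pattern: skip the requesting node and keep looking.
    static Optional<String> queryStateSkippingSelf(Map<String, String> members, String self) {
        for (Map.Entry<String, String> member : members.entrySet()) {
            if (member.getKey().equals(self)) {
                continue; // ignore this node, check the next member
            }
            return Optional.of(member.getValue());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the example is deterministic.
        Map<String, String> members = new LinkedHashMap<>();
        members.put("node-A", "STARTING");
        members.put("node-B", "RUNNING");

        // node-A asks for the cluster state and happens to be first in the list.
        System.out.println(queryStateFirstOnly(members, "node-A"));    // Optional.empty
        System.out.println(queryStateSkippingSelf(members, "node-A")); // Optional[RUNNING]
    }
}
```

In this sketch, a caller that waits until a state is available would wait indefinitely with the first method whenever the requesting node is listed first, which matches the symptom of a startup that hangs on some recycles and not on others.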
Resolution
Apply HFix-27366.
Published May 14, 2016 - Updated October 8, 2020