
This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

Hazelcast causes start-up slowness or hang

SA-23147

Summary



The issue was seen running Pega 7.1.9 on a WAS/DB2 stack. The Java Virtual Machines (JVMs), a two-node cluster, are recycled every day as part of scheduled downtime maintenance. Intermittently, application startup hangs and the PID has to be killed after many hours. A subsequent recycle sometimes succeeds.

Error Messages



Scenario 1

[**Date Time Timezone**] 00000045 ThreadMonitor W   WSVR0605W: Thread "XXX" (00000132) has been active for 696299 milliseconds and may be hung.  There is/are 1 thread(s) in total in the server that may be hung.
   at java.util.HashSet.iterator(HashSet.java:182)
   at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1076)
   at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1075)
   at com.pega.pegarules.cluster.internal.PRClusterHazelcastImpl.queryState(PRClusterHazelcastImpl.java:158)
.
.
.

Scenario 2

**Date Time**,865 [ XXX     ] [  STANDARD] [                    ] (      internal.mgmt.PRNodeImpl) INFO    - Checking Cluster consistency
**Date Time**,076 [zInstance_1_.event-5] [  STANDARD] [                    ] (nternal.PRClusterHazelcastImpl) INFO    - Cluster membership changed new member joined:
**Date Time**,076 [zInstance_1_.event-5] [  STANDARD] [                    ] (nternal.PRClusterHazelcastImpl) INFO    - Member 'd7db3e55-91ca-4f5a-b9f9-784b8382b6cd/<IP>:5701' <== this node
.
.
<Hazelcast processing time > 60 mins>

**Date Time**,488 [zInstance_1_.event-5] [  STANDARD] [                    ] (nternal.PRClusterHazelcastImpl) INFO    - Cluster membership changed; member exited:
.
.
.

Steps to Reproduce



There is no specific use case that reproduces this issue; it occurs intermittently during scheduled JVM restarts.

Root Cause



A defect in Pegasystems’ code or rules.

The issue occurs when the current state of a cluster member is checked by obtaining a list of all cluster members, which includes the node requesting the list. The first member in the list is fetched and its state returned. If the member fetched from the list is the node performing the check, it is skipped, but the remaining members are never examined, so no state is returned and the startup hangs while setting runState.
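The flawed control flow described above can be sketched as follows. This is a minimal illustration, not Pega's actual source: the class, record, and method names are hypothetical, and the real `PRClusterHazelcastImpl.queryState` logic is only approximated.

```java
import java.util.List;

// Hypothetical sketch of the member-state check described above.
// Names are illustrative; they do not reflect Pega's internal code.
public class MemberStateCheck {
    record Member(String id, String state) {}

    // Buggy variant: fetches the first member and, if it is the local
    // node, skips it WITHOUT continuing to the remaining members, so no
    // state is ever returned (the observed startup hang on runState).
    static String buggyQueryState(List<Member> members, String localId) {
        for (Member m : members) {
            if (m.id().equals(localId)) {
                return null; // self is skipped, but the rest go unchecked
            }
            return m.state(); // only the first member is ever inspected
        }
        return null;
    }

    // Corrected variant: skip the local node and keep iterating so the
    // state of another cluster member is still returned.
    static String fixedQueryState(List<Member> members, String localId) {
        for (Member m : members) {
            if (m.id().equals(localId)) {
                continue; // skip self, move on to the next member
            }
            return m.state();
        }
        return null;
    }

    public static void main(String[] args) {
        List<Member> members = List.of(
                new Member("node-A", "RUNNING"),
                new Member("node-B", "RUNNING"));

        // When the local node happens to be first in the member list,
        // the buggy check yields nothing while the fixed check succeeds.
        System.out.println(buggyQueryState(members, "node-A"));
        System.out.println(fixedQueryState(members, "node-A"));
    }
}
```

This illustrates why the hang is intermittent: it only manifests when Hazelcast happens to place the requesting node first in the member list, which varies from restart to restart.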

Resolution



Apply HFix-27366.

Published May 14, 2016 - Updated October 8, 2020
