Support Article

Hazelcast causes start-up slowness or hang

SA-23147

Summary

The issue was seen on Pega 7.1.9 running on a WAS/DB2 stack. The Java Virtual Machines (JVMs) of a two-node cluster are recycled every day as part of scheduled maintenance downtime. Intermittently, application startup hangs, and the process (PID) has to be killed after many hours. A subsequent recycle sometimes succeeds.

Error Messages

Scenario 1

[**Date Time Timezone**] 00000045 ThreadMonitor W   WSVR0605W: Thread "XXX" (00000132) has been active for 696299 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
at java.util.HashSet.iterator(HashSet.java:182)
at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1076)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1075)
at com.pega.pegarules.cluster.internal.PRClusterHazelcastImpl.queryState(PRClusterHazelcastImpl.java:158)
.
.
.

Scenario 2

**Date Time**,865 [ XXX     ] [ STANDARD] [                    ] (      internal.mgmt.PRNodeImpl) INFO    - Checking Cluster consistency
**Date Time**,076 [zInstance_1_.event-5] [ STANDARD] [                    ] (nternal.PRClusterHazelcastImpl) INFO    - Cluster membership changed new member joined:
**Date Time**,076 [zInstance_1_.event-5] [ STANDARD] [                    ] (nternal.PRClusterHazelcastImpl) INFO    - Member 'd7db3e55-91ca-4f5a-b9f9-784b8382b6cd/<IP>:5701' <== this node
.
.
<Hazelcast processing time > 60 mins>

**Date Time**,488 [zInstance_1_.event-5] [ STANDARD] [                    ] (nternal.PRClusterHazelcastImpl) INFO    - Cluster membership changed; member exited:
.
.
.

Steps to Reproduce

There is no specific use case to replicate this issue.

Root Cause

A defect in Pegasystems’ code or rules.

The issue occurs when a node checks the current state of the cluster. The node obtains a list of all cluster members, which includes the requesting node itself, then fetches the first member in the list and returns that member's state. If the member fetched from the list is the node performing the check, it is skipped and the remaining members are never checked, which causes a hang when setting runState.
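The pattern described above can be illustrated with a minimal sketch. This is not the Pega or Hazelcast source: the class `ClusterStateSketch`, both method names, the node IDs, and the state strings are hypothetical, and the member list is modeled as an ordered map from node ID to state. It only shows how checking just the first entry yields no answer when that entry happens to be the requesting node, and how continuing past the requesting node avoids that.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; names and states are assumptions, not Pega code.
public class ClusterStateSketch {

    // Flawed pattern: only the first member is ever inspected.
    // If that member is the requesting node, no state is returned at all.
    public static Optional<String> firstMemberOnly(Map<String, String> memberStates, String self) {
        for (Map.Entry<String, String> member : memberStates.entrySet()) {
            if (member.getKey().equals(self)) {
                return Optional.empty(); // gives up; remaining members are never checked
            }
            return Optional.of(member.getValue());
        }
        return Optional.empty(); // empty member list
    }

    // Safer pattern: skip the requesting node and keep looking at the other members.
    public static Optional<String> skipSelfAndContinue(Map<String, String> memberStates, String self) {
        for (Map.Entry<String, String> member : memberStates.entrySet()) {
            if (member.getKey().equals(self)) {
                continue; // skip only this entry
            }
            return Optional.of(member.getValue());
        }
        return Optional.empty(); // no other members present
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the requesting node is listed first.
        Map<String, String> members = new LinkedHashMap<>();
        members.put("node-a", "STARTING"); // the requesting node
        members.put("node-b", "RUNNING");

        System.out.println("first member only: " + firstMemberOnly(members, "node-a"));
        System.out.println("skip self, continue: " + skipSelfAndContinue(members, "node-a"));
    }
}
```

Because the outcome depends on which member happens to come first in the list, a flaw of this shape would be consistent with the intermittent nature of the hang. In this sketch a caller that waits for a state before proceeding would have nothing to act on in the first case.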

Resolution

Apply HFix-27366.

Tags:

Pega Platform

Pega Platform 7.1.9

Published May 14, 2016 - Updated October 8, 2020
