Support Article
All nodes in Cluster stop working on rebooting an Ignite server
Summary
All nodes in the cluster (Pega clients and Ignite servers) stop working after one Ignite server is rebooted.
Error Messages
Not Applicable.
Steps to Reproduce
- Log in to the application
- Create and configure a Cluster environment as follows:
- Data Center one has 3 Ignite servers and 6 Pega clients
- Data Center two has 3 Ignite servers and 6 Pega clients
- Reboot the failed Ignite server
Root Cause
A defect in Pegasystems’ code or rules. A service running on the platform reacts incorrectly to standalone server nodes leaving the cluster. When a node joins or leaves the cluster, the cluster notifies interested parties of the event.
When the platform is deployed in the client-server topology, where Pega acts as a client to a standalone Apache Ignite server, it does not filter these events. Services running on the client therefore react to servers joining or leaving the cluster as if each server were another client.
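The missing filter described above can be illustrated with a minimal sketch. This is not the code delivered in the hotfix. The names MembershipEvent, EventType, and ClientTracker are hypothetical and exist only for illustration. The sketch assumes that each membership event carries a flag that says whether the node is a client or a standalone server.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    NODE_JOINED = "joined"
    NODE_LEFT = "left"


@dataclass(frozen=True)
class MembershipEvent:
    node_id: str
    is_client: bool  # True for a Pega client, False for a standalone Ignite server
    type: EventType


class ClientTracker:
    """Hypothetical client-side service that tracks peer client nodes."""

    def __init__(self):
        self.known_clients = set()

    def on_event(self, event):
        # The filter: membership changes of standalone servers are ignored,
        # so a server reboot is never treated as a client joining or leaving.
        if not event.is_client:
            return False
        if event.type is EventType.NODE_JOINED:
            self.known_clients.add(event.node_id)
        else:
            self.known_clients.discard(event.node_id)
        return True


tracker = ClientTracker()
tracker.on_event(MembershipEvent("pega-1", True, EventType.NODE_JOINED))
tracker.on_event(MembershipEvent("ignite-1", False, EventType.NODE_LEFT))
print(sorted(tracker.known_clients))  # prints ['pega-1']
```

Without the is_client check, the server event would reach the same code path as a client event, which is the behavior the root cause describes.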
Resolution
- Apply HFix-41862
- Set the following settings to their maximum values:
cluster/failure/detection/timeout
- valid range: (1 second, 60 seconds)
- default: 10 seconds
cluster/network/timeout
- valid range: (500 ms, 15 seconds)
- default: 5000 ms
cluster/discovery/maxtimeout
- valid range: (1 minute, 15 minutes)
- default: 10 minutes
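As a quick check before changing the configuration, the ranges above can be captured in a small lookup table. The following sketch only normalizes the documented ranges to milliseconds and returns the upper bound for each setting. It assumes that the range endpoints are inclusive. The helper names max_values and is_valid are hypothetical, and the unit that the platform expects for each setting value is not stated in this article, so confirm it before applying the numbers.

```python
# (minimum, maximum, default) in milliseconds, taken from the ranges listed above.
SETTINGS_MS = {
    "cluster/failure/detection/timeout": (1_000, 60_000, 10_000),
    "cluster/network/timeout": (500, 15_000, 5_000),
    "cluster/discovery/maxtimeout": (60_000, 900_000, 600_000),
}


def max_values(settings=SETTINGS_MS):
    """Return the upper bound of the valid range for each setting."""
    return {name: high for name, (low, high, default) in settings.items()}


def is_valid(name, value_ms, settings=SETTINGS_MS):
    """Check a candidate value against the documented range (assumed inclusive)."""
    low, high, _default = settings[name]
    return low <= value_ms <= high


for name, value in max_values().items():
    print(f"{name} = {value} ms")
```

With the documented ranges, the maximum values are 60 seconds, 15 seconds, and 15 minutes respectively.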
Published April 6, 2019 - Updated October 8, 2020