This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Table of Contents

Managing clusters with Hazelcast

This article is the first in a series of articles that includes the following companion articles:

Updates to Hazelcast support

Split-Brain Syndrome and cluster fracturing FAQs

Troubleshooting Hazelcast cluster management [Restricted Access - Not Available Publicly]

Hazelcast was introduced in Pega 7.1.7 to improve the performance of internode communication. In Pega 7.1.9, the near-instantaneous System Pulse feature replaced the older and much slower distribution of rule and data updates through database tables. Since then, many features have been rewritten to use the Hazelcast-backed distributed operations service.

Since Pega 7.1.7, many issues have been reported and many questions raised about Hazelcast and its notoriously talkative logging. Most of these issues can be prevented by applying best practices. Other issues can be resolved by understanding how to troubleshoot them.

This article focuses on Pega 7.3 and later releases of the Pega Platform. For Pega 7.1.9 through Pega 7.2.2, stability hotfixes are available that upgrade these releases to newer versions of Hazelcast.

Hazelcast Editions supported
Best practices
   Port range
   Node ID
   PR_SYS_STATUSNODES
   Network Address Translation (NAT)
   Network Interface Controller (NIC)
   Number of cores
   Graceful shutdown
Concepts and terminology
   Split-Brain Syndrome and Cluster Fracturing
   Master node
   Hazelcast Interceptor
   Clock Drift
Settings
   Common cluster settings
   Clock Drift settings
   Internet Control Message Protocol (ICMP) settings
   Encryption settings
   Security settings

Hazelcast Editions supported

For the latest information, see Updates to Hazelcast support.

Pega supports the Hazelcast Editions shown in the table below.

Hotfixes are available or planned for the following releases:

Pega 7.4: HFix-46618 (Hazelcast 3.10 EE Perpetual License) and HFix-48345 (Alerts)
Pega 7.3.1: HFix-46681 (Hazelcast 3.10 EE Perpetual License)
Pega 7.3: HFix-46682 (Hazelcast 3.10 EE Perpetual License)
Pega 7.2.2: HFix-47749 (Hazelcast 3.10 EE Perpetual License)

Pega Platform Releases and Hazelcast Editions supported

| Pega Platform release | Hazelcast edition |
|---|---|
| Pega 7.2.2, 7.2.1, 7.2 | 3.4.1 Community Edition (CE) |
| Pega 7.4, 7.3.1, 7.3 | 3.8 Community Edition (CE) |
| Pega 8.1 | 3.10 Enterprise Edition (EE) |
| Pega 8.2 | 3.10.4 Enterprise Edition (EE) |
| Pega 8.3 | 3.11 Enterprise Edition (EE) |

Best practices

For successful cluster management, follow the guidelines in this section:

Port range

By default, a Pega node uses the port range 5701 to 5800 for Hazelcast. In an environment that requires a different range, set it with the prconfig.xml property cluster/hazelcast/ports.

prconfig.xml example

<env name="cluster/hazelcast/ports" value="5701-5750" />

DSS example

Setting Purpose: prconfig/cluster/hazelcast/ports/default
Value: 5701-5750
Owning Ruleset: Pega-Engine

When to set the port range

If multiple environments run on the same host (for example, QA and DEV), the administrator might need to give each environment its own port range to avoid port conflicts. Also set the range if the default ports are already in use or blocked for any reason.
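Before assigning ranges to environments that share a host, it can help to check that they do not overlap. The following Python sketch is one way to do that, assuming the value formats shown in this article: a single range such as 5701-5750, or a comma-separated list such as 5801, 5802, 5803-5810. The function names and the QA/DEV values are hypothetical and are not part of Pega.

```python
def parse_ports(value: str) -> set[int]:
    """Expand a value such as "5701-5750" or "5801, 5802, 5803-5810" into a set of ports."""
    ports: set[int] = set()
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(p) for p in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range: {part}")
            ports.update(range(low, high + 1))  # inclusive upper bound
        else:
            ports.add(int(part))
    return ports


def port_conflicts(env_a: str, env_b: str) -> set[int]:
    """Return the ports that two environments' ranges have in common."""
    return parse_ports(env_a) & parse_ports(env_b)


# Hypothetical QA and DEV environments on the same host.
qa_ports = "5701-5750"
dev_ports = "5751-5800"
print(port_conflicts(qa_ports, dev_ports))  # set(), meaning no overlap
```

Splitting the default 5701-5800 range in half, as above, is only one option; any pair of ranges for which `port_conflicts` returns an empty set avoids the clash.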

Node ID

Each Pega node is identified with a Node ID that must be unique in the cluster. If the same Node ID is already used in the cluster, the node fails to start.

Setting the Node ID explicitly makes it easier to identify nodes and their purposes at a glance. By default, a Node ID is generated from certain system setting values; as a best practice, however, set the Node ID manually to reflect the node’s intended purpose.

To set the Node ID, use the JVM argument identification.nodeid as shown in the following examples.

-Didentification.nodeid=SearchNode1

-Didentification.nodeid=BackgroundProcessing1
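Because a node whose Node ID is already in use fails to start, a deployment script can check the planned IDs up front. A minimal Python sketch, assuming you can collect the value passed in -Didentification.nodeid for each node into a mapping; the host names and the `duplicate_node_ids` function are hypothetical.

```python
from collections import Counter


def duplicate_node_ids(node_ids: dict[str, str]) -> dict[str, list[str]]:
    """Map each Node ID used more than once to the hosts that use it.

    node_ids maps a host name to the value of -Didentification.nodeid on that host.
    """
    counts = Counter(node_ids.values())
    return {
        node_id: sorted(host for host, nid in node_ids.items() if nid == node_id)
        for node_id, n in counts.items()
        if n > 1
    }


planned = {
    "host-a": "SearchNode1",
    "host-b": "BackgroundProcessing1",
    "host-c": "SearchNode1",  # copy-paste mistake: same ID as host-a
}
print(duplicate_node_ids(planned))  # {'SearchNode1': ['host-a', 'host-c']}
```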

 

PR_SYS_STATUSNODES

At startup, each Pega node registers itself in the pr_sys_statusnodes table, which holds information such as the node’s IP address and node name.

When a node joins the cluster, a list of cluster-member candidates is loaded from the pr_sys_statusnodes table. The node then tries to establish a connection with the candidates to form a cluster.

Do not truncate the pr_sys_statusnodes table while ANY cluster nodes are up and running! Doing so removes the information needed for a newly started node to discover other nodes that are already running. Consequently, the newly started node forms a new cluster instead of joining the cluster that is already running. 
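The following Python sketch is a deliberately simplified model, not Pega’s discovery code, of why truncating the table splits the cluster: a starting node only contacts the candidates it reads from pr_sys_statusnodes, so an empty table leaves it nobody to join. The `Cluster` class, the `join_or_form` function, and the addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A toy cluster: a name and the addresses of its members."""
    name: str
    members: list[str] = field(default_factory=list)


def join_or_form(new_node: str, statusnodes: list[str], running: dict[str, Cluster]) -> Cluster:
    """Join the cluster of the first reachable candidate, or form a new cluster.

    statusnodes stands in for the rows read from pr_sys_statusnodes;
    running maps the address of each live node to the cluster it belongs to.
    """
    for candidate in statusnodes:
        cluster = running.get(candidate)
        if cluster is not None:  # the candidate is up: join its cluster
            cluster.members.append(new_node)
            running[new_node] = cluster
            return cluster
    # No candidate was found (for example, the table was truncated): start a new cluster.
    cluster = Cluster(name=f"cluster-of-{new_node}", members=[new_node])
    running[new_node] = cluster
    return cluster


original = Cluster("cluster-1", ["10.0.0.1", "10.0.0.2"])
live = {"10.0.0.1": original, "10.0.0.2": original}

print(join_or_form("10.0.0.3", ["10.0.0.1", "10.0.0.2"], live).name)  # cluster-1
print(join_or_form("10.0.0.4", [], live).name)  # cluster-of-10.0.0.4, a second cluster
```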

Network Address Translation (NAT)

If Pega nodes are running behind network address translation (NAT), they might not be able to see each other. To ensure communication among the nodes, the system administrator should set the public address to the address defined on the NAT. This configuration is mainly used when running in private VMs or Docker containers. To set the public address, use the following prconfig.xml setting:

identification/cluster/public/address

See Common cluster settings.

Network Interface Controller (NIC)

If you have multiple Network Interface Controllers (NICs) in your clustered environment, use the cluster/hazelcast/interface setting to specify the IP address that you want the node to communicate on. This setting forces Hazelcast to use the correct NIC. Avoid the misconfiguration described in Example problem scenario.

The setting should be one IP address.

See Common cluster settings.

prconfig.xml example

<env name="cluster/hazelcast/interface" value="10.3.10.4"/>

DSS example

Setting Purpose: prconfig/cluster/hazelcast/interface
Value: 10.3.10.4
Owning Ruleset: Pega-Engine

Example problem scenario

According to the Hazelcast documentation (Other network configurations), you can specify an IP address range by using the wildcard (*) in the last octet of the IP address, for example, 192.168.1.*, or by using a range such as 192.168.1.100-110.

However, *.*.*.* is not a supported value for cluster/hazelcast/interface. If you specify it, the nodes of the cluster cannot pick up the correct IP address.

Solution: Use wildcards only in the last octets of an IP address, for example, 192.168.*.*.
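To make the pattern rules concrete, the following Python sketch matches an IPv4 address against an interface pattern and rejects the unsupported *.*.*.* form. It illustrates the syntax described in this section only; it is not Hazelcast’s implementation. The validation rule (a literal first octet, with wildcards or ranges allowed only in the trailing octets) is an assumption drawn from the guidance above, and the function names are hypothetical.

```python
import ipaddress


def _octet_matches(pattern: str, value: int) -> bool:
    """Match one octet against a literal, a '*' wildcard, or a 'low-high' range."""
    if pattern == "*":
        return True
    if "-" in pattern:
        low, high = (int(p) for p in pattern.split("-", 1))
        return low <= value <= high
    return int(pattern) == value


def validate_interface_pattern(pattern: str) -> None:
    """Reject patterns that this section advises against, such as *.*.*.*."""
    octets = pattern.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected four octets: {pattern}")
    literal = [o.isdigit() for o in octets]
    if not literal[0]:
        raise ValueError(f"leading octet must be a literal: {pattern}")
    # Once a wildcard or range appears, every following octet must also be one.
    first_variable = literal.index(False) if False in literal else 4
    if any(literal[first_variable:]):
        raise ValueError(f"wildcards and ranges belong in the last octets only: {pattern}")


def matches_interface(pattern: str, address: str) -> bool:
    """Return True if an IPv4 address falls inside an interface pattern."""
    validate_interface_pattern(pattern)
    values = ipaddress.IPv4Address(address).packed  # the four octets as bytes
    return all(_octet_matches(p, v) for p, v in zip(pattern.split("."), values))


print(matches_interface("192.168.1.100-110", "192.168.1.105"))  # True
print(matches_interface("192.168.*.*", "10.3.10.4"))  # False
```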

Number of cores

Hazelcast recommends having at least 8 CPU cores. Having a low number of cores can cause instability in the cluster because threads might start blocking each other. The number of Hazelcast threads is printed in the log, as shown in the following sample:

2019-04-11 03:20:24,829 [  ip-10-123-2-41] (tor.impl.OperationExecutorImpl) INFO    -[10.123.2.41]:5701 [49d9b0e8c5fa8b21c4ef8d490df72708] [3.10] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)

The line above shows that too few threads were started. For additional guidance, see the section Basic Optimization Recommendations in the Hazelcast IMDG 3.11 Deployment and Operations Guide, which includes the following guidelines:

  • 8 cores per Hazelcast server instance
  • Minimum of 8 GB RAM per Hazelcast member (if not using the High-Density Memory Store)
  • Dedicated NIC per Hazelcast member
  • Linux (any distribution)
  • All member nodes should run within the same subnet
  • All member nodes should be attached to the same network switch
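To spot under-provisioned nodes, you can scan the logs for the Hazelcast startup message shown in the sample above. A minimal Python sketch, assuming the message text matches that sample (other Hazelcast versions may word it differently); the function name is hypothetical. It also checks the core count reported by os.cpu_count(), which counts the CPUs in the system and can exceed what a container is allowed to use.

```python
import os
import re
from typing import Optional

# Matches the Hazelcast startup message in the sample log line above.
THREADS_RE = re.compile(
    r"Starting (?P<partition>\d+) partition threads and (?P<generic>\d+) generic threads"
)


def hazelcast_thread_counts(log_line: str) -> Optional[tuple[int, int]]:
    """Return (partition_threads, generic_threads), or None if the line does not match."""
    match = THREADS_RE.search(log_line)
    if match is None:
        return None
    return int(match.group("partition")), int(match.group("generic"))


sample = (
    "2019-04-11 03:20:24,829 [  ip-10-123-2-41] (tor.impl.OperationExecutorImpl) INFO    "
    "-[10.123.2.41]:5701 [49d9b0e8c5fa8b21c4ef8d490df72708] [3.10] "
    "Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)"
)
print(hazelcast_thread_counts(sample))  # (2, 3)

# Hazelcast recommends at least 8 cores; os.cpu_count() can return None.
cores = os.cpu_count() or 0
if cores < 8:
    print(f"Only {cores} cores visible on this host; below the recommended 8")
```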

Graceful shutdown

Gracefully shut down Pega nodes to avoid losing Hazelcast data partitions. During a graceful shutdown, the data from the node that is shutting down is automatically migrated to the other nodes.

Do not use the kill -9 command! It terminates the process immediately, without giving it a chance to shut down cleanly, which has the following consequences:

  • No clean shutdown of socket connections
  • No cleanup of temporary files
  • No time to inform sub-processes that the node is going away
  • No time for the node to reset its terminal characteristics
  • Stops processes that are running on the node even if those processes are performing work; no clean exit occurs. Processing stops in mid-stream.
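For scripted restarts, the safer pattern is to send SIGTERM (what a plain `kill` sends) and wait, rather than SIGKILL (`kill -9`): a JVM runs its shutdown hooks on SIGTERM but gets no chance to on SIGKILL. The Python sketch below shows that pattern with the standard subprocess module on POSIX systems (on Windows, `terminate()` is not a graceful signal). The child process, the `stop_gracefully` name, and the timeout are placeholders; how long a Pega node needs to migrate its partitions and stop depends on your deployment.

```python
import subprocess
import sys
from typing import Optional


def stop_gracefully(proc: subprocess.Popen, timeout_s: float = 120.0) -> Optional[int]:
    """Send SIGTERM (never SIGKILL) and wait for the process to exit.

    Returns the exit code, or None if the process is still running after
    timeout_s, so that an operator can investigate instead of forcing a kill.
    """
    proc.terminate()  # SIGTERM on POSIX; a JVM runs its shutdown hooks on this signal
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None


# Placeholder child process standing in for a node's JVM.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(stop_gracefully(child, timeout_s=10))  # -15 on POSIX: ended by SIGTERM
```

Returning None on timeout, instead of escalating to SIGKILL, keeps the decision with a person, in line with the guidance above.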

Concepts and terminology

This article and its companion articles assume that you have read and understood the Pega Help topics under Managing your system, particularly the topics for Node configuration > Multi-node systems > Cluster deployment and High availability > Configuring nodes for high availability > Cluster management.

Split-Brain Syndrome and Cluster Fracturing

Several scenarios can leave nodes in the cluster unaware of one another, causing the cluster to split into several smaller clusters instead of one large cluster.

To understand Split-Brain Syndrome and how to prevent, detect, and recover from it, see the related article, Split-Brain Syndrome and cluster fracturing FAQs.

Master node

Unlike many other distributed technologies, Hazelcast does not have a centralized master node. It does, however, maintain an implicit master node that is responsible for keeping the other nodes up to date with the latest membership information. The first node to start, that is, the oldest node, is always considered the master node. When the master node leaves the cluster, the remaining nodes begin a mastership process to nominate a new master node. If a Split-Brain scenario develops, each fractured cluster has its own master node. When two fractured clusters merge, the master node of the smaller cluster yields to (merges into) the master node of the larger cluster.
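The following Python sketch is a toy model of the rules just described: the oldest member acts as master, and after a split the smaller cluster merges into the larger one. It is not Hazelcast’s algorithm. The assumption that the next-oldest member takes over when the master leaves, the address-based tie-breaker for equal-sized clusters, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    address: str
    join_order: int  # lower value = started earlier = older


def master_of(members: list[Member]) -> Member:
    """In this model the oldest member (lowest join_order) acts as master."""
    return min(members, key=lambda m: m.join_order)


def merge_target(a: list[Member], b: list[Member]) -> list[Member]:
    """Return the cluster that the other one merges into: the larger one wins.

    The tie-breaker (compare master addresses) is an assumption made only to
    keep this toy model deterministic.
    """
    if len(a) != len(b):
        return a if len(a) > len(b) else b
    return a if master_of(a).address < master_of(b).address else b


n1, n2, n3 = Member("10.0.0.1", 1), Member("10.0.0.2", 2), Member("10.0.0.3", 3)
print(master_of([n1, n2, n3]).address)  # 10.0.0.1

# The master leaves; in this model the next-oldest member takes over.
print(master_of([n2, n3]).address)  # 10.0.0.2

# A split brain leaves {n1} and {n2, n3}; the smaller side merges into the larger.
print(master_of(merge_target([n1], [n2, n3])).address)  # 10.0.0.2
```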

Hazelcast Interceptor

Because Hazelcast behaves in a fail-fast manner, external traffic from other sources can destabilize it. One example is a DDoS attack on an inbound Hazelcast port. Another is security tools that attempt to breach the port; the resulting flood of traffic degrades performance. In these cases, you can use the Hazelcast interceptor to denylist IP addresses that Hazelcast should ignore. The interceptor filters traffic before Hazelcast attempts to consume it, which preserves communication performance when third-party traffic reaches the inbound port. For more information, see Security settings.

Clock Drift

Hazelcast itself operates independently of clock time, but Pega application operations can be severely impacted if the clocks of the nodes are not aligned. When clocks begin to drift, an alert is generated in the logs and through PDC. Ensure that systems run clock synchronization software such as Network Time Protocol (NTP). Hazelcast does, however, pay attention to how long traffic takes to propagate between nodes: if it detects that traffic is taking abnormally long, it writes warning messages to the PegaCluster logs.

For example, Hazelcast reports inter-node traffic delays caused by system-wide pauses such as Java heap garbage collection (GC). With large Java heaps, garbage collection can pause your application for tens of seconds or even minutes, badly affecting application performance and response times.

Settings

Some settings in the Pega configuration file (prconfig.xml) relate directly to Hazelcast. The following sections describe the important ones.

Common cluster settings

The following settings are frequently used for cluster management with Hazelcast.

Common cluster settings for Pega Platform versions

| Introduced in release | Setting name | Prconfig value | Description | Default value or values | Example value or values |
|---|---|---|---|---|---|
| 7.3.0 | Cluster Name | identification/cluster/name | The name for the cluster of nodes. Nodes only join other nodes that share the same cluster name. | PRPC | PRPC |
| 7.3.0 | Cluster Protocol | identification/cluster/protocol | The operating protocol for the cluster | Hazelcast | Hazelcast |
| 7.3.0 | Cluster Members | initialization/cluster/members | A static list of cluster member IP addresses, separated by commas. Disables automatic discovery. | Not applicable (user defined) | <IP 1>, <IP 2>, … |
| 7.3.0 | Cluster Public Address | identification/cluster/public/address | The public IP address for the cluster. See Network Address Translation (NAT). | Not applicable (user defined) | <n.n.n.n> |
| 7.4.0 | Hazelcast Outbound Ports | cluster/hazelcast/outboundPortRange | The configured range of outbound ephemeral ports of Hazelcast | Values within the valid range from 5801 to 5900 | 5801, 5802, 5803-5810 |
| 7.3.0 | Hazelcast Interface | cluster/hazelcast/interface | A list of valid network interfaces for Hazelcast. See Network Interface Controller (NIC). | Not applicable | <n.n.n.n> |
| 7.3.0 | Cluster Ports | initialization/cluster/ports | A range of inbound ports (Hazelcast selects one for use) | Values within the valid range from 5701 to 5800 | 5701, 5702, 5703-5710 |

Clock Drift settings

The following Clock Drift settings were introduced in the Pega Platform release indicated.

Clock Drift settings

| Introduced in release | Setting name | Prconfig value | Description | Default value | Example value |
|---|---|---|---|---|---|
| 7.3.0 | Clock Drift Threshold | alerts/cluster/clockdeltathreshold | The maximum allowed difference between any two clocks in the cluster, in seconds | 10 seconds | 10 seconds |
| 7.3.0 | Clock Drift Sampling Rate | alerts/cluster/clocksamplerateminutes | The frequency at which the clocks in the cluster are sampled, in minutes | 10 minutes | 10 minutes |
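Because the threshold is the maximum difference between any two clocks, checking it only requires the spread between the fastest and the slowest clock, not a comparison of every pair. A minimal Python sketch of that check, assuming you already have one timestamp sample per node taken at roughly the same moment; the sample values and function name are hypothetical, and Pega’s own alerting logic may differ.

```python
from datetime import datetime, timedelta


def clock_drift_exceeded(samples: dict[str, datetime], threshold_s: float = 10.0) -> bool:
    """True if the largest pairwise difference between node clocks exceeds threshold_s.

    The largest difference between any two clocks equals max - min of the samples.
    """
    times = list(samples.values())
    spread = max(times) - min(times)
    return spread > timedelta(seconds=threshold_s)


base = datetime(2019, 4, 11, 3, 20, 24)
samples = {
    "node-a": base,
    "node-b": base + timedelta(seconds=4),
    "node-c": base - timedelta(seconds=7),  # node-b to node-c spread is 11 s
}
print(clock_drift_exceeded(samples))  # True: 11 s exceeds the 10 s default
```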


 

Hazelcast Internet Control Message Protocol (ICMP) settings

The Internet Control Message Protocol (ICMP) is a supporting protocol in the Internet protocol suite. Used by network devices, including routers, ICMP sends error messages and operational information indicating, for example, that a requested service is not available or that a host or router could not be reached. ICMP differs from transport protocols such as TCP and UDP in that it is not typically used to exchange data between systems, nor is it regularly employed by end-user network applications except for some diagnostic tools like ping and traceroute.

The Hazelcast Ping Failure Detector relies on ICMP. To prevent ping failures, consider adjusting the ICMP properties in your Hazelcast declarative configuration file.

To understand the scenarios in which you might need to adjust the ICMP settings, see Hazelcast Failure Detector Configuration.

The following table lists the Hazelcast ICMP settings:

Hazelcast ICMP settings

| Introduced in release | Setting name | Prconfig value | Description | Default value | Example value |
|---|---|---|---|---|---|
| 7.4.0 | Hazelcast ICMP Enabled | hazelcast/icmp/enabled | Enables the ICMP ping detector for Hazelcast. ICMP pings are used to determine which nodes are still alive. | false | true |
| 7.4.0 | Hazelcast ICMP Parallel Mode | hazelcast/icmp/parallel/mode | Sets the ICMP detector to parallel mode | false | true |
| 7.4.0 | Hazelcast ICMP Timeout | hazelcast/icmp/timeout | The amount of time to wait before declaring a ping failed, in milliseconds | 1000 ms | 1000 ms |
| 7.4.0 | Hazelcast ICMP Max Attempts | hazelcast/max/attempts | Maximum number of ping attempts before a member is suspected | 3 | 3 |
| 7.4.0 | Hazelcast ICMP Interval | hazelcast/icmp/interval | Time between pings, in milliseconds | 1000 ms | 1000 ms |
| 7.4.0 | Hazelcast ICMP TTL | hazelcast/icmp/ttl | Maximum number of hops for an ICMP packet sent by Hazelcast, or 0 for the default | 0 | 0 |
| 7.4.0 | Hazelcast ICMP Fail Fast | hazelcast/icmp/failfastonstartup | If set, Hazelcast fails to start when any ICMP requirement is not met | false | false |
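As a rough illustration of how the timeout, interval, and max-attempts settings interact, the Python sketch below estimates how long an unreachable member goes unnoticed before it is suspected. It assumes a simplified model (a ping every interval, suspicion once max-attempts consecutive pings have timed out); this is not a formula from the Hazelcast documentation, and the function name is hypothetical.

```python
def icmp_suspicion_delay_ms(interval_ms: int = 1000, timeout_ms: int = 1000, max_attempts: int = 3) -> int:
    """Estimate the time from the first failed ping until the member is suspected.

    Model: ping k (k = 0 .. max_attempts - 1) is sent at k * interval_ms and is
    declared failed timeout_ms later; suspicion follows the last failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return (max_attempts - 1) * interval_ms + timeout_ms


print(icmp_suspicion_delay_ms())  # 3000 with the default values in the table
print(icmp_suspicion_delay_ms(max_attempts=5))  # 5000
```

Under this model, raising the interval or max attempts makes the detector more tolerant of transient packet loss at the cost of slower detection of a member that is really gone.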

Encryption settings

Hazelcast can encrypt communication between cluster members to provide the required level of privacy. Encryption is based on the Java Cryptography Architecture (JCA).

The following Encryption settings were introduced in the Pega Platform release indicated.

Encryption settings for clustered environments

| Introduced in release | Setting name | Prconfig value | Description | Default value | Example value |
|---|---|---|---|---|---|
| 7.3.0 | Cluster Keystore File Path | cluster/encryption/keystore/path | Location of the keystore for cluster encryption on disk | Not applicable (user defined) | /home/cluster-keystore.jks |
| 7.3.0 | Cluster Keystore Password | cluster/encryption/keystore/password | An encrypted keystore password | Not applicable (user defined) | |
| 7.3.0 | Cluster Truststore File Path | cluster/encryption/truststore/path | Location of the truststore for cluster encryption on disk | Not applicable (user defined) | /home/cluster-truststore.jks |
| 7.3.0 | Cluster Truststore Password | cluster/encryption/truststore/password | An encrypted truststore password | Not applicable (user defined) | |
| 7.3.0 | Cluster Supported Key Manager Algorithm | cluster/encryption/keymanager/algorithm | Key manager algorithm | X509 | X509 |
| 7.3.0 | Cluster Supported Trust Manager | cluster/encryption/trustmanager/algorithm | Trust manager algorithm | X509 | X509 |
| 7.3.0 | Cluster Supported Encryption Protocol | cluster/encryption/protocol | Encryption protocol for the cluster. See SA-67911. | TLSv1.2 | TLSv1.2 |
| 7.3.0 | Cluster Encrypter Custom Class | cluster/encryption/customclass | Name of the custom class used to decrypt keystore and truststore passwords | com.example.mydecryptor | com.example.mydecryptor |
| 7.3.0 | Cluster Encryption Keystore | cluster/encryption/keystorename | Name of the keystore file if the file is in the database | cluster-keystore.jks | cluster-keystore.jks |
| 7.3.0 | Cluster Encryption Truststore | cluster/encryption/truststorename | Name of the truststore file if the truststore is in the database | cluster-truststore.jks | cluster-truststore.jks |
| 7.3.0 | Cluster Encryption Enabled | cluster/encryption/enabled | Enables encryption for the cluster | false | true |
| 7.3.0 | Cluster SSL Context Factory Class | cluster/encryption/ssl/factory/class | Class for creating an SSL context | com.hazelcast.examples.MySSLContextFactory | com.hazelcast.examples.MySSLContextFactory |
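If you maintain prconfig.xml with scripts, the encryption settings above can be emitted in the same `<env name="…" value="…" />` form used by the prconfig.xml examples in this article. A minimal Python sketch using the standard xml.etree.ElementTree module; the values are the example values from the table, the helper name is hypothetical, and the password entries are deliberately omitted because the table expects encrypted values that you must supply yourself.

```python
import xml.etree.ElementTree as ET


def env_entries(settings: dict[str, str]) -> list[str]:
    """Render each prconfig setting as an <env name="..." value="..." /> element."""
    return [
        ET.tostring(ET.Element("env", name=name, value=value), encoding="unicode")
        for name, value in settings.items()
    ]


encryption = {
    "cluster/encryption/enabled": "true",
    "cluster/encryption/protocol": "TLSv1.2",
    "cluster/encryption/keystore/path": "/home/cluster-keystore.jks",
    "cluster/encryption/truststore/path": "/home/cluster-truststore.jks",
}
for line in env_entries(encryption):
    print(line)
# First line printed: <env name="cluster/encryption/enabled" value="true" />
```

Generating the entries with an XML library, rather than by string formatting, ensures that any special characters in values are escaped correctly.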

Security settings

These settings configure the Hazelcast Interceptor; see Hazelcast Interceptor for background.

The following Security settings were introduced in the Pega Platform release indicated.

Security settings for clustered environments

| Introduced in release | Setting name | Prconfig value | Description | Default value | Example value |
|---|---|---|---|---|---|
| 7.4.1 (not in 8.1) | Cluster Socket Interceptor | cluster/network/interceptor/enabled | Allows the interception of network traffic for analysis, for example, to prevent foreign traffic from inundating communication operations | false | false |
| 7.4.1 (not in 8.1) | Cluster Intruder DenyList | cluster/network/interceptor/denylistaddresses | A list of addresses for the cluster to deny communications with | <IP 1>, <IP 2>, … | <IP 1>, <IP 2>, … |
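Conceptually, the interceptor checks each incoming connection’s address against the denylist before Hazelcast consumes any traffic from it. The Python sketch below models only that filtering step, parsing a comma-separated value in the format of cluster/network/interceptor/denylistaddresses. It is not Pega’s or Hazelcast’s implementation, the function names and addresses are hypothetical, and because this article does not say whether the real setting accepts CIDR ranges, the sketch accepts exact addresses only.

```python
import ipaddress


def parse_denylist(value: str) -> frozenset:
    """Parse a comma-separated list of IP addresses into a set of address objects."""
    return frozenset(
        ipaddress.ip_address(item.strip()) for item in value.split(",") if item.strip()
    )


def accept_connection(remote_ip: str, denylist: frozenset) -> bool:
    """Return False for connections that should be dropped before Hazelcast reads them."""
    return ipaddress.ip_address(remote_ip) not in denylist


denied = parse_denylist("203.0.113.7, 198.51.100.23")
print(accept_connection("10.3.10.4", denied))  # True: not on the denylist
print(accept_connection("203.0.113.7", denied))  # False: this address is denylisted
```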

 

 

 

 
