PEGA0125 alert: Service registry heartbeat failed
The service registry is a mechanism for discovery and coordination of distributed components in a Pega cluster. Every 30 seconds, each Pega node sends a heartbeat to the service registry to indicate that the node is still running and available. The heartbeat updates the last seen time parameter of the node in the service registry database table. If the heartbeat fails or takes a long time to complete, the node might drop out of the cluster.
For more information about the service registry, see About service registry.
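The heartbeat and last seen time relationship described above can be pictured with a small in-memory model. This is only an illustrative sketch, not Pega code: the `RegistrySketch` class, the node names, and the `stale_after_s` threshold are all hypothetical, and the real service registry stores this data in a database table with its own rules for when a node is considered gone.

```python
import time

HEARTBEAT_INTERVAL_S = 30  # from the text: each node sends a heartbeat every 30 seconds


class RegistrySketch:
    """Toy in-memory stand-in for the service registry table (hypothetical)."""

    def __init__(self, stale_after_s):
        # stale_after_s is an assumed threshold; the real value is not stated here.
        self.stale_after_s = stale_after_s
        self.last_seen = {}  # node id -> time of the last successful heartbeat

    def heartbeat(self, node_id, now=None):
        """Record that node_id is alive by updating its last seen time."""
        self.last_seen[node_id] = time.time() if now is None else now

    def live_nodes(self, now=None):
        """Return the nodes whose last seen time is recent enough."""
        now = time.time() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.stale_after_s
        )


registry = RegistrySketch(stale_after_s=3 * HEARTBEAT_INTERVAL_S)
registry.heartbeat("node-a", now=0)
registry.heartbeat("node-b", now=0)
registry.heartbeat("node-a", now=60)  # node-b misses its later heartbeats
print(registry.live_nodes(now=100))   # prints ['node-a']
```

In this model, a node that stops updating its last seen time eventually falls outside the allowed window and no longer counts as a live cluster member, which mirrors why a failed or delayed heartbeat can cause a node to drop out.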
Reasons for the alert
Two types of issues might cause this alert:
- The service registry fails to update the heartbeat because the database is unavailable or unreachable. The alert contains the exception, which can help you investigate the root cause.
- It takes more than 30 seconds to update the heartbeat because the database is slow to respond or another internal process blocks the heartbeat. The service registry generates a thread dump in the PegaRULES.log file that you can use to identify the root cause.
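The two causes above can be told apart by how a single heartbeat update ends: with an exception, or with a duration over the 30-second limit. The following is a minimal sketch of that distinction, not the platform's implementation. The `classify_heartbeat` function and its `update` callback are hypothetical names, and the sketch measures the duration only after the update returns, whereas the text indicates that the service registry creates its thread dump while the slow heartbeat is still in progress.

```python
import time

SLOW_THRESHOLD_S = 30  # from the text: alert when the update takes more than 30 seconds


def classify_heartbeat(update, clock=time.monotonic, threshold_s=SLOW_THRESHOLD_S):
    """Run one heartbeat update and report 'ok', 'failed', or 'slow'.

    update is a hypothetical callable that stands in for the database
    write of the last seen time.
    """
    start = clock()
    try:
        update()
    except Exception as exc:
        # First cause: the update itself fails, for example because the
        # database is unavailable. The exception is the main clue.
        return "failed", repr(exc)
    elapsed = clock() - start
    if elapsed > threshold_s:
        # Second cause: the update succeeds but takes too long.
        return "slow", f"{elapsed:.1f}s"
    return "ok", f"{elapsed:.1f}s"


def unreachable_database():
    raise ConnectionError("database unreachable")  # simulated failure


print(classify_heartbeat(unreachable_database)[0])  # prints failed
```

Passing the clock as a parameter keeps the sketch testable without waiting 30 real seconds.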
Logs to recognize the alert
In the PegaRULES.log file, the logs relevant to the PEGA0125 alert contain the following entries:
- When the service registry heartbeat fails due to an exception:
Could not update last seen time for sessions
- When the service registry heartbeat takes too long to complete:
Current heartbeat is taking longer than ms. Creating a thread dump...
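To find these entries without reading the whole log by hand, a short script can search for the two messages quoted above. This is a sketch under assumptions: the `scan_lines` and `scan_file` helpers are hypothetical, the log is assumed to be readable as text, and the pattern for the slow heartbeat deliberately matches any text between the two quoted fragments because the exact threshold value is not shown here.

```python
import re

# Message fragments taken from the log entries quoted above.
PATTERNS = {
    "exception": re.compile(r"Could not update last seen time for sessions"),
    "slow": re.compile(
        r"Current heartbeat is taking longer than.*Creating a thread dump"
    ),
}


def scan_lines(lines):
    """Return (line_number, kind) for each line that matches a known message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, kind))
                break
    return hits


def scan_file(path):
    """Scan a log file, for example a local copy of PegaRULES.log."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_lines(handle)


sample = [
    "an unrelated log line",
    "Could not update last seen time for sessions",
    "Current heartbeat is taking longer than ms. Creating a thread dump...",
]
print(scan_lines(sample))  # prints [(2, 'exception'), (3, 'slow')]
```

The line numbers in the result point to where to start reading: the exception entry leads to the stack trace, and the slow entry leads to the thread dump that follows it.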
Recurring PEGA0125 alerts indicate a severe issue in your environment that might impair the health of your Pega cluster. Identify and address the root cause behind the heartbeat failure:
- Read the alert message and note the exceptions that caused the heartbeat failure.
- Review the service registry heartbeat threads and any other related threads in the PegaRULES.log file. For more information, see Log files tool.
- Resolve the underlying issue.