Support Article
Multiple nodes shown as unavailable in AES Health Dashboard
SA-6665
Summary
A user reported that multiple nodes were shown as unavailable in the Health Dashboard after the AES 7.1 upgrade, even after the servers were rebooted.
Error Messages
Not Applicable
Steps to Reproduce
1. Monitor the Health Dashboard in AES.
2. Observe that a few nodes are shown as Unavailable in the Health Dashboard.
Root Cause
During investigations, the following observations were made:
- All systems were configured to send health status messages through the “Alert-AES-SOAP” appender. The “Alert-AES-SOAP” appender is also used to send performance alerts to AES. As configured, ‘alert’ messages were contending with health status messages.
- On the production nodes that were having issues sending health status messages, the management daemon was blocked trying to access the Alert SOAP appender, and the SOAP appender itself was stuck waiting on an HTTP response header from an alert message. By default, the SOAP appender code waits indefinitely for an HTTP/SOAP response after sending a message to AES.
- All systems (AES and monitored nodes) were running with the default option to do a reverse DNS lookup on every incoming SOAP message and new browser session. Reverse DNS lookup is used to associate a browser with a workstation host name, and is implemented by making a network call to find the host name associated with the HTTP request source IP address.
While reverse DNS lookup is a Pega 7 Platform default, experience has shown that it is not best practice to use it in a production environment for the following reasons:
- Reverse DNS is of no value when PRPC is behind a load balancer because the load balancer itself is the ‘source IP address’ of the requests that hit PRPC. The reverse DNS lookup merely tags every browser session with the load balancer host name.
- Reverse DNS lookups have been known to behave unpredictably and can cause unexpected latency and delay in request processing.
- In the logging configuration file prlogging.xml, there were entries for a Health Status SOAP appender, separate from the Alert SOAP appender, but the Health Status SOAP appender was not in use.
- There was an error in how appenders were ‘chained’ in prlogging.xml. HEALTH-AES-SOAP was ‘chained’ from HEALTH-ASYNC. The HEALTH-ASYNC appender was defined with a denyAll filter, which blocks every message passing through that appender.
- The management daemon was explicitly configured to continue using the ALERT-AES-SOAP appender for health status messages. The dynamic system setting Pega-Engine prconfig/management/appender/default was set to ALERT-AES-SOAP.
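The effect of a shared appender that waits indefinitely for a response can be illustrated with a small model. The following Python sketch is not Pega code: the class SoapAppenderModel, its response_timeout parameter, and the in-memory queue that stands in for the HTTP/SOAP response channel are all assumptions made for illustration. It only shows why a bounded wait lets a later health status message go out after an alert has gone unanswered.

```python
import queue
import threading


class SoapAppenderModel:
    """Toy stand-in for a shared SOAP appender (hypothetical, not Pega code)."""

    def __init__(self, response_timeout):
        # Seconds to wait for a reply; None would mean "wait forever".
        self.response_timeout = response_timeout
        # Only one message can be in flight at a time, as with a shared appender.
        self._lock = threading.Lock()
        # Replies from the simulated AES server arrive on this queue.
        self.responses = queue.Queue()

    def send(self, message):
        with self._lock:
            try:
                self.responses.get(timeout=self.response_timeout)
                return f"{message}: acknowledged"
            except queue.Empty:
                # The wait was bounded, so the lock is released for the next sender.
                return f"{message}: timed out"


def demo():
    appender = SoapAppenderModel(response_timeout=0.2)
    results = []
    # The simulated server never answers the alert, so this send times out.
    results.append(appender.send("alert"))
    # The server answers the next message, which is now free to be sent.
    appender.responses.put("HTTP 200")
    results.append(appender.send("health status"))
    return results


if __name__ == "__main__":
    print(demo())
```

With response_timeout set to None, the first send call in this model would never return, and every later sender would remain blocked behind it.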
Resolution
Four changes were applied to address these issues:
- A new dynamic system setting was added to turn off reverse DNS lookup.
Dynamic System Setting Pega-Engine prconfig/http/reversednslookup/default was added with a value of ‘false’.
- In the logging configuration file ‘prlogging.xml’, add a ‘timeout’ parameter to all appenders implemented by SOAPAppenderPega. The TimeOut parameter prevents the SOAPAppender from indefinitely hanging in the event of a communications or protocol error. The timeout parameter takes a value of 1/10000 seconds.
- In logging configuration file prlogging.xml, create a new Asynchronous appender AES-HEALTH-MSG to act as an intermediary between the top-level HEALTH-ASYNC appender (which is intentionally crippled via a denyAll filter) and the actual HEALTH-AES-SOAP appender. From AES-HEALTH-MSG, chain to HEALTH-AES-SOAP.
- Update dynamic system setting Pega-Engine prconfig/management/appender/default to reference AES-HEALTH-MSG instead of ALERT-AES-SOAP.
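The appender chaining problem and its fix can be sketched with a simplified model. The Python below is an illustration only and does not reproduce prlogging.xml syntax or any Pega or log4j API: the Appender class and its deny_all and next_appender arguments are hypothetical names. It shows that a message entering the chain at HEALTH-ASYNC is dropped by the denyAll filter, while a message entering at the new AES-HEALTH-MSG appender reaches HEALTH-AES-SOAP.

```python
class Appender:
    """Toy appender that filters, forwards, or delivers (hypothetical model)."""

    def __init__(self, name, deny_all=False, next_appender=None):
        self.name = name
        self.deny_all = deny_all            # models a denyAll filter
        self.next_appender = next_appender  # models 'chaining' to another appender
        self.delivered = []                 # messages that reached the end of the chain

    def append(self, message):
        if self.deny_all:
            return False  # the filter drops the message; nothing is forwarded
        if self.next_appender is not None:
            return self.next_appender.append(message)
        self.delivered.append(message)  # end of chain: the message is "sent" to AES
        return True


def demo():
    soap = Appender("HEALTH-AES-SOAP")
    # Original chaining: HEALTH-ASYNC (denyAll) -> HEALTH-AES-SOAP.
    health_async = Appender("HEALTH-ASYNC", deny_all=True, next_appender=soap)
    # Corrected chaining: AES-HEALTH-MSG -> HEALTH-AES-SOAP.
    aes_health_msg = Appender("AES-HEALTH-MSG", next_appender=soap)

    blocked = health_async.append("health status 1")
    passed = aes_health_msg.append("health status 2")
    return blocked, passed, soap.delivered


if __name__ == "__main__":
    print(demo())
```

In this model, only the message routed through AES-HEALTH-MSG appears in the delivered list of HEALTH-AES-SOAP, which mirrors why the management daemon setting was pointed at AES-HEALTH-MSG.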
This Support Article pertains to a case reported for Autonomic Event Services (AES) EE 7.1.
The configuration settings referred to above were deprecated with Pega 7.1.7.
Starting with Pega 7.1.7, AES and PDC system settings are provided in the Pega 7 Platform.
You can update the default AES and PDC settings from the Designer Studio landing page:
Click System > Settings > Predictive Diagnostic Cloud.
Related Content
System - Predictive Diagnostic Cloud
Troubleshooting AES 7.1.7 and AES 7.2 connectivity, performance, and reporting problems
TechTalk Episode 22: Configuration Issues and Symptoms for Autonomic Event Services
Published March 24, 2017 - Updated December 2, 2021