AES nodes go from Online to Unknown after a period of time
On the Autonomic Event Services (AES) 7.2 Enterprise Health screen, the Run state of some Pega nodes changes from 'Running' to 'Unknown' on its own, even though those nodes are up and running without any issue.
Steps to Reproduce
Root Cause
A defect or configuration issue in the operating environment. The issue appears to occur because the ManagementDaemon, which drives the sending of AES Health messages, has stopped running on the monitored node.
Without regular Health messages, AES changes the node state to Unknown. The ManagementDaemon is an asynchronous process that spins off batch requestors and uses a JMS Topic for execution.
On the WebLogic application server where Pega is deployed, JMS messages were expiring because the JMS Listener / MDB was slow to consume them.
The “Default Time-To-Live” for the PRAsync TopicConnectionFactory was set to 10 milliseconds, so any message not consumed within 10 milliseconds of being sent expired before it could be processed.
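The effect of such a short time-to-live can be illustrated with a small simulation. This is only a sketch of general JMS expiry semantics (a message expires at its send time plus the time-to-live, and a time-to-live of 0 means it never expires); it is not WebLogic or Pega code, and the function names and listener delays are hypothetical values chosen for illustration.

```python
# Simplified model of JMS time-to-live (TTL) semantics; not WebLogic code.

def expiration_time_ms(sent_at_ms, ttl_ms):
    """Return the absolute expiry time, or None when TTL is 0 (never expires)."""
    if ttl_ms == 0:
        return None
    return sent_at_ms + ttl_ms


def is_expired(sent_at_ms, ttl_ms, consumed_at_ms):
    """True when the listener reaches the message at or after its expiry time."""
    expires_at = expiration_time_ms(sent_at_ms, ttl_ms)
    return expires_at is not None and consumed_at_ms >= expires_at


def delivered_count(consumer_delays_ms, ttl_ms):
    """Count messages (all sent at t=0) still alive when the listener picks them up."""
    return sum(1 for delay in consumer_delays_ms
               if not is_expired(0, ttl_ms, delay))


# Hypothetical listener delays in milliseconds, for illustration only.
delays = [2, 15, 40, 250, 1200]

print(delivered_count(delays, 10))     # 1: only the 2 ms pickup beats a 10 ms TTL
print(delivered_count(delays, 30000))  # 5: every message survives a 30 s TTL
```

With a 10 millisecond time-to-live, even a modest delay in the listener is enough to lose most messages, which is consistent with the Health messages never reaching AES.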
Resolution
Make the following change to the operating environment:
Increase the “Default Time-To-Live” value for the TopicConnectionFactory to 30 seconds (30000 milliseconds).
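If the change is scripted rather than made through the WebLogic Administration Console, a WLST (Jython) sketch along the following lines may serve as a starting point. It assumes the connection factory lives in a JMS system module. The module name, factory name, admin URL and credentials are all placeholders that must be replaced with the values from the actual environment, and the helper function names are hypothetical.

```python
# WLST (Jython) sketch. connect, edit, startEdit, cd, save, activate,
# cancelEdit, disconnect and cmo are supplied by the WLST runtime,
# not by plain Python, so raise_default_ttl only works inside WLST.

def delivery_params_path(module_name, factory_name):
    """Build the edit-tree path of a connection factory's default delivery params."""
    return ("/JMSSystemResources/%s/JMSResource/%s"
            "/ConnectionFactories/%s/DefaultDeliveryParams/%s"
            % (module_name, module_name, factory_name, factory_name))


def raise_default_ttl(admin_url, user, password, module_name, factory_name,
                      ttl_ms=30000):
    """Set Default Time-To-Live (in milliseconds) on one JMS connection factory."""
    connect(user, password, admin_url)
    edit()
    startEdit()
    try:
        cd(delivery_params_path(module_name, factory_name))
        cmo.setDefaultTimeToLive(ttl_ms)  # 30000 ms = 30 seconds
        save()
        activate()
    except:
        cancelEdit('y')  # discard the pending edit so the lock is released
        raise
    disconnect()
```

Inside a WLST session this might be invoked as `raise_default_ttl('t3://adminhost:7001', 'weblogic', '<password>', 'MyJmsModule', 'MyTopicConnectionFactory')`, where every argument is a placeholder.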