Support Article
AES nodes go from Online to Unknown after a period of time
SA-44315
Summary
On the Autonomic Event Services (AES) 7.2 Enterprise Health screen, the Run state of some Pega nodes changes from 'Running' to 'Unknown' automatically, even though the Pega nodes are up and running without any issue.
Error Messages
Not Applicable
Steps to Reproduce
Not Applicable
Root Cause
A defect or configuration issue in the operating environment. The ManagementDaemon, which drives the sending of AES Health messages, appears to have stopped running on the monitored node.
Without regular Health messages, AES changes the node state to Unknown. The ManagementDaemon is an asynchronous process that spins off batch requestors and uses a JMS Topic for execution.
On the WebLogic application server where Pega is deployed, the JMS messages were expiring because the JMS Listener / MDB was delayed in consuming them.
The “Default Time-To-Live” for the PRAsync TopicConnectionFactory was set to 10 milliseconds, so any message not consumed within 10 milliseconds of being sent expired before it could be processed.
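The effect of the time-to-live setting can be illustrated with a small, self-contained sketch. It does not use the WebLogic or Pega APIs; it only reproduces the standard JMS rule that a message's expiration time is its send time plus its time-to-live, with a time-to-live of 0 meaning the message never expires. The class name `TtlSketch`, its method names, and the 250 millisecond listener delay are hypothetical values chosen for illustration, not figures from the affected system.

```java
public class TtlSketch {

    // JMS rule: expiration = send time + time-to-live.
    // A time-to-live of 0 means the message never expires (expiration 0).
    static long expirationTime(long sentAtMillis, long ttlMillis) {
        return ttlMillis == 0 ? 0 : sentAtMillis + ttlMillis;
    }

    // A message counts as expired if the consumer reaches it
    // at or after its expiration time.
    static boolean isExpired(long sentAtMillis, long ttlMillis, long consumedAtMillis) {
        long expiration = expirationTime(sentAtMillis, ttlMillis);
        return expiration != 0 && consumedAtMillis >= expiration;
    }

    public static void main(String[] args) {
        long sentAt = 0L;
        long listenerDelayMillis = 250L; // hypothetical consumer delay, for illustration only

        // With a 10 ms time-to-live, the delayed listener finds the message already expired.
        System.out.println("TTL 10 ms, expired: "
                + isExpired(sentAt, 10L, sentAt + listenerDelayMillis));

        // With a 30000 ms time-to-live, the same delay is tolerated.
        System.out.println("TTL 30000 ms, expired: "
                + isExpired(sentAt, 30000L, sentAt + listenerDelayMillis));
    }
}
```

Under these assumed numbers, any listener delay longer than 10 milliseconds causes the message to be discarded, which is consistent with the Health messages never reaching AES.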
Resolution
Make the following change to the operating environment:
Increase the “Default Time-To-Live” value for the PRAsync TopicConnectionFactory to 30 seconds (30000 milliseconds).
Published November 4, 2017 - Updated October 8, 2020