Troubleshooting AES 7.1.7 and AES 7.2 connectivity, performance, and reporting problems
Learn how to troubleshoot connectivity, performance, and reporting problems in Autonomic Event Services (AES) 7.1.7 and AES 7.2.
Use the checklist provided in this article to prevent configuration problems.
If you have an AES configuration problem that requires you to submit a Support Request (SR), use the must-gather list provided in this article to collect the artifacts that Global Customer Support (GCS) needs to resolve your SR.
When the AES Enterprise Health Monitor does not display the nodes or clusters being monitored, any one or more of the following conditions might cause this symptom:
- Operator access fails: The operator does not have access to the AES nodes or clusters being monitored.
- No health messages are being sent to the AES server.
- The endpoint specified in the monitored nodes for the Predictive Diagnostic Cloud (PDC) URL is incorrect.
- SOAP messages sent from the monitored nodes do not reach the AES server.
Frequently, the Enterprise Health Monitor in the AES Manager portal does not display any clusters or nodes. This can occur when the operator ID in use has not been given access to the cluster or node.
To grant the operator access to the cluster or node, follow these steps:
- From the AES Tools menu, click Manage Operator Systems.
- In the Manage Operators screen, Select Operator, select the name of the operator from the list:
Manager Operators to Select Operators
- In the Manage Operators screen, Select Systems, select the name of the system for the operator you specified in Step 2.
Manager Operators to Select System for the specified operator
- To verify the results of the previous steps, go to the AES Enterprise Administration Tasks, the Management view, and click Display Operators By System.
AES Enterprise Administration Tasks, Display Operators By System
- Click the name of the system that you specified in Step 3 to verify that the operator whom you specified in Step 2 has access to the system of the monitored nodes or clusters.
List of operators with access to the selected system
When no health messages are being sent to the AES server, either one or both of the following conditions might be the cause of this problem:
- Legacy PRPC 6.x information for sending health messages to the AES server in the prlogging.xml and prconfig.xml files causes inconsistency.
- The PegaAESRemote agent is not running on the monitored node.
Make sure that the prconfig.xml and prlogging.xml files specify information that is consistent with the monitored node. PRPC 6.x releases have no dynamic appenders, although AES 7.x can monitor PRPC 6.x nodes. Be sure to remove the legacy PRPC 6.x information, because no guardrail report flags this issue.
When the prconfig.xml file specifies legacy information used by earlier AES versions, remove that legacy information from both the prconfig.xml file and the prlogging.xml file.
This example image illustrates legacy information from earlier AES versions that is still specified in the prconfig.xml and prlogging.xml files that you need to remove.
Example prconfig.xml file with legacy AES information
When the PegaAESRemote agent is not running on the monitored node, connect to the System Management Console (SMC) for the monitored node. Make sure that all of the PegaAESRemote agents are running as shown in the following example.
System Management Console showing PegaAESRemote agents running
Problem: The endpoint specified on the monitored nodes for the Predictive Diagnostic Cloud (PDC) URL is incorrect
If you suspect that the PDC URL endpoint for the monitored nodes is incorrect, check the PDC system settings from the Designer Studio landing page.
The current way to connect to the AES server is to use the dynamically built-in appenders that are generated in all Pega 7 Platform systems. The PDC URL is sufficient to make the proper connections for Health, Exception, and Alert data.
- From the Designer Studio landing page, click .
- On the System: Predictive Diagnostic Cloud Configuration screen, type the correct URL in the Endpoint SOAP URL field.
System Settings for Predictive Diagnostic Cloud Endpoint SOAP URL
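Before you save the endpoint, a quick sanity check can catch obvious typing errors in the URL. The following Python sketch is an illustration only: the function name endpoint_problems and the sample URL are hypothetical, and the check covers URL syntax, not whether the AES server actually answers at that address.

```python
from urllib.parse import urlsplit


def endpoint_problems(url: str) -> list[str]:
    """Return a list of obvious problems with an endpoint URL (empty if none found)."""
    problems = []
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host name")
    try:
        parts.port  # raises ValueError when the port is not a valid number
    except ValueError:
        problems.append("invalid port")
    return problems


if __name__ == "__main__":
    # Hypothetical value; substitute the URL from your own configuration screen.
    print(endpoint_problems("https://aes.example.com:8443/example/path"))
```

An empty list means only that the URL is well formed. You still need to confirm that it points to your AES server.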
When SOAP messages sent from monitored nodes do not reach the AES server, some infrastructure problem in your deployment is probably the cause.
See also https://pdn.pega.com/documents/aes-node-configuration-guide-72, the section Configuring the AES Nodes (Versions 7.1.7+).
Make sure that the monitored node is providing the correct information to the AES server’s application server so that authentication of the SOAP service can take place. This is done by using the AES Enterprise Administration Tasks, Manage SOAP authentication. If you see HTTP 401 errors in the log, you might need to use the AESRemoteUser Authentication Profile depending on your AES version and patch level.
See SA-13972, https://pdn.pega.com/support-articles/aes-manager-portal-issues-soap-authentication
See SA-12131, https://pdn.pega.com/support-articles/aes-throwing-401-errors-while-communicating-monitored-nodes-0
If you are using SSL, make sure that the appropriate security certificates are installed.
See SA-25004, https://pdn.pega.com/support-articles/aes-not-able-monitor-nodes-running-jboss
Frequently no data is returned for agents, requestors, and other node elements when there is a problem communicating to the monitored node. This can happen when one or more of the following conditions exist. This is not an exhaustive list:
- The node URL was not automatically discovered on startup.
- The application server requires Secure Sockets Layer (SSL) communication protocol.
- The application server requires authentication for incoming traffic.
When a new connection string cannot be determined by the system for an enabled node, you need to specify the URL.
The following image shows an example in which the Node Information New Connection String value is [unable to determine].
Edit Nodes information, New Connection String unable to determine
Edit the Node Information field New Connection String to specify the correct URL as shown in the following image.
Edit Nodes information, New Connection String specified with your node URL
When the application server requires SSL, the application server log should indicate that there is a handshake error.
If you are using SSL, make sure that the appropriate security certificates are installed.
See SA-25004, https://pdn.pega.com/support-articles/aes-not-able-monitor-nodes-running-jboss
If it is not clear why the handshake error is occurring, you can use the following JVM argument: -Djavax.net.debug=ssl:handshake
This can also be specified in several different ways within your application server. Check with your infrastructure team regarding the certificates and configuring the diagnostic.
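As a complementary check from outside the application server, you can attempt a TLS handshake against the node and read the resulting error. The following Python sketch is an assumption-laden illustration: the function name check_tls is hypothetical, and the script validates certificates against the trust store of the machine where it runs, not against the JVM truststore that your application server uses. A success here therefore does not prove that the application server trusts the certificate.

```python
import socket
import ssl


def check_tls(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TLS handshake and summarize the outcome in one line."""
    context = ssl.create_default_context()  # verifies the certificate chain and host name
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return f"handshake ok: {tls.version()}"
    except ssl.SSLCertVerificationError as err:
        # The server presented a certificate that this machine does not trust.
        return f"certificate not trusted: {err.verify_message}"
    except ssl.SSLError as err:
        # Any other TLS-level failure, such as a protocol mismatch.
        return f"handshake failed: {err.reason}"
    except OSError as err:
        # Refused connection, timeout, DNS failure, and similar network errors.
        return f"connection failed: {err}"


if __name__ == "__main__":
    # Hypothetical host and port; substitute the values for your monitored node.
    print(check_tls("node1.example.com", 8443))
```

The order of the except clauses matters because SSLCertVerificationError is a subclass of SSLError, which in turn is a subclass of OSError.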
Make sure that the monitored node is providing the correct information to the AES server’s application server so that authentication of the SOAP service can take place. Do this by using the AES Enterprise Administration Tasks, Manage SOAP authentication. If you see HTTP 401 errors in the log, you might need to use the AESRemoteUser Authentication Profile depending on your AES version and patch level.
Refer to the following Help topic and Support Articles:
Data that is not pushed correctly from monitored nodes can be related to the Symptom 1 problem "No health messages are being sent to the AES server." See the solutions for that problem.
This symptom can also be related to Symptom 6: The protocol changed to HTTPS and now some data is missing. See the solution for that symptom.
If these problems are not the root cause, the DSS for the monitored node might not be set to 'push'.
In addition to trying the solutions for Symptom 1 and Symptom 6, make sure that the proper DSS values are set on the monitored node to make use of the ‘push’.
Owning ruleset PegaAESRemote with setting purpose aessetting/perfstatmode has value PUSH
The most frequent cause of AES nightly tasks not removing expired data is that the AES agents are not running or have not run in the past.
Review the rest of this article to verify that you have set up AES Agent Management correctly.
If you are monitoring a large number of nodes, the system might hold too many alerts, exceptions, and related work items to generate reports and email subscriptions in a timely manner, even when the AES agents are running successfully. Whether this happens depends on your operating environment.
A lengthy retention period for alerts, exceptions, and related work can be the root cause of excessive exceptions and alerts in database tables.
To resolve excessive exceptions and alerts in database tables, reduce the retention period in the AES Settings:
- From AES Enterprise Administration Tasks, Management screen, click System Settings.
AES Enterprise Administration Tasks, Management, System Settings
- In System Settings, for each Data Type listed, reduce the number of Days specified for the retention period.
AES System Settings, Data Types and number of Days specified for retention
- Modify the following pseudo SQL to see if data is being correctly trimmed in the alert and exception table. Modify the pseudo SQL as required by your database management system.
SELECT COUNT(*) FROM <data-schema>.pegaam_alerts WHERE pxcreatedatetime < 'fourteen-days-ago';
SELECT COUNT(*) FROM <data-schema>.pegaam_exceptions WHERE pxcreatedatetime < 'fourteen-days-ago';
- If these statements return a value much greater than zero (0), you need to delete the data manually.
- If you wish to preserve the data, then you should check the command timeout settings in your application server for the PegaRULES data source. Consider increasing or shutting off that timeout.
- Another local change would be to partition the data by date and use database tools to remove that data from the exception or alert tables.
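The count check in the steps above can be sketched in Python to show how the 'fourteen-days-ago' placeholder becomes a concrete cutoff value. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for your real database, the table is reduced to a single pxcreatedatetime column stored as ISO 8601 text, and the function names count_older_than and demo are hypothetical. Adapt the connection, schema prefix, and date handling to your database management system.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def count_older_than(conn, table: str, cutoff: datetime) -> int:
    """Count rows in `table` whose pxcreatedatetime is earlier than `cutoff`."""
    # A table name cannot be bound as a parameter, so restrict it to known names.
    if table not in ("pegaam_alerts", "pegaam_exceptions"):
        raise ValueError(f"unexpected table: {table}")
    sql = f"SELECT COUNT(*) FROM {table} WHERE pxcreatedatetime < ?"
    return conn.execute(sql, (cutoff.isoformat(),)).fetchone()[0]


def demo() -> int:
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)  # fixed "now" so the demo is repeatable
    cutoff = now - timedelta(days=14)                  # the "fourteen-days-ago" value
    conn = sqlite3.connect(":memory:")                 # stand-in for the real database
    conn.execute("CREATE TABLE pegaam_alerts (pxcreatedatetime TEXT)")
    # Sample rows created 1, 10, 20, and 40 days before "now".
    rows = [((now - timedelta(days=d)).isoformat(),) for d in (1, 10, 20, 40)]
    conn.executemany("INSERT INTO pegaam_alerts VALUES (?)", rows)
    return count_older_than(conn, "pegaam_alerts", cutoff)


if __name__ == "__main__":
    print(demo())
```

In the demo, the rows created 20 and 40 days before the fixed date fall before the cutoff. A count well above zero on a real system, after the nightly tasks have run, suggests that trimming is not keeping up.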
When the communications protocol changes to Secure Sockets Layer (SSL), the communicating systems must have the appropriate certificates installed.
Because SSL certificate management is outside of the Pega Platform, work with your infrastructure team to make sure that the certificates are installed correctly. The use of these certificates is handled by the application server.
Refer to this PDN Article, https://pdn.pega.com/how-configure-application-server-support-ssltls-prpc.
Also try this diagnostic, applied to the JVM arguments: -Djavax.net.debug=ssl:handshake
If the URL that you provided to the AES server is a load balancer or web plugin URL and many monitored nodes sit behind that address, the AES server does not have enough information to query a specific node. Requests return details from whichever server the load balancer algorithm selects. In this case, node-specific information can only be pushed from the monitored node; the AES server cannot directly query the monitored node about its requestors or agents.
These capabilities are not supported in the AES manager portal.
Prior to AES 7.1.7, you were most likely to run agents on the AES server using a dispersion approach for clustered deployments because you segmented the AES agents among the nodes of the cluster. With AES 7.1.7 and later releases, this approach is no longer needed and should not be used.
In AES 7.1.7 and later releases, the agents that run on the AES server are designed to make sure that the code runs only on one node at a time. This is controlled by the Dynamic System Setting (DSS) AES/SECURITY/AGENTS/NODEID.
- AES Agents for a clustered environment run on a single node, AESAgentsNode.
- When the node on which AES agents are running stops, some other active node is detected and used for running the AES agents.
Here are the highlights of this new feature:
- Supports running the agents on all nodes for AES 7.1.7 and AES 7.2
- Performs the agent work on one node only
- Has some fail-over capability
Do not use the old method of segmenting and dispersing the AES agents!
The following section provides a preview demonstration of how this new DSS AES/SECURITY/AGENTS/NODEID works. This information might be useful if your AES server does not seem to be performing its data housekeeping tasks effectively.
AES 7.1.7 and later releases provide the DSS AES/SECURITY/AGENTS/NODEID and the data page D_AESAgentsNode, which expires every 30 minutes. All agent activities check to see that they are on the right node. The load activity checks System-Status-Nodes to be sure that the owning node has "checked in" and is running system pulse. Then it takes over as the AES agent node if the owning node is no longer running.
The following image shows the DSS AES/SECURITY/AGENTS/NODEID that defines which node is to run the AES server agents.
Edit Dynamic System Settings for PegaAES Security Agents Node ID
The Value of the current AES Agents Node ID is a0111ea85a9f87288a89390598507e1a.
On this agents node, a0111ea85a9f87288a89390598507e1a, the data page D_AESAgentsNode.isAESAgentsNode is set to true.
Data Page D_AESAgentsNode.isAESAgentsNode set to true
The node level Data Page D_AESAgentsNode has a refresh strategy that inspects the DSS AES/SECURITY/AGENTS/NODEID and the pr_sys_statusnodes table to determine if the designated node is indeed still running.
If the designated agents node (a0111ea85a9f87288a89390598507e1a) is no longer running, the system finds an active node to replace it in the DSS.
Here, on another agents node, 5748a744c98ce4a5d02b843803adb6f6, D_AESAgentsNode.isAESAgentsNode is set to false.
Data Page D_AESAgentsNode.isAESAgentsNode set to false
Each of the agents’ activities checks D_AESAgentsNode.isAESAgentsNode as shown in the following image:
D_AESAgentsNode list of agents’ Activities
Clicking an Activity in the D_AESAgentsNode list opens the Steps of the Activity as shown in the following image:
Example Activity form opened from the D_AESAgentsNode list
Result: All nodes run the agents, but only the AESAgentsNode actually does the work.
This is governed by the DSS, AES/SECURITY/AGENTS/NODEID.
Edit Dynamic System Settings for PegaAES Security Agents Node ID
Here is how the refresh strategy works for the DSS AES/SECURITY/AGENTS/NODEID.
If the node specified in the DSS has a last system DB Cache pulse older than 30 minutes, or the current state of the node is not ‘Running’, then another node becomes the AESAgentsNode. For node status information, see the System-Status-Nodes.SystemNodesDetail All report, which is based on the pr_sys_statusnodes table.
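The takeover rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article, not the platform's implementation: the NodeStatus class, the function names, and the choice of the first healthy node as the replacement are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

MAX_PULSE_AGE = timedelta(minutes=30)


@dataclass
class NodeStatus:
    node_id: str
    run_state: str        # for example "Running" or "Stopped"
    last_pulse: datetime  # time of the node's last system pulse


def is_healthy(node: NodeStatus, now: datetime) -> bool:
    """A node qualifies if it is Running and pulsed within the last 30 minutes."""
    return node.run_state == "Running" and now - node.last_pulse <= MAX_PULSE_AGE


def choose_agents_node(current_id: str, nodes: Iterable[NodeStatus],
                       now: datetime) -> Optional[str]:
    """Keep the designated node if it is healthy; otherwise pick another healthy node."""
    nodes = list(nodes)
    for node in nodes:
        if node.node_id == current_id and is_healthy(node, now):
            return current_id
    for node in nodes:
        if node.node_id != current_id and is_healthy(node, now):
            return node.node_id  # assumption: the first healthy node wins
    return None  # no healthy node is available


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    cluster = [
        NodeStatus("a0111ea85a9f87288a89390598507e1a", "Stopped", now - timedelta(minutes=5)),
        NodeStatus("5748a744c98ce4a5d02b843803adb6f6", "Running", now - timedelta(minutes=1)),
    ]
    print(choose_agents_node("a0111ea85a9f87288a89390598507e1a", cluster, now))
```

With the designated node stopped, the sketch returns the ID of the remaining running node, which mirrors the example that follows.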
In the following example you can see that the Pega126.96.36.199 AES Server node is stopped.
AES server list shows stopped server
For PYSYSNODEID A0111EA85A9F87288A89390598507E1A, the PYRUNSTATE is now ‘Stopped’.
PYSYSNODEID and PYRUNSTATE showing servers running and stopped
The remaining node, the one that is Running, now takes over the responsibility for running the AES Server agents. You can see the new node ID in the DSS AES/SECURITY/AGENTS/NODEID.
DSS AESAgentsNODEID shows new running node
Now this node runs the agents until it stops and causes the DSS to refresh with a new, running AES agents node.
After you set up your AES server, make sure that you have met the following conditions:
- The agents for PegaAESRemote and PegaAES are running.
- The transport layer (SSL/TLS) allows free communication.
- The node definitions specify the correct URLs.
- The AES operator has access to the systems where nodes and clusters are being monitored.
- Authentication is working between the AES server and the monitored nodes and clusters.
There are many diagnostic tools that you can use. Here is a short list:
On the monitored node
- Logger class com.pega.pegarules.priv.util.SOAPAppenderPega
- Trace the Services with package PegaAESRemote
On the AES server
- Trace Service Soap PegaAES • Events • LogAlert
- Trace Service Soap PegaAES • Events • LogException
On both the monitored node and the AES server
- JVM argument -Djavax.net.debug=ssl:handshake
If you have followed the troubleshooting guidance provided in this article and still experience problems with your AES 7.1.7 or AES 7.2 configuration, post your issue to the Pega Product Support Community. There, Global Customer Support (GCS) experts in AES can help you resolve your issue or determine whether you need to submit a Support Request (SR).
If the GCS engineers responding to you in the Pega Product Support Community determine that you need to submit an SR for your AES configuration problem, collect the following artifacts before you create your SR. You need to attach these artifacts to the SR before you submit it.
- AES version information
- AES system settings
- All Pega Log files
- All Application Server log files
- Screen shots that illustrate the problem
- Application hotfixes imported for AES
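To attach the log files as a single file, you can bundle them into one archive. The following Python sketch shows one way to do this. It is a sketch under stated assumptions: the function name bundle_artifacts, the directory paths, and the file patterns are hypothetical, so replace them with the log locations used in your deployment.

```python
import zipfile
from pathlib import Path


def bundle_artifacts(sources: list[Path], patterns: list[str], archive: Path) -> list[str]:
    """Zip every file under `sources` that matches one of `patterns`; return the names added."""
    added = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for source in sources:
            for pattern in patterns:
                for path in sorted(source.rglob(pattern)):
                    # Skip directories and the archive itself if it lives under a source.
                    if not path.is_file() or path.resolve() == archive.resolve():
                        continue
                    # Prefix with the source directory name to keep origins distinct.
                    name = f"{source.name}/{path.relative_to(source).as_posix()}"
                    if name not in added:
                        zf.write(path, arcname=name)
                        added.append(name)
    return added


if __name__ == "__main__":
    # Hypothetical locations; adjust to your Pega and application server log directories.
    names = bundle_artifacts(
        [Path("/opt/pega/logs"), Path("/opt/appserver/logs")],
        ["*.log"],
        Path("aes_sr_artifacts.zip"),
    )
    print(f"{len(names)} files added")
```

Review the archive before you attach it, because log files can contain sensitive data.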