Introduction to Pega Autonomic Event Services
Pega® Autonomic Event Services is an independent, self-contained system that gathers, monitors, and analyzes performance and health indicators from multiple Pega BPM systems across the enterprise. Pega Autonomic Event Services combines server-level and BPM-level enterprise monitoring in a single web-based tool.
More than a monitoring console, Pega Autonomic Event Services is an intelligent agent that can predict system performance or business logic problems and notify administrators when they occur. It also provides suggestions and administration tools to correct these problems.
- Designed for rapid deployment
- Alerts and exceptions summary notifications
- Putting alerts and exceptions into workflows
- Sharing Pega Autonomic Event Services information
- Weekly Scoreboard reports
- Monitoring enterprise health
- Why use Pega Autonomic Event Services?
- A technical overview
Pega Autonomic Event Services can be quickly deployed on any Pega BPM enterprise configuration. The installation and configuration package contains all of the files needed to set up the Pega Autonomic Event Services server and configure the nodes for monitoring. The processes are straightforward and do not require deep technical expertise.
Pega Autonomic Event Services gathers Alert log messages and Pega log exceptions and organizes them into comprehensive summary notifications. The notifications present the parsed log data in a format that is easy to read and use as a diagnostic tool.
- Alerts identify individual system events that exceed performance thresholds or indicate failures. Messages are triggered when system events degrade performance or compromise Web node security. These include events such as excessive browser interaction time, a requestor running too long, excessive data retrieved from the database, or excessive time executing a database query.
- Exceptions indicate abnormal processing behavior that is not reported as an alert. A properly operating system should not show any exceptions. Exceptions contain messages (with stack trace statements) created by your activities as well as by standard rules.
A wide range of data is associated with each alert and exception notification. The information includes a session description, a stack frame list, clipboard data, a list of other alerts that occurred during the same requestor interaction, and more. Performance (PAL) statistics supplement the summaries, giving additional insight into the overall performance impact. Here is an example alert notification:
As recurring patterns develop among key alerts and exceptions, Pega Autonomic Event Services aggregates them and their associated data into work objects (action items or exception items). These appear in the AES Manager portal for use in Pega Autonomic Event Services work flows.
This capability enables you to assign resources for information gathering, diagnosis, and remediation. Hovering over an item gives you a snapshot summary of the alert and prioritizes it (via the urgency column) as shown in this example.
Drilling down into an item displays the item's work form, which includes the alert's (or exception's) frequency and total occurrences during a given time span. Most importantly, Pega Autonomic Event Services diagnoses the degree of benefit derived by remediating the item, describes the issues that contributed to causing the item, and suggests what you should do to fix each issue.
For example, Pega Autonomic Event Services creates a Memory Utilization (MU) action item when a PEGA0028 alert occurs. The event indicates that JVM garbage collection processing did not reclaim enough memory to remedy the performance impact. Here is an example:
To access information that is relevant to your investigation, the work form contains additional data and drill-down capabilities.
Because troubleshooting memory issues can be complex, Pega Autonomic Event Services tracks alerts that are likely correlated to the MU action item as shown below:
A time-series plot of the data shows the temporal relationships among the alerts.
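One simple way to think about temporal correlation is to collect the alerts that occurred within a fixed time window around the action item. The Python sketch below illustrates that idea only; it is not how Pega Autonomic Event Services computes correlation. The function name `alerts_near`, the five-minute window, and the alert labels are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def alerts_near(anchor_time, alerts, window=timedelta(minutes=5)):
    """Return the (timestamp, alert_id) pairs within `window` of anchor_time.

    `alerts` is an iterable of (timestamp, alert_id) tuples. The window size
    is a hypothetical default, not a product setting.
    """
    return sorted(
        (ts, alert_id)
        for ts, alert_id in alerts
        if abs(ts - anchor_time) <= window
    )


# Hypothetical data: the time of a Memory Utilization event and nearby alerts.
mu_time = datetime(2024, 1, 1, 12, 0)
alerts = [
    (datetime(2024, 1, 1, 11, 57), "ALERT-A"),
    (datetime(2024, 1, 1, 12, 4), "ALERT-B"),
    (datetime(2024, 1, 1, 13, 30), "ALERT-C"),
]

nearby = alerts_near(mu_time, alerts)
```

Here `nearby` keeps only the two alerts that fall within five minutes of the event, which are the kind of points that would cluster together on a time-series plot.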
To facilitate enterprise-wide communication and team coordination during system optimization, you can set up email subscriptions for any user in the enterprise that automatically send notifications of current system events. These include the creation of new alerts, exceptions, and work items, or changes to a cluster's health status. The subscriptions can also issue daily or weekly Pega Autonomic Event Services scorecard reports, as well as the Top Offenders scorecard.
Pega Autonomic Event Services provides two scorecards that help you quickly identify the top performance issues, summarize weekly performance statistics, and compare them to the previous week.
The Top Offenders report highlights the five action items that are contributing the most time to overall browser time. By focusing on the top issues, users can ensure that they spend their efforts on the action items that will make the most difference in the performance of the system.
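The selection described above amounts to ranking action items by the browser time they contribute and keeping the first five. The following Python sketch shows that ranking under stated assumptions: the `ActionItem` class, its field names, and the sample values are hypothetical and do not reflect the product's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionItem:
    item_id: str                 # hypothetical identifier
    browser_time_seconds: float  # time this item contributes to browser time


def top_offenders(items, limit=5):
    """Return the `limit` items contributing the most browser time."""
    ranked = sorted(items, key=lambda i: i.browser_time_seconds, reverse=True)
    return ranked[:limit]


# Hypothetical sample data: six action items, of which five are reported.
items = [
    ActionItem("ITEM-1", 120.0),
    ActionItem("ITEM-2", 45.5),
    ActionItem("ITEM-3", 300.0),
    ActionItem("ITEM-4", 12.0),
    ActionItem("ITEM-5", 87.3),
    ActionItem("ITEM-6", 5.1),
]

report = top_offenders(items)
```

In this sample, `report` starts with `ITEM-3` and leaves out `ITEM-6`, the smallest contributor.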
The scorecard also shows the current status of these action items, how long they have existed, and whether the problem has gotten better or worse in the past week. This allows managers to quickly track the progress towards fixing the top issues in a particular system. Here is an example of a Top Offenders scorecard.
The weekly scorecard shows daily performance statistics as well as the highest urgency action and exception items. Here is an example of a weekly scorecard.
The Pega Autonomic Event Services Enterprise Health console provides up-to-the-minute enterprise, cluster, and node level monitoring. The console tracks these key statistics and events:
- Number of active requestors
- Number of agents running
- Percentage of JVM memory being used
- Last time of system pulse
- Process CPU usage
- Number of database connections
- SQL exceptions
- Average HTTP (browser or portal requestor) response time
- Rule cache enabled
- Alerts and exceptions that require immediate attention
The console associates each metric with a color-coded indicator signifying a normal (green), warning (yellow), or critical (red) condition. An indicator changes to yellow or red when a reading exceeds a specified threshold value. Easy to spot on the console, these indicators tell you to investigate and resolve the trouble spots before they worsen into chronic performance issues.
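The threshold logic can be pictured as a small classification function. The Python sketch below is an illustration, not the console's implementation: the `Threshold` class, the function name, and the 75 and 90 percent values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float   # a reading above this value turns the indicator yellow
    critical: float  # a reading above this value turns the indicator red


def indicator_color(reading: float, threshold: Threshold) -> str:
    """Return 'green', 'yellow', or 'red' for a single metric reading."""
    if reading > threshold.critical:
        return "red"
    if reading > threshold.warning:
        return "yellow"
    return "green"


# Hypothetical thresholds for the percentage of JVM memory being used.
jvm_memory = Threshold(warning=75.0, critical=90.0)

colors = [indicator_color(r, jvm_memory) for r in (50.0, 80.0, 95.0)]
```

The critical check comes first so that a reading above both thresholds is reported as red rather than yellow.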
Here is an example of a node's health information. The Agents indicator is in critical condition, as defined by its threshold. You can modify the threshold values to suit your requirements.
To aid your research, Pega Autonomic Event Services correlates critical indicators with alert reports or with current console information.
From one console, you can manage requestors, agents, and listeners across your enterprise. As shown here on the Requestors form, you can stop or interrupt a requestor, get its clipboard size, and perform other actions for each requestor on every node.
You can use the console to drill down and display active graphs of system run-time behavior. This example shows the amount of memory used by a node:
Additionally, you can view charts and reports that describe node or cluster activity over a specified duration. Here is a chart showing the number of each type of user over a number of days.
Pega Autonomic Event Services is a key tool for use in both development and production environments.
In a development environment
Pega Autonomic Event Services is useful when building enterprise-level applications and discovering issues of scale and load that can significantly affect processing performance in a production environment. Pega Autonomic Event Services can accelerate process optimization by identifying potential performance issues early in the development and testing phases, as well as giving users suggestions for how to fix the issues.
For example, a work object may have a list loaded from the database for the user to review. In a test system, this list may be small. However, when that system goes into production, there will be many more items on the list and more users loading the list at the same time, which will place greater stress on system performance. Using Pega Autonomic Event Services, you can identify trouble spots when they are small and before they cause a larger impact on application performance.
In a production environment
Pega Autonomic Event Services flags issues that may arise due to increased workload or reconfigured applications.
For example, you may want to run some processes during non-work periods, which formerly had been spread out during longer work periods. As a result, the concentration of server/database interactions may trigger alerts that reveal the need for better load balancing or hardware upgrades. As another example, you may reconfigure a flow to automate a manual flow action, thus creating new demands on server/database interactions.
A technical overview
Pega Autonomic Event Services is installed on a standalone Pega Platform server that monitors the performance status of one or more Pega Platform nodes and clusters in an enterprise deployment. Here is a high-level illustration of the Pega Autonomic Event Services architecture.
Real-time system data, such as the number of active requestors or CPU usage, are sent by a SOAP service from the monitored Pega Platform node to the Pega Autonomic Event Services system server. Pega Autonomic Event Services parses these messages and stores the records in pegaam_node_health and pegaam_node_stats tables in the Pega Autonomic Event Services database. These records are used to generate the data used in the Enterprise Health console. These messages also contain the information necessary to create the cluster and node records for the Pega Autonomic Event Services system.
When a Pega Platform node generates alerts, they are written to the node's Alert log and sent by SOAP to the Pega Autonomic Event Services server. It parses the alerts and stores the records in the pegaam_alert table in the Pega Autonomic Event Services database. Based upon how often an alert occurs and the system events that triggered those alerts, Pega Autonomic Event Services aggregates these records into work objects called Pega Autonomic Event Services action items. These items are written to the Pega Autonomic Event Services database in the pegaam_action_work table as one of several action-item types.
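The aggregation step can be thought of as grouping alert records by alert type and triggering event, then creating an item once a group occurs often enough. The Python sketch below illustrates that idea only. The `Alert` fields, the `min_occurrences` cutoff, and the sample sources are assumptions; the source does not state the actual aggregation rules or the table schemas.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    alert_type: str  # for example "PEGA0028"
    source: str      # hypothetical: the event or rule that triggered the alert


def aggregate_alerts(alerts, min_occurrences=3):
    """Group alerts by (type, source) and keep groups seen often enough.

    Returns a dict mapping (alert_type, source) to the occurrence count.
    The cutoff of 3 is a hypothetical value, not a product default.
    """
    counts = Counter((a.alert_type, a.source) for a in alerts)
    return {key: n for key, n in counts.items() if n >= min_occurrences}


# Hypothetical alert records received from monitored nodes.
received = [
    Alert("PEGA0028", "source-a"),
    Alert("PEGA0028", "source-a"),
    Alert("PEGA0028", "source-b"),
    Alert("PEGA0028", "source-a"),
]

action_items = aggregate_alerts(received)
```

Only the group that recurs three times becomes an item here; the single alert from `source-b` remains an individual record.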
Exception processing is similar. When a Pega Platform node generates exceptions, they are sent by SOAP to the Pega Autonomic Event Services server. It parses the exceptions and stores the records in the pegaam_exception table in the Pega Autonomic Event Services database. Based upon how often an exception occurs and the system events that triggered those exceptions, Pega Autonomic Event Services aggregates these records into work objects called Pega Autonomic Event Services exception items. These items are written to the Pega Autonomic Event Services database in the pegaam_exception_work table.
Pega Autonomic Event Services queries the monitored nodes daily for Log-Usage data by way of SOAP. This data, which includes PAL statistics, is used to update requestor-related reports and graphs. These statistics enable you to view and analyze the data in many contexts and help you target and solve performance issues that may not be captured in the alert or action item data.