Assessing the consistency of your system's performance with Pega Predictive Diagnostic Cloud
By using Pega Predictive Diagnostic Cloud™ (PDC) to understand how your system runs every day, you can quickly detect when it behaves differently.
uPlusTelco runs a production system with Pega Platform™ applications installed. As an operations manager, you are responsible for monitoring the health of this system and ensuring that your team is aware of any anomalies in the system's performance. uPlusTelco system administrators have determined that the user experience is acceptable if less than 1% of server interactions take longer than one second.
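The acceptance criterion above is a simple ratio. The following Python sketch is a minimal illustration, assuming that you have a list of individual interaction durations in milliseconds. PDC does not necessarily expose data in this form, and the function names and default values are hypothetical.

```python
from collections.abc import Sequence


def slow_fraction(durations_ms: Sequence[float], threshold_ms: float = 1000.0) -> float:
    """Return the fraction of interactions that took longer than threshold_ms."""
    if not durations_ms:
        return 0.0
    slow = sum(1 for duration in durations_ms if duration > threshold_ms)
    return slow / len(durations_ms)


def meets_target(
    durations_ms: Sequence[float],
    threshold_ms: float = 1000.0,
    max_slow_fraction: float = 0.01,
) -> bool:
    """Hypothetical check of the uPlusTelco criterion: fewer than 1% of
    interactions may take longer than one second."""
    return slow_fraction(durations_ms, threshold_ms) < max_slow_fraction


# Illustrative values only: 5 slow interactions out of 1,000 (0.5%).
acceptable = meets_target([200.0] * 995 + [1500.0] * 5)
# Illustrative values only: 2 slow interactions out of 100 (2%).
unacceptable = meets_target([200.0] * 98 + [1500.0] * 2)
```

An interaction that takes exactly one second is not counted as slow in this sketch, because the criterion refers to interactions that take longer than one second.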
Before you begin
Ensure that you can access PDC. For more information, see Getting started with Pega Predictive Diagnostic Cloud.
Assessing the consistency of your system's performance
Use PDC to perform a daily checkup of your system's performance:
- Log in to PDC.
- In the header of PDC, in the System: list, select the system that you want to monitor, for example, upt-prod1.
- In the navigation pane, click System Assessment.
- In the Requestor type drop-down list, select Web.
Web requestors are browser sessions, and their slowness directly affects the user experience. A web requestor interaction represents a user interacting with the application through their web browser, for example, by clicking a button.
- In the Interval list, select Last 7 days.
- On the chart that shows the percentages of healthy and slow interactions, for each day, compare the percentage of healthy interactions (green) with the percentage of slow interactions (red).
Slow interactions are interactions with the server that exceeded the threshold configured in your application. The default threshold is one second. By analyzing this chart, you can quickly check whether the performance has been consistent throughout the selected time frame. In the following example, on December 1, nearly a quarter of all interactions were slow. The causes of the slowdown on that day might require further investigation.
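To make the consistency check less subjective, you can compare each day with the typical day in the interval. The following Python sketch assumes that you have read the daily percentage of slow interactions from the chart. The median-based rule and the factor of 3 are hypothetical choices, not PDC behavior, and all values except the roughly one-quarter share on December 1 are invented for illustration.

```python
from statistics import median


def flag_inconsistent_days(slow_pct_by_day: dict[str, float], factor: float = 3.0) -> list[str]:
    """Return the days whose percentage of slow interactions is at least
    `factor` times the median day of the selected interval."""
    if not slow_pct_by_day:
        return []
    typical = median(slow_pct_by_day.values())
    return [
        day
        for day, pct in slow_pct_by_day.items()
        # The pct > 0 guard prevents days without slow interactions from being
        # flagged when the median itself is 0.
        if pct > 0 and pct >= factor * typical
    ]


# Hypothetical daily percentages of slow interactions for the last 7 days.
example_week = {
    "Nov 27": 0.5,
    "Nov 28": 0.6,
    "Nov 29": 0.8,
    "Nov 30": 0.9,
    "Dec 1": 24.0,
    "Dec 2": 1.0,
    "Dec 3": 0.7,
}
flagged = flag_inconsistent_days(example_week)
```

Using the median rather than the mean keeps a single bad day, such as December 1, from inflating the baseline that the other days are compared with.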
- In the Average time of healthy and slow interactions chart, for each day, compare the average time that healthy interactions took (green bar) with the average time that slow interactions took (red bar).
In the following example, on December 1, the average time was significantly higher for both healthy and slow interactions. This result is consistent with the findings in Step 6 and is another reason to investigate your system's performance on that day.
- In the upper right corner of the System Assessment landing page, click the Click to view data in table format button.
The table view contains additional information that you can use to evaluate the system's performance on a given day, such as the number of times that users interacted with the application, the number of alerts that the system generated, and the overall average response time of the server.
- Compare the following values on each day:
- Number of interactions and number of alerts
- On weekends and holidays, uPlusTelco expects a significantly lower volume of interactions and alerts, because uPlusTelco is closed for business on these days. However, during a typical working week, significant differences in the numbers of interactions and alerts might be a symptom of operational issues in your system. To find out whether these differences coincide with significant increases or decreases in the number of unique users, go to Step 13.
- A significant increase in the volume of interactions puts additional load on your system and often results in slower interactions. However, in the example above, on December 1, the average response time spiked even though the numbers of interactions and alerts were similar to those on other days of the week. A spike in response time without a corresponding increase in load also indicates possible operational issues.
- Average response time
- An unusually high average response time on any particular day might indicate an issue with your system, for example, poor database performance.
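The comparison of daily values in the table view can be sketched in Python. This is a minimal illustration, assuming that you have copied the rows of the table view into DailyRow objects (a hypothetical class) and that the year is 2021, in which November 27 and 28 fall on a Saturday and a Sunday. Weekends are skipped because uPlusTelco expects lower volume on those days; holidays are not handled, and the tolerance, spike factor, and all values are arbitrary examples.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class DailyRow:
    """One row of the table view (field names are hypothetical)."""
    day: date
    interactions: int
    alerts: int
    avg_response_ms: float


def review_working_days(
    rows: list[DailyRow],
    volume_tolerance: float = 0.25,
    response_factor: float = 1.5,
) -> dict[date, list[str]]:
    """Compare each working day with the median working day."""
    # weekday() returns 5 and 6 for Saturday and Sunday; holidays are not excluded.
    working = [row for row in rows if row.day.weekday() < 5]
    if not working:
        return {}
    typical_volume = median(row.interactions for row in working)
    typical_alerts = median(row.alerts for row in working)
    typical_response = median(row.avg_response_ms for row in working)

    findings: dict[date, list[str]] = {}
    for row in working:
        notes = []
        if abs(row.interactions - typical_volume) > volume_tolerance * typical_volume:
            notes.append("unusual number of interactions")
        if abs(row.alerts - typical_alerts) > volume_tolerance * typical_alerts:
            notes.append("unusual number of alerts")
        spiked = row.avg_response_ms >= response_factor * typical_response
        if spiked and not notes:
            # Slower responses without extra interactions or alerts suggest
            # an operational issue rather than additional load.
            notes.append("response time spiked at normal volume")
        elif spiked:
            notes.append("response time spiked")
        if notes:
            findings[row.day] = notes
    return findings


# Hypothetical values for one week.
example_rows = [
    DailyRow(date(2021, 11, 26), 10_000, 50, 300.0),
    DailyRow(date(2021, 11, 27), 1_500, 8, 280.0),
    DailyRow(date(2021, 11, 28), 1_400, 7, 275.0),
    DailyRow(date(2021, 11, 29), 9_800, 48, 310.0),
    DailyRow(date(2021, 11, 30), 10_200, 51, 290.0),
    DailyRow(date(2021, 12, 1), 10_100, 52, 900.0),
    DailyRow(date(2021, 12, 2), 9_900, 49, 305.0),
]
report = review_working_days(example_rows)
```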
- In the Requestor type drop-down list, select Service, and then repeat Steps 5 through 9.
Service requestors are sessions for listeners and for access to Pega Platform from an external client system, such as through a service request. Slow responses to these requestors directly affect the user experience, because rendering a screen for a user typically involves multiple server interactions.
- If you have multiple applications, in the Application drop-down list, select an application, for example, uPlus_CallCenter, and then repeat Steps 4 through 10.
In a system with multiple applications that have distinctive user communities, the overall performance of your system might not reflect the behavior of the individual applications. By analyzing the data for each application separately, you can detect issues with a single application.
- In the Top Items section, analyze the values in the Occurrences column.
For each of the top cases, check whether the number of occurrences has increased or decreased in the last 24 hours. In the following example, the number of events that PDC tracks in the BrowsInt-4857 case has remained steady, which means that the problem that this case references has not become worse. However, the problem has already cost your system more than two additional hours of processing, so the case should still be investigated. Because the occurrences of BrowsInt-2648 have almost doubled, this case might also require urgent attention.
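The reasoning in this step can be written as a small triage rule: cases whose occurrences grow quickly need urgent attention, and cases that are steady but costly still need investigation. The following Python sketch is hypothetical: the TopItem class, its fields, the thresholds, and the counts are assumptions for illustration and are not part of PDC.

```python
from dataclasses import dataclass


@dataclass
class TopItem:
    """Hypothetical record of one Top Items entry."""
    case_id: str
    previous_24h: int  # occurrences in the 24 hours before the last 24 hours
    last_24h: int  # occurrences in the last 24 hours
    extra_processing_hours: float


def triage(
    items: list[TopItem],
    growth_factor: float = 1.8,
    costly_hours: float = 1.0,
) -> dict[str, str]:
    """Label each case as 'urgent', 'investigate', or 'monitor'."""
    labels = {}
    for item in items:
        if item.previous_24h == 0:
            growing = item.last_24h > 0
        else:
            growing = item.last_24h / item.previous_24h >= growth_factor
        if growing:
            labels[item.case_id] = "urgent"  # the problem is getting worse quickly
        elif item.extra_processing_hours >= costly_hours:
            labels[item.case_id] = "investigate"  # stable, but already expensive
        else:
            labels[item.case_id] = "monitor"
    return labels


# Hypothetical counts; only the case IDs come from the example in this step.
example_items = [
    TopItem("BrowsInt-4857", previous_24h=120, last_24h=118, extra_processing_hours=2.2),
    TopItem("BrowsInt-2648", previous_24h=40, last_24h=78, extra_processing_hours=0.5),
]
triage_labels = triage(example_items)
```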
- In the navigation pane, click Usage Metrics > System.
On this landing page, you can quickly compare the numbers of unique users on each of your system's nodes.
- Turn on the Split by node switch, and then look for anomalies in the number of unique users on each node.
In the following example, on November 27 and 28, the number of unique users on each node was several times lower than on other days. Because these days are a Saturday and a Sunday, a significantly lower system load is expected. However, a similar decrease on a normal working day might indicate, for example, that users could not access your system on that day.
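The check for a working-day drop in unique users can be sketched per node in Python. This minimal illustration assumes that you have read the daily unique-user counts for each node from the landing page; the node names, the counts, the year 2021 (in which November 27 and 28 are a weekend), and the 50% drop threshold are all hypothetical. Holidays are not excluded.

```python
from datetime import date
from statistics import median


def find_access_drops(
    users_by_node: dict[str, dict[date, int]],
    drop_factor: float = 0.5,
) -> list[tuple[str, date]]:
    """Return (node, day) pairs where unique users on a working day fell below
    drop_factor times that node's median working day."""
    drops = []
    for node, per_day in users_by_node.items():
        # Weekends are expected to be quiet, so only working days are compared.
        working = {day: users for day, users in per_day.items() if day.weekday() < 5}
        if not working:
            continue
        typical = median(working.values())
        for day, users in sorted(working.items()):
            if users < drop_factor * typical:
                drops.append((node, day))
    return drops


# Hypothetical node names and counts.
example_users = {
    "node-a": {
        date(2021, 11, 26): 400, date(2021, 11, 27): 60, date(2021, 11, 28): 55,
        date(2021, 11, 29): 410, date(2021, 11, 30): 120, date(2021, 12, 1): 395,
        date(2021, 12, 2): 405,
    },
    "node-b": {
        date(2021, 11, 26): 380, date(2021, 11, 27): 50, date(2021, 11, 28): 45,
        date(2021, 11, 29): 390, date(2021, 11, 30): 385, date(2021, 12, 1): 375,
        date(2021, 12, 2): 392,
    },
}
access_drops = find_access_drops(example_users)
```

Comparing each node with its own working-day history, rather than with the other nodes, can avoid false alarms when nodes serve different numbers of users by design.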
- If you have multiple applications, in the Application list, select an application, for example, uPlus_CallCenter, and then repeat Step 14.
You assessed the daily performance of your system and its main applications for the last week. You checked whether your system's performance has been consistent, improving, or degrading, and identified unusual behavior that requires further investigation.
What to do next
Identify the most urgent performance problems in your system. For more information, see Issue identification and research with Pega Predictive Diagnostic Cloud.