Tools for monitoring system resources in Pega Predictive Diagnostic Cloud

In the System Resources section, you can find statistics and detailed information about the usage and status of your database, Elasticsearch, job schedulers, queue processors, agents, listeners, and your system’s resources. You can use this data to check the health of resources and various backing services, and diagnose past and ongoing issues.

The System Resources section contains the following landing pages:

Database

Check usage statistics and detailed information about the status of your databases.

On the Database landing page, you can find the following types of information:

  • Metrics
    • Database space used (top 5) – The total space used by a database in the selected time frame.
    • Tables with most live rows (top 5) – A list of the five tables with the most live rows.
    • Tables with most usage size (top 5) – A list of the five tables with the greatest usage size.
    • Tables with most indexes (top 5) – A list of the five tables with the most indexes.
    • Tables with most unused indexes (top 5) – A list of the five tables with the most unused indexes.

      Indexes that have not been used in the last 24 hours are considered unused.

    • Most frequently scanned indexes (top 5) – A list of the five tables with the most frequently scanned indexes.
    • Connections in use – A chart that shows the percentage of active and idle connections.
    • Active queries – A list of the queries that are currently running in the system.
    • Connection details – A list of connections related to client addresses.
  • Tables
    • Rows updated/inserted/deleted – A chart that shows the number of rows that were updated, deleted, or inserted in the selected table in the specified time frame.
    • Sequential scan vs index scan – A chart that shows a comparison between the number of sequential scans and the number of index scans that were performed in the specified time frame.
    • Dead rows – A chart that shows the number of dead rows in the selected table in the specified time frame.
      Dead rows are deleted rows whose data was removed and that your data source marks for reuse when you run a write command, such as INSERT or UPDATE.
    • Live rows – A chart that shows the number of live rows in the selected table in the specified time frame.
      Live rows are rows that are currently in use, from which you can reference and analyze data by using a query.
  • Query Stats
    • Queries – A chart of the total number of queries executed on the selected database in the specified time frame.
Figure: Exploring the Database landing page

Nodes

Check the usage statistics and detailed information about the status of your nodes.

On the Nodes landing page, you can find the following types of information:

  • The total number of nodes in your system
  • The number of nodes in the following states:
    • Running
    • Offline
    • Unknown
  • For each node in your system, you can check the following details:
    • Node health indicators, for example, the number of requestors and agents, the heap size, processor usage, and the number of urgent events on a node.
    • Node info, for example, a node's name, type, description, total memory, the start time of the system, and the node's PRPC and SOAP URLs.
    • Build info, such as the exact version number and build date of your installation of Pega Platform.
    • JVM info, such as the Java version and the JVM version.
    • JVM arguments.
    • Configuration setting info, which contains the dynamic system settings and settings in the prconfig.xml configuration file.
Figure: Exploring the Nodes landing page of PDC

Agents

Check usage statistics and detailed information about the status of your agents.

On the Agents landing page, you can find the following types of information:

  • The total number of agents in your system
  • The number of agents in the following states:
    • Running
    • Stopped
    • Exception
  • For each agent in your system, you can check the following details:
    • Node details – Usage and scheduling data about the agent, sorted by node in your system.
    • Other agent details, such as category and queue class.
    • Historical agent data – An archive of agent data, in which you can view usage and scheduling details from a selected time period, and export this information to an .xlsx file.
Figure: Exploring the Agents landing page in PDC

Resource Utilization

Check usage statistics and detailed information about the status of your system resources.

On the Resource Utilization landing page, you can find the following types of information:

  • CPU utilization by JVM – The average CPU time utilization for each JVM in the specified time frame.
    Every two minutes, the management daemon in Pega Platform sends an HLTH0001 health status message that includes the average CPU utilization, which is calculated in the following way: (CPU seconds used since the JVM started, as reported by the JVM now - CPU seconds used since the JVM started, as reported by the JVM two minutes ago) / (120 seconds * number of virtual CPUs on the server, as reported by Java). The chart only shows the maximum value for every hour.
  • Heap utilization – The percentage of heap in use in the specified time frame.
    Every two minutes, the management daemon in Pega Platform sends an HLTH0001 health status message that includes the heap utilization. The chart only shows the maximum value for every hour.
  • Active requestors – The number of active requestors in the specified time frame. The chart only shows the maximum value for every hour.
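
The CPU utilization formula and the hourly-maximum rule described above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not the code that Pega Platform or PDC runs; the class name `CpuUtilizationSketch`, its method names, and the sample values are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class CpuUtilizationSketch {

    /** Average CPU utilization (0.0 to 1.0) over one sampling interval. */
    static double averageUtilization(double cpuSecondsNow, double cpuSecondsBefore,
                                     double intervalSeconds, int virtualCpus) {
        return (cpuSecondsNow - cpuSecondsBefore) / (intervalSeconds * virtualCpus);
    }

    /** Keeps only the largest sample of each hour, keyed by hour index (epoch seconds / 3600). */
    static Map<Long, Double> hourlyMaximum(long[] epochSeconds, double[] utilization) {
        Map<Long, Double> maxima = new TreeMap<>();
        for (int i = 0; i < epochSeconds.length; i++) {
            maxima.merge(epochSeconds[i] / 3600, utilization[i], Double::max);
        }
        return maxima;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 48 CPU seconds consumed in a 120-second window on 4 virtual CPUs.
        System.out.println(averageUtilization(1048.0, 1000.0, 120.0, 4));

        // Three hypothetical samples: two in the first hour, one in the second hour.
        long[] times = {0L, 120L, 3600L};
        double[] values = {0.2, 0.5, 0.3};
        System.out.println(hourlyMaximum(times, values));
    }
}
```

Because only the hourly maximum is kept, a single two-minute spike determines the value plotted for that hour, so the chart reflects peaks rather than typical load.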
Figure: Exploring the Resource Utilization landing page in PDC

Elasticsearch

Check usage statistics and detailed information about the status of Elasticsearch.

On the Elasticsearch landing page, you can find the following types of information:

  • Metrics summary – Whether search indexing is enabled, and whether it is enabled for all rules, all data, or all work.
  • Default indexes details – The status and size of each type of index.
    You can also find this data from past snapshots by selecting a start and end time, and export historical data to a .csv file.
  • Search index host node details – The node ID, file directory, and status of each search index host node.
Figure: Exploring the Elasticsearch landing page in PDC

Job Scheduler

Check usage statistics and detailed information about the status of the job scheduler.

On the Job Scheduler landing page, you can find the following types of information:

  • Metrics summary – The total number of jobs, and the number of enabled and disabled jobs.
  • Job details – Current and historical information about the status of scheduled jobs, such as the average duration, the time of the last and next run, the success rate, and the state.
    You can export the data to an .xlsx file.
Figure: Exploring the Job Scheduler landing page in PDC

Queue Processor

Check usage statistics and detailed information about the status of the queue processor.

On the Queue Processor landing page, you can find the following types of information:

  • Metrics summary – The total number of queue processors, the number of running and disabled queue processors, and the number of broken items.
  • Queue processor details – Current and historical information about the status of the queue processors, such as the class, node type, and statistical data.
    You can export the data to an .xlsx file.
Figure: Exploring the Queue Processor landing page in PDC

Listeners

Check usage statistics and detailed information about the status of listeners.

On the Listeners landing page, you can find the following types of information:

  • Metrics summary – The total number of listeners, and the number of listeners in the following states:
    • Running
    • Disabled
    • Sleeping
    • Stopped
  • Listener details – Current and historical status and statistical information about the listeners in your system.
    You can export the data to an .xlsx file.
Figure: Exploring the Listeners landing page in PDC

Agent Queues

Check usage statistics and detailed information about the status of agent queues.

On the Agent Queues landing page, you can find the following types of information:

  • Metrics summary – The total number of agent queues, and the number of agent queues in the following states:
    • Scheduled
    • Immediate
    • Processing
    • Success
    • Broken
  • Agent queues details – Current and historical status and statistical information about the agent queues in your system.
    You can export the data to an .xlsx file.
Figure: Exploring the Agent Queues landing page in PDC

JVM Monitoring

Check usage statistics and detailed information about the memory management of your JVM.

You can monitor JVM memory statistics with PDC only for Pega Platform 8.4.3 and 8.5.1 systems.
To enable the feature for your application, you must change the value of the aessetting/M3/disableMonitoring dynamic system setting to false, and then restart all the nodes in the system. For more information, see Configuring dynamic system settings. If you still cannot view statistics on the System Resources > JVM Monitoring landing page, contact Pega Support.

PDC listens for JVM Platform MBeans notifications and analyzes this data to monitor garbage collection (GC) pauses and memory pool usage in real time. As a result, you get a better understanding of the run-time behavior of your application, and you can discover the reasons for performance glitches that are otherwise difficult to diagnose.
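
To illustrate what listening for JVM Platform MBeans notifications can look like, the following minimal sketch subscribes to GC notifications in the local JVM and prints the action, cause, and duration of each collection. It is not PDC's implementation. It assumes a HotSpot-based JVM that provides the `com.sun.management` classes, and the class name `GcNotificationSketch` and the method name `subscribe` are hypothetical.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class GcNotificationSketch {

    /** Registers the listener with every collector MXBean that emits notifications. */
    static int subscribe(NotificationListener listener) {
        int subscribed = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
                subscribed++;
            }
        }
        return subscribed;
    }

    public static void main(String[] args) throws InterruptedException {
        NotificationListener printer = (notification, handback) -> {
            // Ignore anything that is not a GC notification.
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            GcInfo gcInfo = info.getGcInfo();
            System.out.println(info.getGcAction()
                    + " | cause: " + info.getGcCause()
                    + " | duration (ms): " + gcInfo.getDuration());
        };

        subscribe(printer);
        System.gc();       // Request a collection so that the listener has something to report.
        Thread.sleep(500); // Notifications arrive asynchronously; give them time to be delivered.
    }
}
```

The cause string that the JVM reports, for example `G1 Evacuation Pause`, is the same kind of value as the GC triggers that the GC Analysis tab lists.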

By automatically correlating memory usage and GC events with the corresponding Pega alerts and exceptions, PDC helps you to identify the application features that cause the particular memory usage patterns that PDC observes. This monitoring capability helps you to determine when an issue began to manifest and what actions caused the issue so that you can investigate the issue more quickly.

The JVM Monitoring landing page displays the following information:

The charts are grouped by tab. Each tab name below is followed by descriptions of its charts.
GC Analysis
  • Application Throughput – The time that the JVM spends running your application.
    Low throughput might mean that your application spends too much time performing GC.
  • Major GC runs – Major GC requires significant time and resources. The fewer the runs that take place in your application, the more efficient your application is.
    The following cause is an example of a major GC trigger that PDC recognizes:
    • Allocation failure – The number of major GC runs triggered by an allocation failure. An allocation failure occurs when a minor garbage collection fails to recover enough memory to allocate live objects from your application.
  • Minor GC runs – Minor GC reclaims memory by reallocating, compacting, and removing objects in the young space. In a healthy system, most GC runs are minor.
    The following causes are examples of minor GC triggers that PDC recognizes:
    • G1 Evacuation Pause – During the evacuation pause, the garbage-first (G1) garbage collector copies live objects from one or more regions to another.
    • G1 Humongous Allocation – Humongous objects are larger than half of a region size. A higher number of humongous object allocations is detrimental to your application performance.
    • GCLocker Initiated GC – If any thread is inside a Java Native Interface (JNI) critical region, the GC locker blocks GC. After all the threads leave the JNI critical region, the GC locker initiates GC.
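
As a rough illustration of the Application Throughput idea, the sketch below estimates the share of JVM uptime that was not spent in GC by using the standard `java.lang.management` beans. It assumes that throughput is expressed as a percentage of uptime, which is a common convention and not necessarily the exact calculation that PDC performs. The class name `GcThroughputSketch` and its methods are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcThroughputSketch {

    /** Percentage of uptime not spent in GC; assumes gcMillis does not exceed uptimeMillis. */
    static double throughputPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0; // No measurable uptime yet, so no GC overhead to report.
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    /** Sums the approximate accumulated collection time reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // Returns -1 when the value is undefined.
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("Estimated throughput (%): "
                + throughputPercent(totalGcMillis(), uptime));
    }
}
```

The collection time that the JVM reports is approximate, and for collectors that do part of their work concurrently it is not the same as pause time, so treat the result as an estimate.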

For more information about JVM memory management, see the Oracle Java online documentation.

JVM Memory Pools
  • Heap – Heap memory is part of the memory that the system allocates to the JVM. All running threads of your application share the heap.
    The chart presents how much of the heap the following regions take:
    • G1 Eden Space – The part of the young space where most new objects reside.
    • G1 Old Gen – The old generation is the space to which garbage collectors move objects that survived enough GC runs. This space is also referred to as the tenured space.
    • G1 Survivor Space – The part of the young space that garbage collectors use to organize and compact objects in the memory.
  • Old Gen – The space that contains objects that have survived enough GC runs.
    The chart displays the following metrics:
    • Committed – The amount of memory that the system guarantees as available to use by the JVM.
    • Max – The maximum amount of memory that can be used for memory management.
    • Used – The amount of committed memory that the JVM used.
  • Young – This space contains new and short-lived objects that have not yet survived enough GC runs to be moved to the old generation.
    The chart displays the following metrics:
    • Committed
    • Max
    • Used
  • Metaspace – This space contains the application metadata that the JVM requires to describe the classes and methods used in the application. Metaspace is not part of the heap. Starting from Java 8, the metaspace replaces the PermGen space.
    The chart displays the following metrics:
    • Committed
    • Max
    • Used
  • Code Cache – The code cache has a fixed size. When the cache is full, the JVM cannot compile any additional code because the just-in-time (JIT) compiler cannot function without using the code cache. As a result, the performance of your application degrades.
    The chart displays the following metrics:
    • Committed
    • Max
    • Used
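
The Committed, Max, and Used metrics described above correspond to values that the JVM exposes for each memory pool through the standard `MemoryPoolMXBean` interface. The following sketch lists them for the local JVM. It only shows where such numbers can come from and does not represent how PDC collects them; the class name `MemoryPoolSketch` and the method name `describePools` are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class MemoryPoolSketch {

    /** Returns one line per memory pool with its used, committed, and max sizes in bytes. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // The pool is no longer valid.
            }
            // getMax() returns -1 when the pool has no defined maximum.
            lines.add(pool.getName() + " (" + pool.getType() + ")"
                    + ": used=" + usage.getUsed()
                    + ", committed=" + usage.getCommitted()
                    + ", max=" + usage.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

On a JVM that runs the G1 collector, the pool names in the output typically include `G1 Eden Space`, `G1 Survivor Space`, and `G1 Old Gen`, which match the heap regions listed above. Other collectors report different pool names.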
JVM Memory Allocation

The chart displays the following metrics:

  • Allocated – The amount of memory that the JVM allocated for use.
  • Promoted – The amount of memory occupied by objects that GC promoted from the young generation to the old generation.
  • Reclaimed – The amount of memory that GC reclaimed by evacuating objects from one memory region to another, and then compacting the memory.
Figure: Monitoring the JVM memory management with PDC