
Pega Ping service FAQs

Pega Ping is a service that verifies the health of a web node, that is, -DNodeType=WebUser.

Load balancers can be configured to use this ping service to check the node's health. F5 Big IP can use the URL in creating a health monitor, and AWS Elastic Load Balancer can reference the URL in its Health Check.

In releases prior to Pega 8.2, Pega Ping is a REST service that performs health checks synchronously (on demand) and returns the status when a request arrives.

In these releases, checking the health of a Pega node rapidly and reliably is hindered by the following limitations:

  • The Health Ping service can time out even though the node is healthy, which causes nodes to restart repeatedly and destabilizes the entire cluster.
  • The Health Ping service does not report unhealthy behaviors such as Out of Memory (OOM) errors. OOM might still be raised in third-party code or in code that is not handled by Pega Ping node health monitoring.
  • The Health Ping service checks the health of web node processing only, that is, -DNodeType=WebUser. It does not consider the BackgroundProcessing, Stream (including DSM), BIX, Search, and Universal node types.
  • Remote tracing of REST services interferes with ping service execution times.

In Pega 8.2 and later releases, the Pega Ping service is improved to run health checks asynchronously and periodically.

To benefit from improvements to the Pega Ping service, upgrade to the latest Pega Platform release, at minimum Pega 8.2.

See Keeping current with Pega.


Frequently asked questions

How can I verify the health of a Pega node?

What does a typical Pega Ping response look like?

How do I get the active browser requestor count?

What is the activity that is run by the Pega Ping service?

My node is reporting 'unhealthy'. What do I need to do?

My ping service is returning 500 status (unhealthy) but reviewing the ping JSON or Pega-Rules logs does not help me. Whom do I contact? 

My ping service health check displays N_CriticalErrorNotification. What does this mean? What do I need to do?

Do the node health checks catch all Out of Memory errors?

What are the typical issues with the Pega Ping service in releases prior to Pega 8.2?

How can I verify the health of a Pega node?

Address your browser to this URL:

http://<hostName:port>/<contextName>/PRRestService/monitor/pingservice/ping

If the node is healthy, the URL returns a response code of 200.

If the node is unhealthy, the URL returns a response code of 500. See I see status code 500: What additional artifacts do I need to collect? and I see status codes indicating a problem, but I do not see any error in the PegaRules log file. Why?
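As a rough illustration, a monitoring script (or a load balancer probe) might interpret these response codes as in the following minimal sketch. The host, port, and context name in PING_URL are placeholder values, not part of this article.

# Minimal sketch: poll the Pega Ping endpoint and report node health.
# PING_URL uses placeholder host, port, and context values.
import urllib.error
import urllib.request

PING_URL = ("http://pega-node.example.com:8080/prweb"
            "/PRRestService/monitor/pingservice/ping")

def node_is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True on HTTP 200 (healthy); False on HTTP 500 or no answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode() == 200
    except urllib.error.HTTPError:
        return False  # an unhealthy node answers with status 500
    except urllib.error.URLError:
        return False  # an unreachable node is treated as unhealthy

if __name__ == "__main__":
    print("healthy" if node_is_healthy(PING_URL) else "unhealthy")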

What does a typical Pega Ping response look like?

Pega 8.2 and later releases

In Pega 8.2 and later releases, the Pega Ping response looks like this example:

{
  "node_id":"myCustomNodeId",
  "node_type":[ "WebUser", "Stream" ],
  "health":[
      {
        "test_name":"Streamservice-Check",
        "status":"success",
        "last_reported_time":"2018-07-30T20:37:29.656"
      },{
        "test_name":"HTML-Stream-Check",
        "status":"success",
        "last_reported_time":"2018-07-30T20:37:29.656"
      }
  ],
  "status":"healthy"
}

Releases earlier than Pega 8.2

In releases earlier than Pega 8.2, the Pega Ping response looks like this example:

{
"duration": "201.293172",
"count": "-1"
}

How do I get the active browser requestor count?

Pega 7.3.1 and earlier releases

With Pega 7.3.1 and earlier releases, the ping service returned the number of active requestors on the node. Because ping is a synchronous API, computing the requestor count caused performance issues.

Therefore, returning the requestor count was disabled in these earlier releases by setting the DASS disableActiveUserCount to true:

Ruleset: Pega-RULES
Setting name: disableActiveUserCount
Setting value: true

Pega 7.4 and later releases and Pega Cloud Services environments

In Pega 7.4 and later releases, you can count the number of active browser requestors by using this REST service:

/PRRestService/monitor/v1/sessions/browser

This REST service returns results only if you enable a maximum limit on concurrent browser sessions and the environment is Pega Cloud Services.

Set a positive value for cluster/requestors/browser/maxactive in the prconfig.xml file.
Example: <env name="cluster/requestors/browser/maxactive" value="200"/>
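Under those assumptions, a quick way to call the service is a sketch like the following. The host, port, and context name are placeholders, and because the response schema is not documented here, the body is printed as-is.

# Sketch: query the active browser requestor count service (Pega 7.4+).
# The host, port, and context name are placeholder values.
import urllib.request

URL = ("http://pega-node.example.com:8080/prweb"
       "/PRRestService/monitor/v1/sessions/browser")

with urllib.request.urlopen(URL, timeout=5.0) as response:
    # The response schema is not documented in this article,
    # so print the raw body for inspection.
    print(response.read().decode("utf-8"))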

What is the activity that is run by the Pega Ping service?

For releases prior to Pega 8.2, the Pega Ping service is a REST service, pingService, in the monitor package that runs the activity pzGetHealthStatus.

In Pega 8.2 and later releases, the Pega Ping service does not use the REST infrastructure, and no activity is processed. The engine handles ping requests without a requestor context.

My node is reporting 'unhealthy'. What do I need to do?

Pega 8.2 and later releases

In Pega 8.2 and later releases, several health checks run to determine the health of a node. If any health check fails, the node is marked as unhealthy and the URL returns a response code of 500. The ping response body (JSON) lists the health checks that ran and identifies the check that failed, so review it first, for example with a small script like the sketch below.
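As a minimal sketch (assuming the Pega 8.2+ response shape shown earlier), the failing checks can be pulled out of the JSON like this:

# Minimal sketch: list the failed checks in a Pega 8.2+ ping response body.
import json

def failed_checks(ping_body: str) -> list:
    """Return the test_name of every health check whose status is not success."""
    payload = json.loads(ping_body)
    return [check["test_name"]
            for check in payload.get("health", [])
            if check.get("status") != "success"]

# Example built from the unhealthy response shape shown later in this article:
sample = """{"node_id":"node1","node_type":["WebUser"],"status":"unhealthy",
"health":[{"test_name":"HTML-Stream-Check","status":"success",
"last_reported_time":"2020-08-07T22:27:28.424Z"},
{"test_name":"N_CriticalErrorNotification","status":"failure",
"last_reported_time":"2018-07-30T20:37:29.656"}]}"""
print(failed_checks(sample))  # ['N_CriticalErrorNotification']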

See I see status code 500: What additional artifacts do I need to collect? and I see status codes indicating a problem, but I do not see any error in the PegaRules log file. Why?

Releases prior to Pega 8.2

In releases prior to Pega 8.2, you see an exception in the logs such as Timed out borrowing service requestor from requestor pool for service package: monitor, or an exception from executing the activity pzGetHealthStatus. In either case, review the Pega-Rules logs, which provide more information.

See Understanding logs and logging messages and Understanding the PegaRULES Log Analyzer.

My ping service is returning 500 status (unhealthy) but reviewing the ping JSON or Pega-Rules logs does not help me. Whom do I contact?

Go to My Support Portal to submit a support case (INC) for GCS assistance. See My Support Portal: New Design, Streamlined Features.

If your environment is a Pega Cloud environment, select My Pega Cloud in My Support Portal. See Self-manage your Pega Cloud environments from My Support Portal.

The GCS engineer will work with the Product or Service team that owns the service failing the node health check:

  • HTML-Stream-Check owned by the Engine-as-a-Service team
  • Streamservice-Check owned by the Streaming and Large-scale Processing team
  • StaleThreadHealthCheck owned by the Decisioning & Analytics team
  • ServiceRegistry-Check owned by the Data Sync and Caching team

My ping service health check displays N_CriticalErrorNotification. What does this mean? What do I need to do?

N_CriticalErrorNotification is reported by a health check notification when a critical error, usually an Out of Memory (OOM) error, occurs in the node. You need to determine the root cause of the OOM error. See the answer to the next question.

Do the node health checks catch all Out of Memory errors?

The Pega Ping service also returns an unhealthy status for a node when an Out of Memory (OOM) error occurs in the node; usually, an OOM error marks the node as unhealthy.
However, OOM errors raised from third-party JAR files are not caught by the node health checks. Because of this limitation, the node health checks catch only about 70 to 80 percent of OOM errors.

When OOM occurs, the Pega Ping response looks like this example:

{
   "node_type":[ "WebUser" ],
   "health":[
      {
         "last_reported_time":"2020-08-07T22:27:28.424Z",
         "test_name":"HTML-Stream-Check",
         "status":"success"
      },{
         "test_name":"N_CriticalErrorNotification",
         "status":"failure",
         "last_reported_time":"2018-07-30T20:37:29.656"
      }
   ],
   "status":"unhealthy",
   "node_id":"10.150.69.32_envblr85-web-3"
}

What are the typical issues with the Pega Ping service in releases prior to Pega 8.2?

Prior to Pega 8.2, you might encounter the following issues: 

  • The requestor pool times out: Timed out borrowing service requestor from requestor pool for service package.
  • The ping service does not report an unhealthy node when OOM occurs.

In Pega 8.2, the Pega Ping service timeout is fixed and, most of the time, OOM errors mark the node as unhealthy. However, OOM errors raised from third-party JAR files are not caught by the node health checks. Because of this limitation, the node health checks catch only about 70 to 80 percent of OOM errors.

Node health monitoring features provided by Pega 8.2 and later releases

With Pega 8.2 and later releases, reliable monitoring and reporting of node health is provided by the following improvements (illustrated by the sketch after this list):

  • All node health checks run asynchronously and periodically. You can keep or adjust the default settings.
  • Every health check must complete within the configured time; the default value is 5 seconds. When a health check exceeds the specified time, the health check fails.
  • Results of all health checks are aggregated in one place after they run.
  • Each check result has an expiry time. For a particular health check, if the result is not updated within the specified time, for example 60 seconds, then the health check fails and the node is reported as unhealthy. This detects whether there is a problem in the background job itself.
  • Each component specifies its own health checks and registers them with the Health monitor component.
  • During health check registration, components can specify the NodeType for which the check needs to run.
  • Only health checks that are registered for the current node type are picked for processing.
  • Engine components can publish health events by specifying the event and the event handler. These results do not expire during result aggregation.
  • When a ping request comes from the client, the status of all health checks is aggregated, and the final health status is sent in the JSON response.
  • If you encounter an issue with any of the health checks, you can disable those checks by using the Data-Admin-System-Setting (DASS) identified in Settings. The disabled health checks are not run in the next cycle of monitoring.
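To make the flow concrete, here is an illustrative sketch of the pattern these improvements describe. It is not Pega source code: the class and method names are invented for illustration, and the timeout constants mirror the defaults listed under Settings.

# Illustrative sketch (not Pega source code) of the health monitoring
# pattern described above: checks register per node type, run periodically
# with a per-check timeout, and cached results expire if not refreshed.
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

CHECK_TIMEOUT_SECONDS = 5     # mirrors monitor/health/checkTimeout (5000 ms)
STATUS_TIMEOUT_SECONDS = 120  # mirrors monitor/health/statusTimeout

@dataclass
class CheckResult:
    status: str                # "success" or "failure"
    last_reported_time: float  # epoch seconds when the result was recorded

class HealthMonitor:
    def __init__(self, node_type):
        self.node_type = node_type
        self.checks = {}   # name -> zero-argument callable returning bool
        self.results = {}  # name -> CheckResult

    def register(self, name, node_types, check):
        # Only checks registered for this node's type are picked up.
        if self.node_type in node_types:
            self.checks[name] = check

    def run_cycle(self):
        # Runs periodically in the background (daemon interval, e.g. 15 s).
        with ThreadPoolExecutor() as pool:
            for name, check in self.checks.items():
                future = pool.submit(check)
                try:
                    ok = future.result(timeout=CHECK_TIMEOUT_SECONDS)
                except Exception:
                    ok = False  # a timeout or error counts as a failure
                self.results[name] = CheckResult(
                    "success" if ok else "failure", time.time())

    def status(self):
        # Aggregation on a ping request: every check must have succeeded
        # recently; a stale result means the background job itself broke.
        now = time.time()
        for result in self.results.values():
            if result.status != "success":
                return "unhealthy"
            if now - result.last_reported_time > STATUS_TIMEOUT_SECONDS:
                return "unhealthy"
        return "healthy"

# Example usage:
monitor = HealthMonitor(node_type="WebUser")
monitor.register("HTML-Stream-Check", {"WebUser"}, lambda: True)
monitor.run_cycle()
print(monitor.status())  # healthy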

Settings

Keep or adjust the default settings for monitoring the health of Pega system nodes:

Default settings for monitoring node health

Setting name                      Type                              Default value        Description
monitor/health/monitorInterval    prconfig.xml                      15 (seconds)         Health monitor daemon interval in seconds
monitor/health/checkTimeout       prconfig.xml                      5000 (milliseconds)  Health monitor check execution timeout in milliseconds
monitor/health/statusTimeout      prconfig.xml                      120 (seconds)        Health monitor status expiration in seconds
monitor/health/disableChecks      Data-Admin-System-Setting (DASS)  None                 Disables the specified health checks dynamically
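Assuming the prconfig entries follow the same env syntax shown earlier for cluster/requestors/browser/maxactive, the defaults would look like this in prconfig.xml:

<env name="monitor/health/monitorInterval" value="15" />
<env name="monitor/health/checkTimeout" value="5000" />
<env name="monitor/health/statusTimeout" value="120" />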

You can create Dynamic System Settings (DSSes) from the prconfig.xml settings by following the procedure in Creating a dynamic system setting. For complete information, see Configuring dynamic system settings.
