How to test whether a Process Commander node is running
Various external systems may need to test periodically whether a Process Commander node is running. This note describes a few approaches; other approaches may be workable in specific settings.
In a production setting, it may be important to monitor whether a Process Commander system is running. The System Management Application (Version 5.1) and the System Console (Version 4.2) both provide ways to check this. However, they require human users to enter requests and review results.
Automated facilities to monitor availability need simpler tests that can be run every few minutes or so around the clock and do not consume significant system resources. When no response or an error response is received, these facilities can notify staff or initiate failover actions. For example, when one node of a cluster is down, HTTP load-balancing switches can route future traffic to other nodes.
CAUTION: Any such probing places an added workload on the system and should be performed no more often than necessary to meet the monitoring objective. If monitoring at intervals shorter than a minute, experiment to make sure the added workload is not affecting overall performance.
Each time the PRServlet servlet is started, it presents the Process Commander log-on form. The standard URL:
-- causes Process Commander to create a guest requestor session, one that has access to only the RuleSets, access roles, and privileges associated with the PegaRULES:Unauthenticated access group.
Ordinarily, a human user completes and submits the login form with an Operator ID and password, becoming an authenticated user with additional access roles, RuleSets, and privileges. Otherwise, an idle guest requestor automatically times out within a minute or so, to avoid tying up resources.
The login form is not well designed for an external system to interpret, and may be transmitted correctly even when the application server and Process Commander application are up but the PegaRULES database is down or unresponsive.
An HTML rule (Rule-Obj-HTML rule type) can assemble a response that is easier to parse and different each time. For example, the HTML rule named BaseHelloWorld presents the current date and time:
When executed, this rule produces a Web page similar to this:
Of course, to be executed by a guest requestor, the HTML rule must belong to a RuleSet available to unauthenticated requestors.
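Because the response is different each time, a monitoring script can check that it is receiving a live page rather than a cached or stale copy. The following Python sketch illustrates one way to do this. It assumes the HTML rule emits a timestamp in a form such as 2024-01-02 03:04:05; that format, and the function names, are hypothetical and must be adapted to whatever your own HTML rule produces.

```python
import re
from typing import Optional

# ASSUMPTION: the HTML rule prints a timestamp like "2024-01-02 03:04:05".
# Adjust this pattern to match the output of your own rule.
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def extract_timestamp(body: str) -> Optional[str]:
    """Return the first timestamp found in the response body, or None."""
    match = TIMESTAMP_PATTERN.search(body)
    return match.group(0) if match else None


def is_fresh(previous_body: Optional[str], current_body: str) -> bool:
    """Return True when the current response carries a timestamp that
    differs from the one in the previous response."""
    current = extract_timestamp(current_body)
    if current is None:
        # No timestamp at all: probably a login form or an error page.
        return False
    if previous_body is None:
        # First probe: nothing to compare against yet.
        return True
    return current != extract_timestamp(previous_body)
```

A response with no timestamp is treated as a failure, which also catches the case where the login form is returned instead of the probe page.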
To implement this basic approach:
Create a RuleSet and Version 01-01-01 to be available to guest requestors. (Review and test carefully any rules added to this RuleSet, to retain system security.)
Update the access group PegaRULES:Unauthenticated to include this RuleSet version.
Create an HTML rule similar to BaseHelloWorld above. Your message can be formatted as desired, and may include additional facts from the clipboard, such as the date and time of the last system pulse.
- Test using the URL:
-- substituting your host name, TCP/IP port number, and HTML rule name. Each HTTP request creates a guest requestor, which is destroyed automatically after an idle minute.
In the external system, create a script to send the URL periodically, and to time-out and notify someone when no response is received.
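The last step above might be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names, the two-minute interval, and the rule of notifying after two consecutive failures are all assumptions, and the probe URL must be supplied by you.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def probe_once(url: str, timeout_seconds: float = 10.0) -> bool:
    """Return True if the URL answers with HTTP 200 and a non-empty body
    before the timeout expires."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status == 200 and len(response.read()) > 0
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, DNS failures, timeouts, HTTP error statuses.
        return False


def monitor(probe: Callable[[], bool],
            notify: Callable[[str], None],
            interval_seconds: float = 120.0,
            max_failures: int = 2,
            max_probes: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Run the probe repeatedly; call notify once each time the number of
    consecutive failures reaches max_failures. Returns the number of
    notifications sent. max_probes=None means run until interrupted."""
    consecutive_failures = 0
    notifications = 0
    probes_sent = 0
    while max_probes is None or probes_sent < max_probes:
        probes_sent += 1
        if probe():
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == max_failures:
                notify(f"Node failed {consecutive_failures} consecutive probes")
                notifications += 1
        sleep(interval_seconds)
    return notifications


# Example wiring (PROBE_URL is a placeholder for your own probe URL):
# monitor(lambda: probe_once(PROBE_URL), print)
```

Requiring more than one consecutive failure before notifying avoids alerts caused by a single slow response, at the cost of a slightly later notification.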
As an alternative to the HTML rule, the guest requestor can run a Process Commander activity to provide additional confirmation of the health of the system. However, care is required to ensure that the activity (and all others that it calls) uses only facilities available to unauthenticated requestors.
In this example, the activity @baseclass.IsNodeUp runs a list view rule and presents the count of results in the response. The activity has three steps:
Step 1 runs the standard list view named Code-Pega-Requestor.RequestorsOnline (in Prepare mode, which assembles the data on the clipboard but doesn’t display the results).
Step 2 sends the processed HTML stream from an HTML rule named IsNodeUp, which is similar to the BaseHelloWorld rule used in the basic approach above.
To minimize demand on system resources, Step 3 calls the Requestor-Stop method. (If this step is omitted, the idle unauthenticated requestor is removed automatically after about one minute.)
The access role PegaRULES:Guest does not provide an unauthenticated requestor with the capabilities needed to run this activity and open the list view rule. To provide this access:
Update the access role in the PegaRULES:Unauthenticated access group from PegaRULES:Guest to PegaRULES:Maximum.
Create an Access-of-Role-to-Object rule named PegaRULES:Guest-Maximum.Code-Pega-Requestor that permits searching.
- Log off. Test the configuration using the URL:
The result is similar to the following:
The requestor count typically includes four built-in agents (Pega-RULES, Pega-ProCom, Pega-IntSvcs and the Master Agent) plus the guest requestor created for this probe.
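A monitoring script can extract the requestor count from the response and treat a missing or implausibly low count as a failure. The Python sketch below assumes the IsNodeUp HTML rule writes the count after a label such as "Requestors:"; that label is hypothetical, as are the function names, and the pattern must be changed to match the text your own rule emits.

```python
import re
from typing import Optional

# ASSUMPTION: the response contains text like "Requestors: 5".
# Change the label to match your own IsNodeUp HTML rule.
COUNT_PATTERN = re.compile(r"Requestors:\s*(\d+)")


def requestor_count(body: str) -> Optional[int]:
    """Return the requestor count found in the response, or None."""
    match = COUNT_PATTERN.search(body)
    return int(match.group(1)) if match else None


def node_looks_healthy(body: str, minimum_expected: int = 1) -> bool:
    """Return True when a count is present and at least minimum_expected.
    The probe's own guest requestor should make the count at least 1."""
    count = requestor_count(body)
    return count is not None and count >= minimum_expected
```

A count can only be returned if the list view ran successfully, so its presence gives some evidence that more than the servlet is responding.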
Refinements and Alternatives
This small example can be evolved further.
- The activity can perform additional application-specific testing to confirm whether external databases are accessible, internal queues are larger than a specified upper limit, and so on. However, each such change may require additional capabilities for the unauthenticated requestor (through additional Rule-Access-Role-Obj rules).
- Alternatively, you can use an authenticated requestor. This adds processing overhead and delay but may not require changes to Guest or Guest-Maximum. Use the SnapStart URL format, adding values for UserIdentifier and Password. (The password is base64-encoded.)
- Rather than send back an HTML page as a probe response, the Process Commander system can send back an XML document. This may be easier for the external system or script to parse.
- If the application server is not up, no response occurs. Ideally, the external system can detect this. If this isn’t feasible, the external system can send the HTTP requests through a proxy server, which can respond with HTTP Status 502 (Bad Gateway) or 504 (Gateway Timeout) after a specified interval.
- If the application server is up but Process Commander is not up, the application server may send HTTP Status 503 (Service Unavailable).
- If the application server is up and Process Commander is up but busy, an HTTP Status 408 (Request Timeout) response may be sent.
- Process Commander services such as SOAP (Rule-Service-SOAP rule type) and email (Rule-Service-Email rule type) may be useful in some cases. Service requestors may be authenticated or unauthenticated.
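The HTTP status cases listed above can be turned into a simple classification in the monitoring script. The Python sketch below is one possible mapping under the assumptions in this note. The function name and the returned labels are hypothetical, and the statuses your application server or proxy actually returns should be confirmed by testing.

```python
from typing import Optional


def classify_probe(status: Optional[int]) -> str:
    """Map a probe outcome to a rough diagnosis.

    status is the HTTP status code received, or None when the request
    produced no response at all (timeout or refused connection)."""
    if status is None:
        return "no response: application server or network may be down"
    if status == 200:
        return "ok"
    if status in (502, 504):
        # Typically generated by a proxy that could not reach the server.
        return "proxy error: application server may be down"
    if status == 503:
        return "unavailable: Process Commander may not be running"
    if status == 408:
        return "timeout: Process Commander may be busy"
    return f"unexpected HTTP status {status}"
```

Keeping the diagnosis separate from the code that sends the request lets staff notifications carry a more specific message than "probe failed".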
The above approaches are “read-only” and restricted to testing a single node. For detailed probing, a custom Java program on the client system can use Java Management Extensions (JMX) to access MBeans maintained by each Process Commander node. The V5.1 System Management application (prsysmgt) uses this approach.