Server crash recovery

Two settings are required to enable Pega Platform server crash recovery:

  • storage/class/passivation:/rootpath must be set to shared storage that is available to all servers in the cluster.
    • Configuration details vary by operating system.
    • Shared storage should be deployed so it is not a single point of failure.
    • Shared storage itself should have a failover solution.
    • A server restart is required to change the location of shared storage in the Pega Platform.
  • session/ha/crash/RecordWorkInProgress=true tells the Pega Platform to store user interface metadata on the shared file system.
    • Depending on requirements, this setting can be changed on the high availability landing page, in DASS, or in prconfig.xml.
    • A server restart is required for changes to take effect.
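To make the two settings concrete, the following Python sketch reads a prconfig.xml fragment and reports whether both crash recovery settings are present. It assumes the common prconfig.xml layout of env entries (name and value attributes) under a pegarules root element; the path /mnt/pega-shared/passivation and the helper names read_env_settings and crash_recovery_problems are hypothetical. The check only confirms that values are set; it cannot tell whether the path really is shared storage reachable by every server.

```python
import xml.etree.ElementTree as ET

# Hypothetical prconfig.xml fragment; the shared storage path is only an example.
PRCONFIG = """
<pegarules>
  <env name="storage/class/passivation:/rootpath" value="/mnt/pega-shared/passivation"/>
  <env name="session/ha/crash/RecordWorkInProgress" value="true"/>
</pegarules>
"""

ROOTPATH = "storage/class/passivation:/rootpath"
RECORD_WIP = "session/ha/crash/RecordWorkInProgress"


def read_env_settings(xml_text: str) -> dict[str, str]:
    """Collect every <env name="..." value="..."/> entry into a dict."""
    root = ET.fromstring(xml_text)
    return {env.get("name"): env.get("value", "") for env in root.iter("env")}


def crash_recovery_problems(settings: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means both settings are set."""
    problems = []
    # Presence check only: this cannot verify that the path is on shared storage.
    if not settings.get(ROOTPATH, "").strip():
        problems.append(f"{ROOTPATH} is not set to a shared storage path")
    if settings.get(RECORD_WIP, "").strip().lower() != "true":
        problems.append(f"{RECORD_WIP} must be true")
    return problems


if __name__ == "__main__":
    settings = read_env_settings(PRCONFIG)
    print(crash_recovery_problems(settings) or "crash recovery settings present")
```

Remember that, as noted above, changing either setting takes effect only after a server restart.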

Pega Platform server failover works only if the load balancer takes the failed Pega Platform server out of service. Requests that were serviced by the crashed server are then redirected to the remaining Pega Platform servers. This requires a production-class load balancer, together with passive or active monitoring of the application.
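The failover requirement can be illustrated with a small Python model of a load balancer that pins each session to a node, takes failed nodes out of service through active health checks, and redirects the affected sessions to a surviving node. This is a hypothetical sketch, not Pega code or the behavior of any particular load balancer; Node, LoadBalancer, the alive flag standing in for a health probe, and the least-loaded selection rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True  # hypothetical stand-in for the result of a real health probe


@dataclass
class LoadBalancer:
    nodes: list[Node]
    # Session affinity: session id -> name of the node currently serving it.
    pinned: dict[str, str] = field(default_factory=dict)

    def take_failed_nodes_out_of_service(self) -> None:
        """Active monitoring: drop any session pinning to a node that fails its probe."""
        healthy = {n.name for n in self.nodes if n.alive}
        for session, node in list(self.pinned.items()):
            if node not in healthy:
                del self.pinned[session]

    def route(self, session: str) -> str:
        """Return the node that should serve this session's next request."""
        self.take_failed_nodes_out_of_service()
        if session in self.pinned:
            return self.pinned[session]
        healthy = [n.name for n in self.nodes if n.alive]
        if not healthy:
            raise RuntimeError("no healthy nodes in rotation")
        # Assumed policy: pick the healthy node with the fewest pinned sessions
        # (the first such node on ties).
        load = {name: 0 for name in healthy}
        for node in self.pinned.values():
            load[node] += 1
        chosen = min(healthy, key=lambda name: load[name])
        self.pinned[session] = chosen
        return chosen


if __name__ == "__main__":
    lb = LoadBalancer([Node("pega-1"), Node("pega-2")])
    print(lb.route("user-a"))  # pega-1
    lb.nodes[0].alive = False  # pega-1 crashes
    print(lb.route("user-a"))  # pega-2: redirected to a surviving node
```

If the failed node were never taken out of rotation, route would keep returning it, which is why failover depends on the load balancer and on monitoring rather than on the Pega Platform alone.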

There are two steps in the recovery:

  • When the request is redirected to another Pega Platform server, the user must re-authenticate. The high availability best practice is to enable single sign-on so that users are not interrupted.
  • When the new server processes the request, it detects that a crash has occurred and uses the stored user interface metadata to reconstruct the user interface. Because the user’s clipboard does not survive the crash, any data that was entered but not yet committed on assign, perform, and confirm harnesses is lost.
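The two recovery steps can be sketched as a Python model in which user interface metadata lives in shared storage while each server's clipboard lives only in that server's memory. All class and method names are hypothetical, and the crash-detection rule used here (metadata exists for the session but this server holds no clipboard for it) is an assumption for illustration, not how the Pega Platform actually detects a crash.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for shared storage under storage/class/passivation:/rootpath."""
    ui_metadata: dict[str, dict] = field(default_factory=dict)  # session id -> UI metadata


@dataclass
class Server:
    name: str
    store: SharedStore
    # The clipboard lives only in this server's memory and does not survive a crash.
    clipboards: dict[str, dict] = field(default_factory=dict)

    def open_harness(self, session: str, harness: str, work_item: str) -> None:
        # Models RecordWorkInProgress=true: UI metadata is written to shared storage.
        self.store.ui_metadata[session] = {"harness": harness, "work_item": work_item}
        self.clipboards[session] = {}

    def enter(self, session: str, prop: str, value: str) -> None:
        # Entered data stays on the in-memory clipboard until it is committed.
        self.clipboards[session][prop] = value

    def handle(self, session: str, authenticated: bool) -> dict:
        # Step 1: a redirected user must authenticate again (SSO can make this seamless).
        if not authenticated:
            return {"status": "login required"}
        meta = self.store.ui_metadata.get(session)
        if session not in self.clipboards and meta is not None:
            # Step 2 (assumed detection rule): metadata exists but there is no clipboard
            # here, so treat it as a crash and rebuild the UI from the shared metadata.
            self.clipboards[session] = {}  # fresh clipboard: uncommitted data is gone
            return {"status": "recovered", **meta, "entered": {}}
        return {"status": "ok", **(meta or {}), "entered": dict(self.clipboards.get(session, {}))}


if __name__ == "__main__":
    store = SharedStore()
    a, b = Server("pega-1", store), Server("pega-2", store)
    a.open_harness("s1", "Perform", "W-42")
    a.enter("s1", "Amount", "100")
    # pega-1 crashes; the load balancer sends session s1 to pega-2.
    print(b.handle("s1", authenticated=False))  # {'status': 'login required'}
    print(b.handle("s1", authenticated=True))
    # {'status': 'recovered', 'harness': 'Perform', 'work_item': 'W-42', 'entered': {}}
```

In this model the harness and work item come back because their metadata was on shared storage, while the uncommitted Amount value is lost along with the crashed server's clipboard.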