Incident response and management for Pega Cloud Services
This article is part of the Pega Cloud Services Subscription Documentation.
Pegasystems provides rapid response and efficient resolution for incidents that affect the Pega Cloud Services network and client environments. To support client satisfaction, Pegasystems focuses proactively on the following areas:
- Continuous improvement of preventive safeguards in Pega Cloud Services environments
- Achievement of the 99.95% uptime service-level agreement (SLA)
- Ongoing reduction of incident occurrences
- Continuous improvement of incident response and resolution
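To put the 99.95% uptime figure in perspective, the following sketch converts an uptime percentage into the downtime that it allows. The measurement windows (a 30-day month and a 365-day year) and the function name `allowed_downtime_minutes` are assumptions made for illustration. The actual measurement period and any exclusions are defined by the subscription terms, not by this example.

```python
def allowed_downtime_minutes(uptime_percent: float, window_days: int) -> float:
    """Return the downtime, in minutes, that an uptime target permits over a window.

    Hypothetical helper: the window length is an assumption, not an SLA term.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    window_minutes = window_days * 24 * 60
    # The permitted downtime is the share of the window not covered by the target.
    return window_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        minutes = allowed_downtime_minutes(99.95, days)
        print(f"{label}: {minutes:.1f} minutes")
```

Under these assumptions, a 99.95% target allows roughly 21.6 minutes of downtime in a 30-day month, or about 4.4 hours in a 365-day year.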
Pega Cloud Services incident response and management includes:
- A help desk that responds to client support request calls 24 hours a day, seven days a week.
- A web-based, mobile-enabled support request ticketing system that is used by clients and the Pega Cloud Services support teams for managing, tracking, monitoring, and communicating incident status from submission through resolution.
- Replicated Service Reliability Center (SRC) facilities in the United States, India, and Poland, which provide network monitoring and incident response resiliency, with managed shift handovers and on-call scheduling to ensure coverage 24 hours a day, seven days a week.
- Three tiers of technical and engineering staff who provide incident response, triage, root cause analysis, and resolution. Response procedures include standard operating procedures that are kept current in a knowledge base, escalation to higher-expertise tiers and supporting teams, and bridge calls for collaboration.
- Incident severity, impact, and type classification for effective prioritization of tickets and assignment of qualified experts, with supervised monitoring of ticket status and progress.
- Activation and escalation of contingency and disaster recovery plans in the event of a major incident that involves multiple clients.
- Partner and vendor incident response support (for example, for Amazon Web Services) as needed for triage and resolution.
- Reporting and analysis of incident response performance metrics to achieve SLAs.