Data Archival in Pega: Common Issues and How to Fix Them
If you’re on Pega Cloud, setting up data archival is actually straightforward. You define the archival policy for the case type, set the dataarchival/batchPipelineEnabled DSS to true, and enable the pyPegaArchiverUsingPipeline job scheduler with an appropriate schedule.
Here’s the thing though. In real projects, the struggle usually isn’t the setup. It’s figuring out why archival stalls, runs partially, or fails without telling you much. If that sounds familiar, this article is for you.
The Archival Pipeline at a Glance
At a high level, Pega archival follows a four-step pipeline:
- Crawler: Identifies cases eligible for archival based on retention rules and policies.
- Copier: Copies eligible case data from the primary database (your active Pega DB) to a secondary archival store.
- Indexer: Indexes archived cases into Elasticsearch so the data remains searchable after archival.
- Purger: Removes the archived data from the primary database once copy and indexing succeed (a quick spot check follows this list).
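If you want to confirm the purger actually removed data from the primary database, a quick SQL spot check is usually enough. The sketch below is illustrative only: it assumes your case type maps to the default PC_WORK table, and the case handles are placeholders to swap for real ones.

    -- Hypothetical spot check: after the purger runs, archived cases should no longer
    -- exist in the primary database work table (here the default PC_WORK).
    SELECT pzInsKey, pyStatusWork
    FROM pc_work
    WHERE pzInsKey IN ('MYORG-APP-WORK C-1001', 'MYORG-APP-WORK C-1002');

If these rows still come back after the pipeline reports success, the purge phase is where to start digging.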
Earlier implementations used three separate activities:
- pzPerformArchive
- pzPerformIndexing
- pxPerformPurge
Latest Pega versions consolidate this logic into a single OOTB pipeline activity:
- pzPerformArchiveUsingPipeline
This consolidation reduces orchestration complexity and is especially beneficial in cloud environments.
Pega Cloud Makes Archival Easier
On Pega Cloud, archival setup is largely simplified:
- Pega provides a secondary archival database out of the box.
- When using Pega Cloud File Storage Repository, archived data is stored using managed object storage (backed internally by cloud storage such as Amazon S3).
- Configuration relies almost entirely on OOTB features rather than custom code.
In this model:
- Primary database holds active cases.
- Secondary database holds archived case data.
The setup itself is usually straightforward. Most real-world problems start after the pipeline is enabled.
The Challenge: Troubleshooting Archival Failures
Most archival failures are not caused by incorrect rules. They stem from a lack of visibility.
Enable the Right Logs First
Before you debug anything else, get the logging right. This one step will save you hours.
In Admin Studio → Resources → Log Categories, configure the following loggers.
Native SQL Execution
These help identify database-level issues during copy and purge.
- com.pega.pegarules.data.internal.sqlapi.exec.NativeSQLListExecutor
- org.apache.http.wire
Search and Indexing (SRS)
These are critical for understanding Elasticsearch-related failures.
- com.pega.platform.search.infrastructure.internal.srs.SRSConnectorImpl
Archival Pipeline Logs
These give you end-to-end visibility across the archival pipeline.
- Archival-CaseCrawler
- Archival-CaseCopier
- Archival-Indexer
- Archival-Purger
- Archival-Search
- Archival-Datastore
- Archival-General
Why This Matters
With these logs enabled, you can clearly see:
- SQL errors during copy or purge operations
- Indexing failures that never surface in the UI
- Pipeline execution issues that otherwise look like silent or partial failures
Without this level of logging, archival troubleshooting quickly turns into guesswork. With it, failures usually explain themselves.
Using PR_METADATA to Understand What Failed
PR_METADATA is one of the most useful and underused tools for debugging archival issues. When the pipeline fails, this table usually knows why.
A Practical Debugging Pattern
- Clear existing metadata: Create a small utility activity with a single step:
  - Method: RDB-Delete
  - Class: Data-ArchivalMetadata
  - RequestType: pzTruncateDataArchivalMetadata
  - Access: ALL
  This gives you a clean slate before testing again.
- Run archival manually: Execute the archival steps manually using the three core activities (archive, index, purge). This isolates which phase is failing instead of relying on the scheduled job.
- Query the metadata:
  SELECT * FROM pr_metadata;
- Inspect the results: Pay close attention to the pyComment column (a more targeted query is sketched right after this list).
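Since SELECT * can be noisy on a busy system, a slightly more targeted query helps. This is a sketch only: pyComment is the column called out above, while pzInsKey and the filter condition are assumptions you may need to adapt to how pr_metadata is populated in your environment.

    -- Show only metadata rows that carry a comment (column names other than pyComment
    -- are assumptions; adjust them to the columns your pr_metadata table exposes).
    SELECT pzInsKey, pyComment
    FROM pr_metadata
    WHERE pyComment IS NOT NULL
    ORDER BY pzInsKey;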
What You’ll Usually Find
PR_METADATA often contains clear, actionable failure details, such as:
- Indexing failures
- SQL exceptions
- Copy or purge errors tied to specific cases
Instead of rerunning jobs blindly and hoping for a different outcome, this approach gives you concrete signals about what failed and where to focus next.
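When the table holds thousands of rows, grouping the comments first tells you which failure type dominates. Again, this is a hedged sketch that assumes pyComment carries the error text recorded by the pipeline.

    -- Rough frequency count of the failure reasons captured in pyComment.
    SELECT pyComment, COUNT(*) AS failure_count
    FROM pr_metadata
    WHERE pyComment IS NOT NULL
    GROUP BY pyComment
    ORDER BY failure_count DESC;

A handful of dominant comments usually points straight at one of the indexing or environmental issues described below.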
Indexing Is the Most Fragile Step
Across projects, indexing is the most common point of failure in the archival pipeline.
Typical symptoms include:
- Data copied successfully but not searchable
- Pipeline completing partially
- Repeated retries without progress
Most indexing issues fall into one of three buckets:
- Elasticsearch connectivity or schema mismatches
- Platform defects in specific product versions
- Environmental constraints preventing index writes
The key is visibility. With proper SRS and archive logs enabled, these issues become diagnosable rather than mysterious.
Environmental Constraints Can Stop Archival Completely
One issue teams often overlook is environmental health, especially in lower-tier environments.
Examples include:
- Blob storage limits
- Insufficient processing capacity
- Queue or thread starvation
When this happens, the pipeline may stop picking up cases altogether.
What to Watch For
- PEGA0004 alerts in logs
- PoisonPill messages indicating the system intentionally halted processing
This is Pega signaling that archival cannot proceed safely until the environment is stabilized. No rule or pipeline change will fix this until the underlying constraint is resolved.