BIX design models
Business Intelligence Exchange (BIX) is an extraction tool in a well-defined ETL (extract, transform, load) process meant to populate a data warehouse with information from a Pega 7 production database. Use BIX to combine Pega 7 data with large amounts of data from other systems and databases for reporting and analytics to provide management with a global view of the business.
This document provides an overview of how to design an end-to-end process for accomplishing this using BIX. It describes designs for extracting data using batch processing and discusses trade-offs of each design regarding performance, latency, and error handling and database integrity.
How BIX extracts data
Most extraction is performed against cases and work items, along with other associated types of data, such as work assignments. Each of these types of data is referred to as a class of data in Pega 7, and an individual case or work item is referred to as an instance of that class.
Each case or work item includes many fields of information, referred to as properties, such as customer or product names, the date when the item was created, and its current status. This document refers to class instances rather than cases or work items, because you can use BIX to extract information from classes that are not specifically related to work or cases, such as the user information stored in Data-Admin-Operator-ID.
All of this data is stored in a relational database (DB), managed using a database management system (DBMS) product such as Oracle, SQL Server, or UDB. Classes, instances, and properties are conceptually similar to the tables, rows, and columns in the database. Physically, they may be stored differently in the database.
Pega 7 first stores the data entered or displayed in application user interface forms in memory on what is known as the clipboard (which is not visible to the end user). When an end user clicksor to save changes, the system writes the data to the database. A case or work item is written as a single row within a table in the database. See the following diagram, which uses quotes for auto insurance collision policies:
Pega 7 stores the data for a class instance representing a case or work item in a single row or record of the Pega 7 production database table for that class. The data model for a case or work item can be quite large and complex, and include multiple levels of hierarchically embedded sets of fields, referred to as pages, page lists, and page groups.
Most of the database operations that Pega 7 performs (including opening a case or work item from a worklist or workbasket, saving changes to a case or work item, and reading a business rule) are optimized through Pega 7's architecture. A single binary large object (BLOB) field is used within the record to store all of the data for the case or work item as a compressed XML file, mirroring the hierarchical data model of the class instance.
Not only does the BLOB optimize the performance of transactions involving individual cases or work items, it also lets customers Build for Change®, Pega 7's number one goal for this architecture. No matter how complex the data model, all of the data for a case or work item is stored in a single field. This means that business analysts and developers are able to dynamically change their data model or add new pieces of information to a business process without having to involve DBAs in making database changes.
Using the BLOB also makes it easy to maintain multiple versions of the data model over time, and allows for different cases with different data models to all be stored in the same table.
When BIX extracts the data for a class instance to a database, it writes out the information in the BLOB as records into an extract table in the destination database. During this process, the data is normalized in the BLOB, which writes out records for each embedded page, page list, and page group in the data model to different extract tables. These subsidiary extract tables contain foreign keys that allow the records in them to be appropriately joined to the record in the main extract table for the class instance that they belong to.
The following diagram illustrates the structure of the tables that BIX writes to. It displays only a small subset of the top-level properties for the class, and only one of the many subsidiary tables (the drivers on the collision quote):
In most data warehouses, the design of the main fact table for each class, which stores the top-level properties for that class, uses one of three models:
- A transaction table contains a single row for each transaction, which in Pega 7 corresponds to a class instance. This model mirrors the information in the class table in Pega 7. It captures only the current state of each class instance (its state since the time of the last extraction of the data for that instance).
- A periodic snapshot table contains multiple rows for a given class instance, extracted at regular intervals. For daily snapshots, a row would display for a class instance from the first day it was created until the day it was deleted (Pega 7 does not physically delete instances of work classes, and instead modifies their status to indicate they have been withdrawn).
- An accumulating snapshot table contains a separate row for a class instance when it is created and also each time it is updated. This model provides a full history of changes to each class instance.
Running BIX as a batch process to populate table models
This section discusses how to use BIX running in batch mode on scheduled basis to populate the tables in each of the three types of models described earlier in this article.
Using BIX Extract tables as staging tables
The approaches described in this section all assume that BIX is not used to extract data directly into the fact table for the class. Instead, the extract tables written to by BIX are used as staging tables because additional post-processing is required after completion of BIX processing to transform the information in these tables into well-formed fact and dimension tables. This is especially true if the information extracted from the Pega 7 database using BIX needs to be combined with data from other systems of record in a centralized data warehouse for enterprise-wide reporting.
Since BIX does not provide full ETL capabilities, you must use other tools to transform extract table data when migrating it into the required fact and dimension tables. Because of this, BIX only performs INSERTs into the extract tables it writes to.
However, even if no significant transformations of the data are desired, and customers wish to report directly from the BIX extract tables, post-processing may still need to be performed after the completion of BIX processing (for example, to prevent duplicate records, since BIX only performs INSERTs to the destination database) as described below.
Designs for different table models
- Transaction table – BIX can create a transaction table with the option to extract all class instances created or modified since the last extraction. Using BIX in this way involves two issues:
- The data in the transaction table is only current as of (approximately) the time when the last extract started.
- Post-processing has to be performed. Following an extract run, a separate process should delete previously-extracted object records. These can be identified as records for the same object (such as one with the same pzInsKey value) but with an earlier extract DateTime value than the latest extract DateTime value for the object. A DELETE statement like the following needs to be run for each destination table and its subsidiary tables in the destination database:
Delete from <TableName> where pzInsKey in (select pzInsKey from <TableName> where pxExtractDateTime = (select max(pxExtractDateTime) from <TableName>)) and pxExtractDateTime < (select max(pxExtractDateTime) from <TableName>)
- Periodic snapshot table
- Versions of BIX prior to 7.1 can create periodic snapshots by extracting all instances for a class each time a scheduled extract is performed, but this can be very expensive and time-consuming for applications with large numbers of class instances.
- BIX 7.1 offers a more efficient way to extract and store snapshots of the data. Instead of having a separate record for each period, each record written has both an effective date (the date and time for the BIX extract run in which the record was written) and an expiration date (the date and time for the next BIX run in which the same object was extracted). With an effective date and expiration date on each record, it is easy to write queries that return the valid data for a set of objects at any point in time, or at multiple points in time for use in change reports.
- Accumulating snapshot table
- Requires running BIX as each case or work item is updated.
- This is not supported since BIX runs as a batch process.