
This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

Replication between data centers for an Active-Active setup

SA-8073

Summary



The user is running Pega 7.1.7 and implementing an NBAA application on top of the NBAA framework. For production deployment, they are planning an Active-Active setup across two data centers and have the following questions:

A.] In the context of VBD (Visual Business Director) and the Adaptive Decision Manager (ADM):
- Which database tables should be set up for replication across the data centers?
- Which of these tables are for configuration versus transactional use (that is, tables whose contents grow over time, for example with new Interaction History (IH) records or new models)?

 
VBD:
1. PR_DATA_VBD_CONFIG
2. PR_DATA_VBD_COLLECTION_CONFIG
ADM:
1. SETTINGS
2. CONFIGURATIONS
3. FACTORIES
4. ADM_EVENTQUEUE

B.] What is the purpose of the following tables, and should they be set up for replication?
1. PR_DATA_DM_ADMMART_MDL_FACT
2. PR_DATA_DM_ADMMART_MODEL
3. PR_DATA_DM_ADMMART_PRED_FACT

C.] Regarding the Interaction History tables (nine in total):
- How is the primary key value generated for each of the tables?
- How is it guaranteed to be unique across the data centers, so that collisions are avoided when replicating IH data?

D.] Is there a reference document that provides guidance for setting up an Active-Active deployment around the decisioning components?
- decisioning table replication
- ADM service setup
- any design considerations


Resolution



The following explanation was provided to the user:

A] VBD uses both PR_DATA_VBD_CONFIG and PR_DATA_VBD_COLLECTION_CONFIG for configuration purposes, and both need to be replicated across data centers.


PR_DATA_VBD_CONFIG is read during VBD start-up and contains properties such as the partition size in which data is stored, the lazy-loading policy, and the cache eviction timeout for data sources.

PR_DATA_VBD_COLLECTION_CONFIG is also read during start-up and contains properties such as the type of each data source (database, filesystem), its URL, and the query for the initial load. This table is also consulted when a data source is evicted from the cache and needs to be reloaded, and it is populated whenever a new data source is created, for instance from a simulation run.
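For intuition only, the shape of one data-source entry might look something like the following Java sketch; the field names are hypothetical and do not reflect the actual PR_DATA_VBD_COLLECTION_CONFIG columns:

    // Hypothetical shape of a single data-source configuration entry; the
    // field names are illustrative, not actual table columns.
    public record DataSourceConfig(
            String name,         // identifier of the data source
            String type,         // "database" or "filesystem"
            String url,          // location of the underlying data
            String initialQuery  // query executed for the initial load
    ) {}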

ADM does NOT support Active-Active replication. ADM is a stateful service: the ADM server keeps all state in memory and writes it to the database (by default) once per hour, but it never reads from the database except during start-up (see the sketch after the two options below). To use ADM in two data centers, you have two options:

(1) Use a single, shared ADM server for both data centers. For example, run it in DC1 and let the PRPC nodes in DC2 connect to the ADM server in DC1. The ADM schema then lives only in DC1 and is not replicated.
(2) Use a separate, dedicated ADM server per DC, each with its own dedicated schema. This implies that data cannot be shared between the DCs.
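
A minimal Java sketch of the write-behind behaviour described above may help explain why database replication alone cannot make ADM Active-Active; the class and method names are illustrative, not the actual ADM implementation:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch only: a stateful service that reads the database
    // once at start-up and flushes its in-memory state on a schedule.
    public class StatefulModelService {
        private final Map<String, double[]> modelState = new ConcurrentHashMap<>();
        private final ScheduledExecutorService flusher =
                Executors.newSingleThreadScheduledExecutor();

        public StatefulModelService() {
            loadStateFromDatabase();  // the database is read once, at start-up
            flusher.scheduleAtFixedRate(this::flushStateToDatabase,
                    1, 1, TimeUnit.HOURS);  // default: one snapshot per hour
        }

        public void learn(String modelId, double[] update) {
            // All learning happens against in-memory state; the database never
            // sees these intermediate updates, so replicating the database
            // between data centers cannot keep two such servers in sync.
            modelState.put(modelId, update);
        }

        private void loadStateFromDatabase() { /* e.g. read model state at start-up */ }
        private void flushStateToDatabase()  { /* write in-memory snapshots back */ }
    }

The key point is the one-way flow: memory is the source of truth, and the database holds only periodic snapshots.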


B] Those tables are used for ADM reporting and are populated by the ADM reporting agent (if enabled), which regularly extracts statistics from the ADM server and writes them to these tables so that reports can be generated. If option (1) from question A is implemented, these tables could be replicated from DC1 to DC2 so that the ADM reports are available in DC2 as well. If option (2) is implemented, each DC will have its own ADM instance with its own ADM reporting data; that is, the content of these tables will differ and be specific per DC.

C] For the fact table, we generate a 'random sequence' with an extremely low likelihood of collisions; it has been designed and implemented with Active-Active replication in mind. The dimension tables use a hash code based on the record contents, so identical records inserted in two DCs are guaranteed to have the same primary key. This again makes them safe to use in an Active-Active replicated environment.
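
As a rough illustration of the two key-generation strategies, consider the sketch below. This is an assumption-laden stand-in: UUID and SHA-256 are used here for illustration and are not the actual Pega algorithms.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.UUID;

    public class IhKeySketch {
        // Fact rows: a random identifier needs no coordination between data
        // centers, and collisions are extremely unlikely.
        static String factKey() {
            return UUID.randomUUID().toString();
        }

        // Dimension rows: hash the record contents, so identical records
        // inserted independently in two DCs produce the same primary key and
        // replication converges instead of colliding.
        static String dimensionKey(String... columns) throws NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String column : columns) {
                md.update(column.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);  // separator, so ("ab","c") differs from ("a","bc")
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }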

D] The same answer as at A applies: Active-Active is not supported for ADM.

In addition, VBD uses GridGain as its in-memory data grid. GridGain is responsible for distributing the data and queries among the nodes. If a node goes down, GridGain will reload that node's data back into the grid once a query is triggered, but this only applies to database data sources; data from the other source types cannot be retrieved. Both tables are used mainly at start-up to load configuration into memory. It is important that PR_DATA_VBD_COLLECTION_CONFIG is replicated, because it stores the relevant information for data sources, which is loaded again on every VBD start-up. In a setup with multiple VBD nodes, the engine notifies each node of a configuration change, and the nodes in turn update the database.
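
The reload-on-query behaviour can be pictured with the following sketch; the names are hypothetical and this is not the GridGain API:

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Illustrative sketch of reload-on-query: evicted or lost entries can be
    // recovered only when a database loader exists for the data source.
    public class LazyDataSourceCache {
        private final Map<String, Object> grid = new ConcurrentHashMap<>();
        private final Function<String, Optional<Object>> databaseLoader;

        public LazyDataSourceCache(Function<String, Optional<Object>> databaseLoader) {
            this.databaseLoader = databaseLoader;
        }

        public Optional<Object> query(String dataSource) {
            Object cached = grid.get(dataSource);
            if (cached != null) {
                return Optional.of(cached);
            }
            // Entry lost (node down or evicted): reload from the database.
            // Non-database sources have no loader, so their data stays gone
            // until the next start-up.
            Optional<Object> reloaded = databaseLoader.apply(dataSource);
            reloaded.ifPresent(value -> grid.put(dataSource, value));
            return reloaded;
        }
    }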

 

Published April 2, 2015 - Updated October 8, 2020
