Cassandra is an open source, distributed, and high-performance database project for storing high volumes of data. In Pega® Platform, you can use it to stage data for fast decisioning or in situations when you want to access data very quickly by using a particular key. Pega Platform includes an internal Cassandra database cluster to which you can connect through the Decision Data Store service. Pega Platform also supports connecting to external Cassandra clusters through the Decision Data Store service or through the Connect Cassandra rule type. You can use a variety of connection options to create flexible solutions in your business application.
Internal Cassandra connection
The Decision Data Store service can operate as part of the existing Pega Platform node cluster. This means that you can select a Pega Platform node to be part of the Decision Data Store service. By using a Pega Platform node cluster in the Decision Data Store service, you can control that cluster's size and data model from your application. When you add a Pega Platform node to the cluster of decision data nodes, a Cassandra instance is automatically deployed on that node as an independent JVM process. When a connection is established, you can use your application to create the relevant data model by adding properties in data classes. These properties are later propagated as keys to the Cassandra cluster through decision data store data sets.
Pega Platform includes Cassandra 2.1.
To configure a Pega Platform node as a decision data store node, select it from the list of available Pega Platform nodes on the Decision Data Store tab of the Services landing page.
Adding decision data store nodes on the Services landing page
Multiple internal Cassandra decision data nodes
Cassandra-based decision data store in a Pega Platform node cluster
Each Pega Platform node that is configured as a decision data store node has its own Cassandra data store. The Cassandra data stores on all Pega Platform nodes communicate with each other to create a distributed database in which decision data is evenly distributed across all decision data store nodes. For example, if you have two decision data store nodes, each node contains 50% of the decision data; if you have three decision data store nodes, each node contains approximately 33.3% of the decision data, and so on.
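The arithmetic behind this example can be sketched in a few lines. This is only an illustration of an ideal even split; the function name is hypothetical and is not part of Pega Platform or Cassandra, and a real cluster is unlikely to be balanced this exactly.

```python
def share_per_node(node_count: int) -> float:
    """Return the percentage of decision data held by each node,
    assuming a perfectly even distribution across all nodes."""
    if node_count < 1:
        raise ValueError("at least one decision data store node is required")
    return 100.0 / node_count


# Show the per-node share for a few cluster sizes.
for nodes in (2, 3, 4):
    print(f"{nodes} nodes: {share_per_node(nodes):.1f}% of decision data per node")
```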
Decision data store nodes
With this solution, you can add more Pega Platform nodes to the Decision Data Store service when you need more disk space to accommodate decision data.
External Cassandra connection
You can also connect to external Cassandra clusters to store your decisioning data. Depending on whether your Cassandra cluster is already populated with data or is empty, you can connect to that cluster through a Connect Cassandra rule or through the Decision Data Store service.
Pega Platform supports connections to external Cassandra clusters that run version 2.1 or later.
External Cassandra cluster in the Decision Data Store service
Use this method when you want to use a clean Cassandra cluster that is external to Pega Platform to store your decisioning data. In this case, you control the Cassandra data model from your application through decision data sets.
Configuring external Cassandra cluster for use in the Decision Data Store service
Before you can connect to an external Cassandra data store through the Decision Data Store service, perform the following actions on the Cassandra cluster:
- Turn on authentication for each node in the cluster:
  - In the cassandra.yaml file, set the authenticator parameter to PasswordAuthenticator. By default, the value of this parameter is AllowAllAuthenticator.
  - Restart all Cassandra nodes.
- Create a Cassandra user account.
- Ensure that the Cassandra user is present on all nodes by setting the replication factor for the system_auth keyspace to the number of nodes in the cluster.
- On any of the Cassandra nodes, run the nodetool status command and check that each node is reported as up and in a normal state (UN). The status list should show all the nodes that you want to connect to from your application.
Use the Cassandra user credentials to connect to the external Cassandra cluster from Pega Platform.
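The preparation steps above can be partly scripted. The following is a minimal sketch, not a Pega-provided tool: the helper names, the user name, and the sample output are all hypothetical. It builds the CQL statements for the user account and the system_auth replication factor (assuming SimpleStrategy replication and the Cassandra 2.1 CREATE USER syntax), and it checks nodetool status output, in which each node line starts with a two-letter code where UN means up and normal.

```python
import re

# A node line in `nodetool status` output starts with a status letter (U/D)
# and a state letter (N/L/J/M), followed by the node address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def auth_setup_cql(username: str, password: str, node_count: int) -> list:
    """Build CQL for the user account and the system_auth replication factor.

    Assumption: SimpleStrategy replication. A multi-datacenter cluster
    would normally use NetworkTopologyStrategy instead.
    """
    escaped = password.replace("'", "''")  # escape single quotes for CQL
    return [
        f"CREATE USER {username} WITH PASSWORD '{escaped}' NOSUPERUSER;",
        "ALTER KEYSPACE system_auth WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {node_count}}};",
    ]


def node_states(status_output: str) -> dict:
    """Map each node address to its two-letter status/state code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def all_nodes_up_and_normal(status_output: str, expected_addresses) -> bool:
    """Return True only if every expected node is listed with the code UN."""
    states = node_states(status_output)
    return all(states.get(address) == "UN" for address in expected_addresses)


# Made-up sample output for illustration; addresses and host IDs are not real.
SAMPLE_STATUS = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load    Tokens  Owns   Host ID                               Rack
UN  10.0.0.1  1.2 MB  256     33.3%  11111111-1111-1111-1111-111111111111  rack1
UN  10.0.0.2  1.1 MB  256     33.3%  22222222-2222-2222-2222-222222222222  rack1
DN  10.0.0.3  1.3 MB  256     33.4%  33333333-3333-3333-3333-333333333333  rack1
"""

for statement in auth_setup_cql("pega_dds_user", "change-me", node_count=3):
    print(statement)
print(node_states(SAMPLE_STATUS))
```

In this sample, the node at 10.0.0.3 is reported as DN (down), so a check that expects all three addresses fails until that node is back up.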
Adding an external Cassandra cluster as decision data nodes
Multiple external Cassandra decision data store nodes
External Cassandra through the Cassandra-Connect method
Use the Connect Cassandra rule type to read from or write to an external Cassandra instance that is already populated with data. In this case, Cassandra defines the structure of your data model, and you must map the columns from the data store to new properties in your application.
For more information, see Connecting to an external Cassandra data store through Cassandra Connect rules.
Data models in Cassandra data stores
The data model in Cassandra nodes that are configured as part of the Decision Data Store service (both internal and external) is controlled by Pega Platform. When you assign a specific node to be a decision data store node, that node does not contain an underlying data model. The data model is created in Pega Platform through Decision Data Store type data sets. When you create a data set, you must specify its type as Decision Data Store and add keys that reflect the data model that you want. When you save the data set, Pega Platform propagates your data model to decision data store nodes.
Plan your data model carefully. Each property that you want to query by must be added as a key in the Decision Data Store data set. An excessive number of keys negatively affects the performance of the data set. The keys that you create cannot be changed after the data set is saved.
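To see why key planning matters, consider a simplified in-memory analogy of a key-addressed store. This sketch is not how Pega Platform or Cassandra is implemented; the class and property names are hypothetical. It only illustrates that records are reachable through the keys fixed at creation time and not through other properties.

```python
class KeyedStore:
    """Toy stand-in for a data set whose records are addressed by key."""

    def __init__(self, key_names):
        # Keys are fixed at creation, like keys in a saved data set.
        self.key_names = tuple(key_names)
        self._rows = {}

    def save(self, record: dict) -> None:
        """Store a record under the tuple of its key property values."""
        key = tuple(record[name] for name in self.key_names)
        self._rows[key] = record

    def read(self, **key_values):
        """Read a record by its keys; any other property is rejected."""
        if set(key_values) != set(self.key_names):
            raise KeyError(f"records can be read only by the keys {self.key_names}")
        return self._rows.get(tuple(key_values[name] for name in self.key_names))


store = KeyedStore(["CustomerID"])
store.save({"CustomerID": "C-1", "Segment": "Gold"})
print(store.read(CustomerID="C-1"))  # found through the key

try:
    store.read(Segment="Gold")  # Segment was not defined as a key
except KeyError as error:
    print("lookup rejected:", error)
```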
Creating a DDS data set
You cannot add an external Cassandra data store with populated data or an established data model to the Decision Data Store service. To use such a Cassandra instance as a data store for your decisioning operations, connect to it through the Connect Cassandra rule and copy the data model from that Cassandra data store to Pega Platform.
Cassandra use cases
Designating Cassandra to be the data store for Pega Platform decisioning operations returns the greatest benefit when you use the full potential of the Cassandra database in your solution. Delayed adaptive learning and Visual Business Director demonstrate the Cassandra capabilities for managing large and active data sets in decision management.
Delayed adaptive learning
When customers are presented with offers, they often do not respond immediately but take some time to evaluate those offers. In delayed learning, when an offer is made to a customer and there is no immediate response, the interaction record is cached in Cassandra for a specified maximum period of time. When the customer responds, the interaction record is retrieved from the Cassandra cache at high speed, the response is attached, and the record is sent to the Adaptive Decision Model for analysis.
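The flow can be sketched with a small in-memory cache that expires entries after a maximum waiting period. This is an illustrative stand-in only: the class, method, and field names are hypothetical, an ordinary dictionary takes the place of Cassandra, and sending the completed record to the adaptive model is left to the caller.

```python
import time


class DelayedResponseCache:
    """Holds interaction records until a response arrives or the wait expires."""

    def __init__(self, max_wait_seconds, clock=time.monotonic):
        self.max_wait_seconds = max_wait_seconds
        self._clock = clock  # injectable so the example can control time
        self._pending = {}   # interaction_id -> (expiry_time, record)

    def record_offer(self, interaction_id, record):
        """Cache the interaction record when no immediate response is given."""
        expiry = self._clock() + self.max_wait_seconds
        self._pending[interaction_id] = (expiry, dict(record))

    def attach_response(self, interaction_id, outcome):
        """Return the record with the response attached, or None if the
        record is unknown or the maximum waiting period has passed."""
        entry = self._pending.pop(interaction_id, None)
        if entry is None:
            return None
        expiry, record = entry
        if self._clock() > expiry:
            return None
        record["outcome"] = outcome
        return record


# Simulated clock so that the example does not need to sleep.
now = [0.0]
cache = DelayedResponseCache(max_wait_seconds=60, clock=lambda: now[0])

cache.record_offer("i-1", {"customer": "C-1", "offer": "GoldCard"})
now[0] = 30.0  # the customer responds within the waiting period
print(cache.attach_response("i-1", "Accepted"))

cache.record_offer("i-2", {"customer": "C-2", "offer": "GoldCard"})
now[0] = 200.0  # the response arrives too late
print(cache.attach_response("i-2", "Accepted"))
```

The first call returns the record with the outcome attached, ready to be passed on for analysis; the second returns None because the waiting period has elapsed.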
For more information, see Delayed learning of adaptive models.
Visual Business Director
You can use Visual Business Director (VBD) to view the performance of actual and proposed strategies at a detailed level in a three-dimensional view. On the Visual Business Director tab of the Services landing page, you can define the number of decision data store nodes that are dedicated to business monitoring and reporting. The VBD decision data nodes store customer interaction history in the form of dimensions, properties, and key performance indicators that your application later queries to visualize decision results.
For more information, see Visual Business Director.