Sizing a Cassandra cluster

Achieve high performance in terms of data replication and consistency by estimating the optimal database size to run a Cassandra cluster.

Obtain the sizing calculation tool by sending an email to HardwareEstimate@pega.com.
  1. On a production system on which you want to run a Cassandra cluster, select at least three nodes.

    You can run multiple nodes on the same server provided that each node has a different IP address.
  2. In the sizing calculation tool, fill in the fields highlighted in red with the required information about record sizes for each of the following decision management services:

    1. In the DDS_Data_Sizing tab, provide information about Decision Data Store (DDS), such as the number of records and the average record key size.

    2. In the Delayed_Learning_Sizing tab, provide information about delayed learning of adaptive models, such as the number of decisions per minute and the average record key size.

      For more information, see the Delayed learning of adaptive models article on Pega Community.
    3. In the VBD_Sizing tab, provide information about business monitoring and reporting, such as the number of dimensions and measurements.

      For more information, see Visual Business Director planner.
    4. In the Model_Response_Sizing tab, provide information about collecting the responses to your adaptive models, such as the number of incoming responses in 24 hours.

      For more information, see Adaptive analytics.
  3. Calculate the required database size for your Cassandra cluster by summing up the values of the Total required disk space fields from each tab.

  4. Ensure that you have enough disk space to run the DDS data sets: divide the database size that you calculated in step 3 by the number of available nodes, and then verify that the size of each node does not exceed 50% of the database size.

  5. If you use the cluster for simulations and data flow runs, increase processing speed by adding nodes to the cluster.
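
The arithmetic in steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not part of the sizing calculation tool: the per-tab figures are hypothetical placeholders, the function names are invented for this example, and the 50% check follows the wording of step 4 literally. It does not account for the replication factor or any other overhead.

```python
# Hypothetical per-tab values in GB. In practice, copy the
# "Total required disk space" figure from each tab of the sizing tool.
tab_totals_gb = {
    "DDS_Data_Sizing": 120.0,
    "Delayed_Learning_Sizing": 40.0,
    "VBD_Sizing": 60.0,
    "Model_Response_Sizing": 20.0,
}


def required_database_size(totals):
    """Step 3: sum the Total required disk space values from every tab."""
    return sum(totals.values())


def size_per_node(database_size, node_count):
    """Step 4: divide the database size by the number of available nodes."""
    if node_count < 3:
        # Step 1 asks for at least three nodes.
        raise ValueError("select at least three nodes")
    return database_size / node_count


def within_half_of_database(node_size, database_size):
    """Step 4 check, read literally: node size is at most 50% of the database size."""
    return node_size <= 0.5 * database_size


database_gb = required_database_size(tab_totals_gb)
node_gb = size_per_node(database_gb, node_count=3)
ok = within_half_of_database(node_gb, database_gb)
print(f"database: {database_gb} GB, per node: {node_gb} GB, within limit: {ok}")
```

With the placeholder figures above, the database size is 240 GB and each of the three nodes holds 80 GB. Replace the values with the totals from your own copy of the tool before drawing any conclusions about disk space.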

  • Defining Pega Platform access to an external Cassandra database

    Manage Pega Platform access to your external Cassandra database resources by creating Cassandra user roles with assigned permissions.

  • Configuring the consistency level

    Achieve the level of consistency that you want by deciding how many Cassandra nodes in a cluster must validate a write operation or respond to a read operation to declare success.

  • Configuring the replication factor

    Ensure reliability and fault tolerance by controlling how many data replicas you want to store across a Cassandra cluster.

  • Configuring the Cassandra cluster

    Pega Platform comes with an internal Cassandra cluster to which you can connect through a Decision Data Store data set. Before connecting to the cluster through Pega Platform, perform the following steps to achieve optimal performance and data consistency across the nodes in the cluster.
