Sizing a Cassandra cluster

Achieve high performance in terms of data replication and consistency by estimating the optimal database size to run a Cassandra cluster.

Obtain the sizing calculation tool by sending an email to HardwareEstimate@pega.com.
  1. On a production system on which you want to run a Cassandra cluster, select at least three nodes.

    You can run multiple nodes on the same server provided that each node has a different IP address.
  2. In the sizing calculation tool, in the fields highlighted in red, provide the required information about record size for each of the following decision management services:

    1. In the DDS_Data_Sizing tab, provide information about Decision Data Store (DDS), such as the number of records and the average record key size.

    2. In the Delayed_Learning_Sizing tab, provide information about delayed learning of adaptive models, such as the number of decisions per minute and the average record key size.

      For more information, see the Delayed learning of adaptive models article on Pega Community.
    3. In the VBD_Sizing tab, provide information about business monitoring and reporting, such as the number of dimensions and measurements.

      For more information, see Visual Business Director planner.
    4. In the Model_Response_Sizing tab, provide information about collecting the responses to your adaptive models, such as the number of incoming responses in 24 hours.

      For more information, see Adaptive analytics.
  3. Calculate the required database size for your Cassandra cluster by summing up the values of the Total required disk space fields from each tab.

  4. Ensure that you have enough disk space to run the DDS data sets. Divide the database size that you calculated in step 3 by the number of available nodes, and ensure that the resulting size per node does not exceed 50% of the database size.

  5. If you use the cluster for simulations and data flow runs, increase processing speed by adding nodes to the cluster.
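
The arithmetic in steps 1, 3, and 4 can be sketched in Python. This is a minimal, hypothetical helper, not part of the sizing calculation tool: the `Node` class and all function names are illustrative, the example totals are made-up values that stand in for the Total required disk space fields, and the `_gb` suffix assumes that the tool reports sizes in gigabytes. The `reference_gb` parameter holds the size that step 4 compares the per-node share against; as the step is worded, that is the database size from step 3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one Cassandra node."""
    server: str      # physical server that hosts the node
    ip_address: str  # each node needs its own IP address, even on a shared server


def validate_nodes(nodes: list[Node]) -> None:
    """Step 1: require at least three nodes with distinct IP addresses."""
    if len(nodes) < 3:
        raise ValueError(f"select at least three nodes, got {len(nodes)}")
    ips = [node.ip_address for node in nodes]
    if len(set(ips)) != len(ips):
        raise ValueError("each node must have a different IP address")


def database_size_gb(tab_totals_gb: dict[str, float]) -> float:
    """Step 3: add the Total required disk space value from every tab."""
    return sum(tab_totals_gb.values())


def per_node_share_gb(database_gb: float, node_count: int) -> float:
    """Step 4: divide the database size by the number of available nodes."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    return database_gb / node_count


def within_limit(share_gb: float, reference_gb: float, max_fraction: float = 0.5) -> bool:
    """Step 4: check that the per-node share is at most 50% of the reference size."""
    return share_gb <= max_fraction * reference_gb


if __name__ == "__main__":
    # Two nodes share server-a but use different IP addresses.
    nodes = [
        Node("server-a", "10.0.0.1"),
        Node("server-a", "10.0.0.2"),
        Node("server-b", "10.0.0.3"),
    ]
    validate_nodes(nodes)

    # Made-up example values standing in for each tab's total.
    totals = {
        "DDS_Data_Sizing": 120.0,
        "Delayed_Learning_Sizing": 30.0,
        "VBD_Sizing": 40.0,
        "Model_Response_Sizing": 10.0,
    }
    db_gb = database_size_gb(totals)
    share = per_node_share_gb(db_gb, len(nodes))
    print(db_gb)                       # 200.0
    print(round(share, 1))             # 66.7
    print(within_limit(share, db_gb))  # True
```

Keeping the threshold as a parameter (`max_fraction`) lets you rerun the same check with a different node count, for example after you add nodes for simulations and data flow runs in step 5.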

  • Configuring the consistency level

    Achieve the level of consistency that you want by deciding how many Cassandra nodes in a cluster must validate a write operation or respond to a read operation to declare success.

  • Configuring the replication factor

    Ensure reliability and fault tolerance by controlling how many data replicas you want to store across a Cassandra cluster.
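
The consistency level and the replication factor work together. As a minimal sketch of standard Cassandra arithmetic, assuming a single data center, the following functions (hypothetical names, not a Pega or Cassandra API) count how many replicas a few consistency levels require, check whether reads are guaranteed to overlap the latest acknowledged write (R + W > RF), and count how many replicas can be unavailable. The sketch only models the arithmetic; it does not set either value.

```python
from __future__ import annotations


def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a request at `level` to succeed.

    Covers a few standard consistency levels in a single data center with
    replication factor `rf`; an unknown level raises KeyError.
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    counts = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,  # a majority of the replicas
        "ALL": rf,
    }
    needed = counts[level.upper()]
    if needed > rf:
        raise ValueError(f"{level} needs {needed} replicas, but the replication factor is {rf}")
    return needed


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when R + W > RF, so every read set overlaps every write set."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that can be unavailable while requests at `level` still succeed."""
    return rf - replicas_required(level, rf)


if __name__ == "__main__":
    rf = 3
    print(replicas_required("QUORUM", rf))                 # 2
    print(reads_see_latest_write("QUORUM", "QUORUM", rf))  # True
    print(reads_see_latest_write("ONE", "ONE", rf))        # False
    print(tolerated_failures("QUORUM", rf))                # 1
```

With a replication factor of 3, reading and writing at QUORUM keeps reads consistent while still tolerating one unavailable replica.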
