Sizing a Cassandra cluster
Achieve high performance for data replication and consistency by estimating
the optimal database size for your Cassandra cluster.
Before you begin: Obtain the sizing calculation tool by sending an email to
[email protected].
- On a production system on which you want to run a Cassandra cluster, select at
  least three nodes.
Note: You can run multiple nodes on the same server provided that each node has a different IP address.
- In the sizing calculation tool, in the fields highlighted in red, provide the
  required information about record sizes for each of the following decision management services:
  - In the DDS_Data_Sizing tab, provide information
    about Decision Data Store (DDS), such as the number of records and the
    average record key size.
    For more information, see Configuring the Decision Data Store service.
  - In the Delayed_Learning_Sizing tab, provide
    information about delayed learning of adaptive models, such as the number
    of decisions per minute and the average record key size.
    For more information, see the Delayed learning of adaptive models article on Pega Community.
  - In the VBD_Sizing tab, provide information about
    business monitoring and reporting, such as the number of dimensions and
    measurements.
    For more information, see Visual Business Director planner.
  - In the Model_Response_Sizing tab, provide
    information about collecting the responses to your adaptive models, such
    as the number of incoming responses in 24 hours.
    For more information, see Adaptive analytics.
- Calculate the required database size for your Cassandra cluster by summing up the values of the Total required disk space fields from each tab.
- Ensure that you have enough disk space to run the DDS data sets: divide the database size that you calculated in step 3 by the number of available nodes, and verify that the resulting size of each node does not exceed 50% of the database size.
- If you use the cluster for simulations and data flow runs, increase processing speed by adding nodes to the cluster.
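
The arithmetic in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration of the steps as written: the per-tab figures below are hypothetical placeholders (not output of the sizing calculation tool), the function names are made up for this example, and the unit (GB) is an assumption.

```python
# Hypothetical "Total required disk space" values in GB, one per tab of the
# sizing calculation tool. Replace them with the figures from your own copy.
total_required_disk_space_gb = {
    "DDS_Data_Sizing": 120.0,
    "Delayed_Learning_Sizing": 40.0,
    "VBD_Sizing": 25.0,
    "Model_Response_Sizing": 15.0,
}


def database_size_gb(per_tab_gb):
    """Step 3: sum the Total required disk space values from every tab."""
    return sum(per_tab_gb.values())


def size_per_node_gb(database_gb, node_count):
    """Step 4: divide the database size by the number of available nodes."""
    if node_count < 3:
        # Step 1 asks for at least three nodes.
        raise ValueError("select at least three nodes")
    return database_gb / node_count


def within_limit(per_node_gb, database_gb, limit=0.5):
    """Step 4: check that the per-node size stays within 50% of the database size."""
    return per_node_gb <= limit * database_gb


db = database_size_gb(total_required_disk_space_gb)
per_node = size_per_node_gb(db, node_count=3)
print(f"database: {db:.1f} GB, per node: {per_node:.1f} GB, "
      f"within limit: {within_limit(per_node, db)}")
```

Changing `node_count` shows how adding nodes lowers the size that each node has to hold.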