Modifying Cassandra node routing policies

Maintain high performance and short write times by changing the default node routing policies that limit Cassandra-to-Cassandra network activity.

By default, when Pega Platform connects to Cassandra, the DataStax token aware policy routes requests to Cassandra nodes. The goal of that policy is to always route requests to nodes that hold the requested data, which reduces the amount of Cassandra-to-Cassandra network activity through the following actions:

  • Calculating the token for the request by applying the Murmur3 hash function to the partition key of the requested or written data.
  • Determining the list of candidate nodes for the request by grouping the nodes whose token range contains the calculated token.
  • Choosing one node from that list to receive the request, giving priority to nodes in the local data center.
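
The three actions above can be sketched roughly as follows. This is an illustration only, not the DataStax driver code: the ring layout, node names, replication factor, and replica placement are hypothetical, and a SHA-256 digest stands in for Murmur3 because the Python standard library does not include a Murmur3 implementation.

```python
import bisect
import hashlib
import struct

# Hypothetical ring: each node owns the token range that ends at its token (inclusive).
RING = [
    (-6_000_000_000_000_000_000, "dc1-node-a"),
    (-2_000_000_000_000_000_000, "dc2-node-b"),
    (2_000_000_000_000_000_000, "dc1-node-c"),
    (6_000_000_000_000_000_000, "dc2-node-d"),
]
TOKENS = [token for token, _ in RING]


def token_for(partition_key: bytes) -> int:
    """Stand-in for Murmur3: a signed 64-bit value taken from a SHA-256 digest."""
    return struct.unpack(">q", hashlib.sha256(partition_key).digest()[:8])[0]


def replicas_for(token: int, replication_factor: int = 2) -> list:
    """Start at the first node token >= the request token, then walk the ring clockwise."""
    start = bisect.bisect_left(TOKENS, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


def choose_node(partition_key: bytes, local_dc: str = "dc1") -> str:
    """Prefer a replica in the local data center; otherwise use the first replica."""
    candidates = replicas_for(token_for(partition_key))
    local = [node for node in candidates if node.startswith(local_dc)]
    return (local or candidates)[0]
```

Every step in this sketch depends on having a partition key to hash. A request without one gives `token_for` nothing to work with, so no replica list can be derived for it.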

This policy is not suitable for range queries because they do not specify Cassandra partition keys. The Decision Data Store (DDS) uses range queries for browse operations, which are the source of batch data flow runs. As a result, all DDS data set browse queries are sent to all nodes, irrespective of whether the data for the range query exists on the node. In clusters of more than three nodes, this routing limitation might cause significant performance problems that lead to Cassandra read timeouts.

  1. Enable the token range partitioner by setting the prconfig/dnode/dds_partitioner_class/default dynamic system setting to com.pega.dsm.dnode.impl.dataset.cassandra.TokenRangePartitioner.

    When the DDS data set browse operation is part of a data flow, the DDS data set breaks up the retrieved data into chunks, so that the chunk requests can be spread across the batch data flow nodes. By default, these chunks are defined as evenly split token ranges that do not take into account where the data resides. In a large cluster, a single token range might require data from multiple nodes. By configuring this dynamic system setting, you ensure that no chunk range query requires data from more than one Cassandra node.
  2. Enable the extended token aware policy by setting the prconfig/dnode/cassandra_use_extended_token_aware_policy/default dynamic system setting to true.

    When a Cassandra range query runs, the extended token aware policy selects a token from the token range to determine the Cassandra node to which to send the request. This selection is effective when the token range partitioner is configured, because each chunk range then requires data from only one Cassandra node.
  3. Enable the additional latency aware routing policy by setting the prconfig/dnode/cassandra_latency_aware_policy/default dynamic system setting to true.

    In Cassandra clusters, individual node performance might vary significantly because of internal operations that add load (for example, repair or compaction). The latency aware routing policy is an additional DataStax client mechanism that can be loaded on top of the token aware policy to route queries away from slower nodes.
  4. Optional: To tune the additional latency aware routing policy parameters, configure the following dynamic system settings:

    1. Specify when the policy excludes a slow node from queries by setting the prconfig/dnode/cassandra_latency_aware_policy/exclusion_threshold/default dynamic system setting to a number that represents how many times slower than the fastest node a node must be to be excluded.

      If you set the exclusion threshold to 3, the policy excludes the nodes that are more than 3 times slower than the fastest node.
    2. Specify how the weight of older latencies decreases over time by setting the prconfig/dnode/cassandra_latency_aware_policy/scale/default dynamic system setting to a number of milliseconds.

    3. Specify how long the policy can exclude a node before retrying a query by setting the prconfig/dnode/cassandra_latency_aware_policy/retry_period/default dynamic system setting to a number of seconds.

    4. Specify how often the minimum average latency is recomputed by setting the prconfig/dnode/cassandra_latency_aware_policy/update_rate/default dynamic system setting to a number of milliseconds.

    5. Specify the minimum number of measurements per host to consider for the latency aware policy by setting the prconfig/dnode/cassandra_latency_aware_policy/min_measure/default dynamic system setting.
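
The interaction between the exclusion threshold and the minimum number of measurements can be pictured with the simplified model below. It is a sketch under stated assumptions, not the driver's algorithm: the function name, node names, and sample values are hypothetical, and the model ignores the scale, retry period, and update rate settings. It also assumes that a node with too few measurements is never excluded.

```python
def eligible_nodes(avg_latency_ms, measurements, exclusion_threshold, min_measure):
    """Return the nodes that this simplified model would still query."""
    # Only nodes with enough samples are compared against the fastest node.
    measured = {
        node: latency
        for node, latency in avg_latency_ms.items()
        if measurements.get(node, 0) >= min_measure
    }
    if not measured:
        # Nothing to compare against, so no node is excluded.
        return sorted(avg_latency_ms)
    limit = exclusion_threshold * min(measured.values())
    return sorted(
        node
        for node, latency in avg_latency_ms.items()
        if node not in measured or latency <= limit
    )


# Hypothetical sample data: average latency and number of measurements per node.
latencies = {"node-a": 10.0, "node-b": 25.0, "node-c": 31.0, "node-d": 90.0}
counts = {"node-a": 100, "node-b": 100, "node-c": 100, "node-d": 5}

# With a threshold of 3, node-c (31 ms > 3 x 10 ms) is excluded.
# node-d is slow but stays eligible because it has too few measurements.
still_queried = eligible_nodes(latencies, counts, exclusion_threshold=3, min_measure=50)
```

In this model, raising the exclusion threshold keeps more nodes in rotation, while raising the minimum number of measurements delays the point at which a newly observed node can be excluded.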

    For more information, see the Apache Cassandra documentation.
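
The difference between evenly split chunks and node-aligned chunks, as described in step 1, can be illustrated with a small model. This is not the TokenRangePartitioner implementation: the ring below is hypothetical, each node owns a single contiguous range, and replication and virtual nodes are ignored.

```python
import bisect

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

# Hypothetical ring: each entry is the last token (inclusive) of one node's range.
NODE_END_TOKENS = [-7 * 10**18, -(10**18), 3 * 10**18, MAX_TOKEN]


def even_chunks(count):
    """Equal-width (start, end] ranges that ignore which node owns the data."""
    width = (MAX_TOKEN - MIN_TOKEN) // count
    bounds = [MIN_TOKEN + i * width for i in range(count)] + [MAX_TOKEN]
    return list(zip(bounds, bounds[1:]))


def aligned_chunks():
    """One (start, end] range per node, cut exactly at the ownership boundaries."""
    bounds = [MIN_TOKEN] + NODE_END_TOKENS
    return list(zip(bounds, bounds[1:]))


def owners_touched(chunk):
    """Count how many node ranges a (start, end] chunk overlaps."""
    start, end = chunk
    first = bisect.bisect_right(NODE_END_TOKENS, start)  # owner of the token after start
    last = bisect.bisect_left(NODE_END_TOKENS, end)      # owner of the end token
    return last - first + 1


even_counts = [owners_touched(chunk) for chunk in even_chunks(4)]
aligned_counts = [owners_touched(chunk) for chunk in aligned_chunks()]
```

On this ring, three of the four evenly split chunks overlap two node ranges, while every aligned chunk overlaps exactly one. With aligned chunks, any token picked from a chunk maps to the same node, which is the property that the extended token aware policy in step 2 relies on.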