Support Article
Cassandra partitioning leads to long 'tail' where system is slow
SA-30283
Summary
Because of the way Cassandra partitioning works, the user finds that toward the end of each data flow execution many of the threads run out of work, leaving the few remaining threads to complete all of the outstanding work.
This leads to a long period during which only one or two threads are running and all other threads are idle.
Cassandra partitions vary widely in size: some are very small and some are very large. When large partitions are the last ones to be processed, they leave the long tail described above.
This tail can last up to three hours, during which the system is doing very little, yet the next data flow cannot start until the current data flow has finished. The effect is particularly noticeable in one particular data flow.
Error Messages
Not applicable
Steps to Reproduce
Run a data flow.
Root Cause
The issue is caused by the nature of Cassandra token distribution, as described in "New token allocation algorithm in Cassandra 3.0".
The uneven token distribution causes some Cassandra partitions to be very small and some very large. When large partitions are the last ones to be processed, they leave a long tail, as described in the Summary.
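The tail effect can be illustrated with a small simulation. The following is a minimal sketch, not the product's actual scheduler and not the content of the hotfix: the function name `simulate`, the partition sizes, and the thread count are all hypothetical, and the sketch assumes each idle thread simply takes the next partition from a shared queue.

```python
import heapq


def simulate(partition_sizes, num_threads):
    """Simulate threads taking partitions, in order, from a shared queue.

    Returns (makespan, tail), where tail is the time at the end of the
    run during which at most one thread is still working.
    All values are in arbitrary time units (hypothetical model).
    """
    # Each heap entry is the time at which one thread becomes free.
    free_at = [0.0] * num_threads
    heapq.heapify(free_at)

    for size in partition_sizes:
        start = heapq.heappop(free_at)          # earliest idle thread
        heapq.heappush(free_at, start + size)   # busy until start + size

    finish_times = sorted(free_at)
    makespan = finish_times[-1]
    # Once the second-to-last thread finishes, only one thread is busy.
    second_last = finish_times[-2] if num_threads > 1 else 0.0
    return makespan, makespan - second_last


# Skewed sizes: twenty small partitions, then one very large partition last.
print(simulate([1.0] * 20 + [30.0], 4))   # (35.0, 30.0)

# Same total work (50 units) split into evenly sized partitions.
print(simulate([1.0] * 50, 4))            # (13.0, 0.0)
```

In this model, the skewed run spends 30 of its 35 time units with a single thread busy, while the evenly sized run finishes the same amount of work in 13 units with no tail.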
Resolution
Apply HFix-30421.
Published November 18, 2016 - Updated December 2, 2021