
This content has been archived and is no longer being updated.

Links may not function; however, this content may be relevant to outdated versions of the product.

Best practices for Stream service configuration

Updated on December 13, 2023
Note: This article applies to Pega Platform™ versions 8.1-8.3. For later Pega Platform versions, see Best practices for Stream service configuration.

Follow these guidelines for the recommended deployment configuration.

Expected throughput

Data throughput depends on the number of nodes, CPUs, and partitions, as well as the replication factor and bandwidth.

The following test results were collected on three running Stream service nodes, each on a machine with the following configuration:

  • CPU cores: 2
  • Memory (GB): 8
  • Bandwidth (Mbps): 450
  • Number of partitions: 20
  • Replication factor: 2

The following table shows the test results for writing messages to the stream (producer):

Records   Record size (bytes)   Threads   Throughput (rec/sec)   Average latency (ms)   MB/sec
5000000   100                   1         83675                  2.2                    8.4
                                5         172276.1               11.9                   17.2
                                10        216682.8               40.4                   21.7
          1000                  1         32967.7                5.1                    33
                                5         53033.7                48.6                   53
                                10        49861.7                174.3                  49.8
          100                   1         76812.1                2.4                    7.7
                                5         165317.4               13.3                   16.5
                                10        203216.3               46.1                   19.38
          1000                  1         35865.7                4.8                    35.9
                                5         52456.7                41.9                   52.4
                                10        50266.1                158.8                  50.3

Producer throughput - test results
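
The MB/sec column follows approximately from the record rate and the record size. The following Python sketch shows that conversion for the first producer row. It assumes that 1 MB equals 1,000,000 bytes; the function name is illustrative and is not part of Pega Platform or Kafka.

```python
def throughput_mb_per_sec(records_per_sec: float, record_size_bytes: int) -> float:
    """Convert a record rate into data throughput in MB/sec (1 MB = 10**6 bytes)."""
    return records_per_sec * record_size_bytes / 1_000_000


# First producer row: 83,675 rec/sec at 100 bytes per record.
print(round(throughput_mb_per_sec(83675, 100), 1))  # prints 8.4
```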

The following table presents the test results for reading messages from the stream (consumer):

Note: For the consumer, the replication factor does not matter because the consumer reads only from the leader replica of each partition.
Records   Record size (bytes)   Threads   Throughput (rec/sec)   MB/sec
5000000   100                   1         120673.8               12
                                5         150465                 15
                                10        143395.3               14.3
          1000                  1         54128.4                54.1
                                5         55903.3                55.9
                                10        54674.5                54.7

Consumer throughput - test results

Disk space requirements

By default, the Kafka cluster stores data for seven days. You can change that time by overriding the log.retention.hours property.

Example:

Your goal is to process 100,000 messages per second, 500 bytes each, and to keep messages on the disk for one day. The replication factor is set to 2.

The expected throughput is 50MB/sec:

  • 3GB is used in one minute for a single copy of the data.
  • 6GB of disk space is used in one minute due to the replication factor of 2.
  • The total disk usage is 360GB in one hour and 8.64TB in one day.
  • Apart from your data, the Kafka cluster uses additional disk space for internal data (around 10% of the data size).

In that sample scenario, the total minimal disk size should be 9.5TB.
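
The arithmetic in this example can be reproduced with a short Python sketch. The function and parameter names are illustrative assumptions; `retention_hours` corresponds to the value that you set for log.retention.hours, the 10% overhead is the approximate figure quoted above, and 1 TB is taken as 10^12 bytes.

```python
def required_disk_tb(messages_per_sec: int,
                     message_size_bytes: int,
                     retention_hours: float,
                     replication_factor: int,
                     overhead_ratio: float = 0.10) -> float:
    """Estimate the minimum cluster-wide disk size in TB (1 TB = 10**12 bytes)."""
    # Raw data rate for a single copy of the data.
    bytes_per_sec = messages_per_sec * message_size_bytes
    # Data kept on disk for the whole retention period, including all replicas.
    retained_bytes = bytes_per_sec * retention_hours * 3600 * replication_factor
    # Add the approximate share that Kafka uses for internal data.
    return retained_bytes * (1 + overhead_ratio) / 1e12


# Sample scenario: 100,000 messages/sec, 500 bytes each, one day, replication factor 2.
size_tb = required_disk_tb(100_000, 500, retention_hours=24, replication_factor=2)
print(f"{size_tb:.2f} TB")  # prints 9.50 TB
```

With the default retention of seven days (retention_hours=168), the same inputs give an estimate seven times larger, so the retention period is usually the first value to review when disk space is limited.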

Compression

Depending on your needs, you can choose data compression using one of the algorithms that Kafka supports: GZIP, Snappy, or LZ4. Consider the following aspects:

  • GZIP requires the least bandwidth and disk space, but the maximum throughput might be reached before the network is saturated.
  • Snappy is much faster than GZIP, but its compression ratio is lower, so throughput might be limited once the maximum network capacity is reached.
  • LZ4 provides the highest throughput of the three.

Review the following table and diagram with throughput and bandwidth usage per codec:

Codec     Throughput %   Bandwidth %
none      100            100
gzip      147.5          5.2
snappy    116.3          64.1
lz4       188.9          34.5

Throughput and bandwidth per codec (%)

Figure: Throughput and bandwidth metrics per codec (%)

Figure: Disk usage metrics per codec

To define the compression algorithm, update the compression.type property. By default, no compression is applied.
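
One way to reason about the choice is to pick the codec with the highest relative throughput that still fits a given bandwidth budget. The following Python sketch applies that rule to the percentages from the codec table. The selection rule and the function name are assumptions made for illustration, and the percentages reflect the test environment described in this article, so results in your environment might differ.

```python
# Relative throughput and bandwidth usage per codec (%), taken from the codec table.
CODEC_METRICS = {
    "none": (100.0, 100.0),
    "gzip": (147.5, 5.2),
    "snappy": (116.3, 64.1),
    "lz4": (188.9, 34.5),
}


def pick_codec(max_bandwidth_pct: float) -> str:
    """Return the codec with the highest throughput within the bandwidth budget."""
    candidates = {
        name: metrics
        for name, metrics in CODEC_METRICS.items()
        if metrics[1] <= max_bandwidth_pct
    }
    if not candidates:
        raise ValueError("No codec fits the given bandwidth budget")
    # Rank the remaining codecs by relative throughput.
    return max(candidates, key=lambda name: candidates[name][0])


print(pick_codec(50))  # prints lz4
print(pick_codec(10))  # prints gzip
```

The returned name is the value that you would then set for the compression.type property.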

Data folder

By default, the stream service keeps its data in the current working directory. For Apache Tomcat, the data folder is located in: <your_tomcat_folder>/kafka-data

Ensure that this path points to a location with enough disk space.
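
A quick way to verify the available space before you start the service is to query the file system that holds the data folder. The following Python sketch uses the standard library function shutil.disk_usage; the helper name and the example path are hypothetical, so substitute your own data folder and the size that you calculated for your scenario.

```python
import shutil


def has_enough_space(data_dir: str, required_bytes: int) -> bool:
    """Return True if the file system that holds data_dir has enough free space."""
    return shutil.disk_usage(data_dir).free >= required_bytes


# The current directory stands in for the data folder here; in practice, pass the
# kafka-data location, for example the (hypothetical) path /opt/tomcat/kafka-data.
required = int(9.5e12)  # 9.5TB from the sample scenario
print(has_enough_space(".", required))
```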

     

To view the main outline for this article, see Kafka as a streaming service.
