Configuring the Stream service

Updated on July 5, 2022

This content applies only to On-premises and Client-managed cloud environments

Configure the Stream service to ingest, route, and deliver high volumes of data such as web clicks, transactions, sensor data, and customer interaction history.

Distribution and replication of the stream data records ensure scalability and fault tolerance of the Stream service. The service runs as a cluster on one or more servers.

For additional guidelines regarding throughput, disk space requirements, and compression, see Best practices for Stream service configuration.

Stream node type

When planning your deployment, assign the Stream node type to at least two and at most four nodes in one Pega Platform cluster.

If you plan to have more than four Stream nodes, contact Global Customer Support to assist with your deployment.

For more information, see Assigning node types to nodes for on-premises environments.

Node identification

Each Pega Platform node is identified with a Node ID that must be unique in the cluster. If the same Node ID is already used in the cluster, the node fails to start.

Use this setting to more easily identify nodes and their purposes at a glance. A node ID is generated by default based on certain system setting values. However, as a best practice, set the node ID manually to reflect the node’s intended purpose. To set the node ID, use a JVM argument as shown in the following example:

-Didentification.nodeid=stream-node-1

Data replication

The Stream service replicates every record across a configurable number of servers. If a server in the cluster fails, the service automatically fails over to these replicas, so messages remain available.

By default, the Stream service keeps two replicas of each record. If you increase the number of Stream nodes from two to three or four, change the data replication setting to match the number of Stream nodes. You can do this by using prconfig on every Pega Platform node, or by creating the following dynamic system setting:

Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyReplicationFactor/default
Value: <number_of_stream_nodes>
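
This section shows only the dynamic system setting form of the replication option. In the other sections of this article, a prconfig.xml entry named X pairs with a Setting Purpose of prconfig/X/default, so the matching prconfig.xml entry can be derived mechanically. The following Python sketch illustrates that naming pattern. The function name is hypothetical, and the resulting entry is inferred from the pattern rather than stated in this article, so verify it against your Pega Platform version before use.

```python
from xml.sax.saxutils import quoteattr

PREFIX = "prconfig/"
SUFFIX = "/default"


def prconfig_entry_from_dss(purpose: str, value: str) -> str:
    """Build a prconfig.xml <env> line from a dynamic system setting purpose.

    Assumes the naming pattern used elsewhere in this article:
    Setting Purpose = "prconfig/" + prconfig name + "/default".
    """
    if not (purpose.startswith(PREFIX) and purpose.endswith(SUFFIX)):
        raise ValueError(f"unexpected Setting Purpose format: {purpose!r}")
    name = purpose[len(PREFIX):-len(SUFFIX)]
    if not name:
        raise ValueError("Setting Purpose contains no setting name")
    # quoteattr adds the surrounding quotes and escapes XML special characters.
    return f"<env name={quoteattr(name)} value={quoteattr(value)}/>"


if __name__ == "__main__":
    # Example: three Stream nodes, so a replication factor of 3.
    print(prconfig_entry_from_dss(
        "prconfig/dsm/services/stream/pyReplicationFactor/default", "3"))
```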

Data files location

By default, the Stream service stores its data in the java_ee_server_root/kafka-data folder. Change this location to a folder that you can monitor and secure against accidental data deletion.

Important: Do not use network attached storage or shared folders to store your stream data.

To change the default directory for a single server, in the prconfig.xml file, add the following entry:

<env name="dsm/services/stream/pyBaseLogPath" value="/data/kafka-data"/>

To change the default directory for all servers in the cluster, create a dynamic system setting with the following options:

Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyBaseLogPath/default
Value: /data/kafka-data

Ensure that you have at least 100 GB of disk space available to accommodate standard background processing activities.
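
One way to verify the disk space guideline before you start a Stream node is a small Python check such as the following sketch. It assumes that "100 GB" means decimal gigabytes and that you pass in the directory configured as pyBaseLogPath; the function names are hypothetical and not part of Pega Platform.

```python
import shutil

# 100 GB, interpreted as decimal gigabytes (an assumption, see the lead-in).
REQUIRED_BYTES = 100 * 10**9


def has_enough_space(free_bytes: int, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the free space meets the required minimum."""
    return free_bytes >= required_bytes


def check_data_dir(path: str) -> bool:
    """Check the free space on the file system that holds the given path."""
    return has_enough_space(shutil.disk_usage(path).free)


if __name__ == "__main__":
    data_dir = "."  # replace with your configured data folder, for example /data/kafka-data
    free_gb = shutil.disk_usage(data_dir).free / 10**9
    status = "OK" if check_data_dir(data_dir) else "below the 100 GB guideline"
    print(f"{data_dir}: {free_gb:.1f} GB free, {status}")
```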

Apache Kafka distribution location

When the Stream service is enabled in Pega Platform, the Apache Kafka distribution is unpacked in the following directory: java_ee_server_root/kafka-version

If the default location is protected against write operations, change it in one of the following ways:

In the prconfig.xml file, add the following entry:

<env name="dsm/services/stream/pyUnpackBasePath" value="/opt/kafka" />

Create a dynamic system setting with the following options:
Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyUnpackBasePath/default
Value: /opt/kafka

Operating system

Deploy the Stream service on Linux or any other Unix system.

Running Stream nodes on Windows might cause issues and is not recommended in production environments.

The Stream service uses file descriptors for data files and open connections. Set the open file descriptor limit to at least 100000. If the limit is too low, the Stream service can exceed it and fail. For information about raising the limit (ulimit), see your operating system documentation.
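
As an illustration, the following Python sketch reads the open file limit of the current process by using the standard resource module, which is available on Unix systems only. It reports the limit for the process that runs the script, so it reflects the Stream JVM only if you run it as the same user and in the same environment. The function name is hypothetical.

```python
import resource

REQUIRED_FDS = 100_000  # minimum recommended in this article


def fd_limit_ok(soft_limit: int, required: int = REQUIRED_FDS) -> bool:
    """Return True if the soft limit is unlimited or at least the required value."""
    # RLIM_INFINITY is the platform-specific marker for "no limit".
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit: {soft}, hard limit: {hard}")
    if fd_limit_ok(soft):
        print("The file descriptor limit meets the recommendation.")
    else:
        print(f"Raise the limit to at least {REQUIRED_FDS}.")
```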

Clock synchronization

Ensure that the clocks on Stream nodes do not drift apart and stay synchronized within a 30-second window. An effective way to synchronize clocks across all Pega Platform nodes is to use NTP.
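
The 30-second window can be checked with simple arithmetic once you have a clock reading from each node. The following Python sketch assumes that you have already collected epoch timestamps from all Stream nodes at approximately the same moment (how you collect them is outside the scope of this example) and that the window applies to the largest difference between any two nodes. The function names and node names are hypothetical.

```python
MAX_DRIFT_SECONDS = 30.0  # window stated in this article


def max_drift(node_times: dict[str, float]) -> float:
    """Return the largest difference, in seconds, between any two node clocks."""
    if not node_times:
        return 0.0
    readings = list(node_times.values())
    return max(readings) - min(readings)


def within_window(node_times: dict[str, float],
                  window: float = MAX_DRIFT_SECONDS) -> bool:
    """Return True if all clocks fall within the allowed window."""
    return max_drift(node_times) <= window


if __name__ == "__main__":
    # Illustrative readings only, in seconds since the epoch.
    sample = {
        "stream-node-1": 1000.0,
        "stream-node-2": 1012.5,
        "stream-node-3": 1029.0,
    }
    print(f"max drift: {max_drift(sample)} s, within window: {within_window(sample)}")
```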

Multiple JVMs on a single host

Do not run multiple Stream service JVMs on a single host. Doing so reduces overall cluster resiliency and data availability if the entire host fails.

However, if such a setup is required, configure dedicated, non-conflicting ports for each Stream service JVM.

The Stream service uses three IP address and port pairs for internal communication. Assign a distinct set of ports to each JVM on a single host by using the following settings:

<!-- IP and port for communication between Pega nodes -->
<env name="dsm/services/stream/pyBrokerAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyBrokerPort" value="9092"/>

<!-- IP and port for configuration management -->
<env name="dsm/services/stream/pyKeeperAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyKeeperPort" value="2181"/>

<!-- Port for local Kafka management. Kafka JMX always runs on localhost -->
<env name="dsm/services/stream/pyJmxPort" value="9999"/>

<!-- Port for HTTP streaming -->
<env name="dsm/services/stream/pyPort" value="7003"/>
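
Before you start several Stream service JVMs on one host, it can help to check the planned port assignments for collisions. The following Python sketch does this for the four port settings listed above. It is a simplification that treats any repeated port number on the host as a conflict, regardless of the IP address that the port is bound to. The JVM names and the port values for the second JVM are hypothetical.

```python
from collections import defaultdict

# Port settings named in this article.
PORT_SETTINGS = ("pyBrokerPort", "pyKeeperPort", "pyJmxPort", "pyPort")


def find_port_conflicts(jvms: dict[str, dict[str, int]]) -> dict[int, list[str]]:
    """Map each port that is used more than once to the "jvm/setting" entries using it."""
    users = defaultdict(list)
    for jvm, ports in jvms.items():
        for setting in PORT_SETTINGS:
            if setting in ports:
                users[ports[setting]].append(f"{jvm}/{setting}")
    return {port: who for port, who in users.items() if len(who) > 1}


if __name__ == "__main__":
    planned = {
        # Values from the example entries in this article.
        "jvm-a": {"pyBrokerPort": 9092, "pyKeeperPort": 2181,
                  "pyJmxPort": 9999, "pyPort": 7003},
        # Hypothetical second JVM; pyPort was left unchanged by mistake.
        "jvm-b": {"pyBrokerPort": 9093, "pyKeeperPort": 2182,
                  "pyJmxPort": 9998, "pyPort": 7003},
    }
    for port, who in find_port_conflicts(planned).items():
        print(f"port {port} is assigned more than once: {', '.join(who)}")
```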

JVM heap size

It is unlikely that you need to increase the default JVM heap settings for your Stream service. However, if you need to do so, use one of the following methods:

Add an entry in the prconfig.xml file, for example:

<env name="dsm/services/stream/pyHeapOptions" value="-Xmx4G -Xms4G" />

Create a dynamic system setting with the following options:
Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyHeapOptions/default
Value (example): -Xmx4G -Xms4G

Garbage collector logs

The garbage collector removes discarded objects from the heap to free up allocation space. Garbage collector logs provide information about the memory cleaning process and help in identifying performance issues. By using the following settings, you can configure the name, count, and size for the log files.

Add an entry in the prconfig.xml file, for example:

<env name="dsm/services/stream/pyGcLogOptions" value="-Xlog:gc*:file=kafkaServer-gc.log:time,tags:filecount=10,filesize=102400"/>

Create a dynamic system setting with the following options:
Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyGcLogOptions/default
Value (example): -Xlog:gc*:file=kafkaServer-gc.log:time,tags:filecount=10,filesize=102400

JVM tuning

You can tune the garbage collection performance of the JVM. In the following example, the default Garbage-First (G1) garbage collector is selected and the maximum pause time for garbage collection is set to 20 milliseconds.

Add an entry in the prconfig.xml file, for example:

<env name="dsm/services/stream/pyJvmPerformanceOptions" value="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20"/>

Create a dynamic system setting with the following options:
Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyJvmPerformanceOptions/default
Value (example): -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20

Multiple availability zones

Spread your Stream nodes across multiple availability zones. To distribute data replicas evenly across availability zones, use the following settings to configure AZ names:

Add an entry in the prconfig.xml file, for example:

<env name="dsm/services/stream/server_properties/broker.rack" value="AZ-1" />

Create a dynamic system setting with the following options:
Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/server_properties/broker.rack/default
Value (example): AZ-1

General settings in the server.properties file

You can set general settings from the Kafka server.properties file by adding prconfig entries in the following format:

<env name="dsm/services/stream/server_properties/property" value="value"/>

where:

property is the name of the property that you want to modify.
value is the value of that property.
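
To make the format concrete, the following Python sketch renders a set of server.properties overrides as entries in that format. The property names log.retention.hours and num.network.threads are standard Kafka broker properties, but the values are illustrative only and are not recommendations from this article. The function name is hypothetical.

```python
from xml.sax.saxutils import quoteattr

# Prefix taken from the format shown in this article.
PREFIX = "dsm/services/stream/server_properties/"


def server_property_entries(properties: dict[str, str]) -> list[str]:
    """Render each server.properties override as a prconfig.xml <env> line."""
    return [
        # quoteattr adds the surrounding quotes and escapes XML special characters.
        f"<env name={quoteattr(PREFIX + name)} value={quoteattr(str(value))}/>"
        for name, value in properties.items()
    ]


if __name__ == "__main__":
    overrides = {
        "log.retention.hours": "72",   # illustrative value
        "num.network.threads": "4",    # illustrative value
    }
    for line in server_property_entries(overrides):
        print(line)
```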
