

Monitoring Kafka

Use Java Management Extensions (JMX) to gather Kafka metrics. JMX metrics are always available when Kafka is running.

The default JMX port is 9999. To change it, edit the following entry in the prconfig.xml file:

<env name="dsm/services/stream/pyJmxPort" value="portNumber" />

where portNumber is the custom port number. For more information, see Modifying the prconfig.xml file.
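As a rough sketch, a Java client can connect to the broker's JMX port through the JDK's standard remote JMX API. This sketch makes several assumptions. The broker exposes the standard RMI connector under the JDK default `/jmxrmi` path. Authentication and SSL are not enabled. The broker runs on a hypothetical host, `localhost`. The class and method names (`KafkaJmxClient`, `serviceUrl`, `connect`) are illustrative only.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class KafkaJmxClient {
    // Default JMX port documented for Kafka; override it if pyJmxPort is changed in prconfig.xml.
    static final int DEFAULT_JMX_PORT = 9999;

    // Builds the service URL for the standard JDK RMI connector.
    // Assumption: the broker registers its connector under the default /jmxrmi path.
    static JMXServiceURL serviceUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    // Opens a JMX connection without authentication or SSL; the caller must close it.
    static JMXConnector connect(String host, int port) throws IOException {
        return JMXConnectorFactory.connect(serviceUrl(host, port));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical defaults: a broker on the local machine that uses the default port.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : DEFAULT_JMX_PORT;
        try (JMXConnector connector = connect(host, port)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            // Print the names of all kafka.server MBeans to confirm that the connection works.
            for (ObjectName name : mbeans.queryNames(new ObjectName("kafka.server:*"), null)) {
                System.out.println(name);
            }
        }
    }
}
```

If you set a custom `pyJmxPort`, pass the host and that port as program arguments, for example `java KafkaJmxClient localhost 9998`.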

You can monitor the following metrics:

| Area | Metric | Description |
| --- | --- | --- |
| Disk | Free disk space | Total free storage space, calculated as the sum of the free space on all storage locations where Kafka data is stored. |
| | Disk usage | Total used storage space, calculated as the sum of the sizes of all Kafka data directories. |
| Partitions | Total | Number of partitions on this Kafka broker. The number should be similar across all brokers. |
| | Under-replicated | Number of partitions in which the number of in-sync replicas is lower than the total number of replicas. An alert is registered if the value is greater than 0. |
| | Offline | Number of partitions that have no active leader and are therefore neither writable nor readable. An alert is registered if the value is greater than 0. |
| | Leaders | Number of partition leaders on this Kafka broker. The number should be similar across all brokers. |
| Incoming byte rate | 1 minute | Incoming byte rate over the last minute. |
| | 5 minute | Incoming byte rate over the last 5 minutes. |
| | 15 minute | Incoming byte rate over the last 15 minutes. |
| | Mean | Aggregated (mean) incoming byte rate. |
| Outgoing byte rate | 1 minute | Outgoing byte rate over the last minute. |
| | 5 minute | Outgoing byte rate over the last 5 minutes. |
| | 15 minute | Outgoing byte rate over the last 15 minutes. |
| | Mean | Aggregated (mean) outgoing byte rate. |
| Incoming message rate | 1 minute | Incoming message rate over the last minute. |
| | 5 minute | Incoming message rate over the last 5 minutes. |
| | 15 minute | Incoming message rate over the last 15 minutes. |
| | Mean | Aggregated (mean) incoming message rate. |
| Processors | Network processors idle time | Average fraction of time that the network processors are idle. The value ranges from 0 to 1 and should ideally be greater than 0.3. |
| | Request handler threads idle time | Average fraction of time that the request handler threads are idle. The value ranges from 0 to 1 and should ideally be greater than 0.3. |
| Metrics | Replication max lag | Maximum lag, in messages, between the follower replicas and the leader replicas. |
| | Is controller | 1 if this broker is the active controller, otherwise 0. The sum across all brokers in the cluster should always be 1, because a cluster has exactly one active controller. |
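The alert rules in the table can be expressed as simple checks. The following is a minimal sketch, not Pega's implementation. Only the thresholds come from this article: alert when the under-replicated or offline count is greater than 0, keep both idle-time fractions above 0.3, and expect a controller sum of exactly 1.

The MBean names are the standard Apache Kafka gauge names, and it is an assumption that the embedded broker exposes them for these metrics. In Apache Kafka, the request handler idle metric is a meter rather than a gauge, so the sketch takes it as a plain value instead of reading it with `readGauge`. The disk metrics are left out. The names `KafkaHealthCheck`, `readGauge`, `checkBroker`, and `controllerCountIsValid` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class KafkaHealthCheck {
    // Standard Apache Kafka gauge MBeans; assumed (not confirmed by this article) to back the table rows.
    static final String UNDER_REPLICATED = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions";
    static final String OFFLINE = "kafka.controller:type=KafkaController,name=OfflinePartitionsCount";
    static final String NETWORK_IDLE = "kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent";
    static final String ACTIVE_CONTROLLER = "kafka.controller:type=KafkaController,name=ActiveControllerCount";

    // Idle-time level that the table recommends staying above.
    static final double MIN_IDLE_FRACTION = 0.3;

    // Reads the "Value" attribute of a Kafka gauge MBean as a double.
    static double readGauge(MBeanServerConnection mbeans, String mbeanName) throws Exception {
        Object value = mbeans.getAttribute(new ObjectName(mbeanName), "Value");
        return ((Number) value).doubleValue();
    }

    // Applies the per-broker rules from the table and returns one message per violated rule.
    static List<String> checkBroker(double underReplicated, double offline,
                                    double networkIdle, double requestHandlerIdle) {
        List<String> alerts = new ArrayList<>();
        if (underReplicated > 0) {
            alerts.add("Under-replicated partitions: " + (long) underReplicated);
        }
        if (offline > 0) {
            alerts.add("Offline partitions: " + (long) offline);
        }
        if (networkIdle < MIN_IDLE_FRACTION) {
            alerts.add("Network processors idle fraction below " + MIN_IDLE_FRACTION);
        }
        if (requestHandlerIdle < MIN_IDLE_FRACTION) {
            alerts.add("Request handler threads idle fraction below " + MIN_IDLE_FRACTION);
        }
        return alerts;
    }

    // Cluster-wide rule: the "Is controller" values of all brokers must sum to exactly 1.
    static boolean controllerCountIsValid(int... activeControllerPerBroker) {
        int sum = 0;
        for (int count : activeControllerPerBroker) {
            sum += count;
        }
        return sum == 1;
    }
}
```

In practice, you would read each gauge on every broker with `readGauge` and pass the values to `checkBroker`. You would then collect the `ActiveControllerCount` value from all brokers before calling `controllerCountIsValid`, because that rule applies only to the cluster as a whole.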

To view the main outline for this article, see Kafka as a streaming service.

Published February 21, 2019 — Updated March 6, 2019
