Monitoring Kafka

Use Java Management Extensions (JMX) to gather Kafka metrics. JMX metrics are always available when Kafka is running.

The default JMX port is 9999. To change it, edit the following entry in the prconfig.xml file:

<env name="dsm/services/stream/pyJmxPort" value="portNumber" />

where portNumber is the custom port number. For more information, see Modifying the prconfig.xml file.
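As an illustration of how a client might reach that port, the following is a minimal Java sketch. It assumes that the broker exposes JMX through the standard RMI connector without authentication or SSL, which this article does not confirm. The class name KafkaJmxProbe, the host name localhost, and the sample MBean name (the standard Apache Kafka name for under-replicated partitions) are assumptions chosen for the example.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class KafkaJmxProbe {

    /** Builds the conventional JMX-over-RMI service URL for a host and port. */
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Connects, reads a single MBean attribute, and closes the connection. */
    static Object readAttribute(String host, int port, String mbean, String attribute)
            throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.getAttribute(new ObjectName(mbean), attribute);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults: localhost and the default port 9999 mentioned above.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9999;
        System.out.println("JMX URL: " + serviceUrl(host, port));

        // Only attempt a connection when a host is passed explicitly,
        // so the sketch can be run without a live broker.
        if (args.length > 0) {
            // Assumption: standard Apache Kafka MBean name, not taken from this article.
            Object value = readAttribute(host, port,
                    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
                    "Value");
            System.out.println("Under-replicated partitions: " + value);
        }
    }
}
```

If you change the port in prconfig.xml, pass the same value as the second argument, for example `java KafkaJmxProbe localhost 9999`.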

You can monitor the following metrics:

| Area | Metric | Description |
| --- | --- | --- |
| Disk | Free disk space | Total free storage space, calculated as the sum of free space in all storages where Kafka data is located. |
| Disk | Disk usage | Total used storage space, calculated as the sum of Kafka data directories. |
| Partitions | Total | Number of partitions on this Kafka broker. The numbers should be similar across all brokers. |
| Partitions | Under-replicated | Partitions where the number of in-sync replicas is lower than the total number of replicas. An alert is registered if the value is greater than 0. |
| Partitions | Offline | Number of partitions which do not have an active leader and which are therefore not writable or readable. An alert is registered if the value is greater than 0. |
| Partitions | Leaders | Number of leaders on this Kafka broker. The numbers should be similar across all brokers. |
| Incoming byte rate | 1 minute | Incoming byte rate for the last minute. |
| Incoming byte rate | 5 minute | Incoming byte rate for the last 5 minutes. |
| Incoming byte rate | 15 minute | Incoming byte rate for the last 15 minutes. |
| Incoming byte rate | Mean | Aggregated incoming byte rate. |
| Outgoing byte rate | 1 minute | Outgoing byte rate for the last minute. |
| Outgoing byte rate | 5 minute | Outgoing byte rate for the last 5 minutes. |
| Outgoing byte rate | 15 minute | Outgoing byte rate for the last 15 minutes. |
| Outgoing byte rate | Mean | Aggregated outgoing byte rate. |
| Incoming message rate | 1 minute | Incoming message rate for the last minute. |
| Incoming message rate | 5 minute | Incoming message rate for the last 5 minutes. |
| Incoming message rate | 15 minute | Incoming message rate for the last 15 minutes. |
| Incoming message rate | Mean | Aggregated incoming message rate. |
| Processors | Network processors idle time | The average fraction of time the network processors are idle. The value should be between 0 and 1, ideally greater than 0.3. |
| Processors | Request handler threads idle time | The average fraction of time the request handler threads are idle. The value should be between 0 and 1, ideally greater than 0.3. |
| Metrics | Replication max lag | Maximum lag in messages between the follower and leader replicas. |
| Metrics | Is controller | If the broker is an active controller, the value of this metric is 1. The aggregated sum across all brokers in the cluster should always be 1, because there must be exactly one controller per cluster. |
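The thresholds in the table can be turned into simple checks. The following Java sketch is one possible way to do that, not the product's own alerting logic. The class name KafkaHealthRules, the method evaluate, and the wording of the findings are hypothetical. The sketch treats an idle fraction at or below 0.3 as a finding, and it expects the caller to supply the Is controller value already summed across all brokers.

```java
import java.util.ArrayList;
import java.util.List;

public class KafkaHealthRules {

    // The table recommends an idle fraction greater than 0.3.
    static final double MIN_IDLE_FRACTION = 0.3;

    /**
     * Returns one finding per rule that is violated; an empty list means
     * that all checked values are within the ranges given in the table.
     */
    static List<String> evaluate(long underReplicated, long offline,
                                 double networkIdle, double handlerIdle,
                                 int activeControllers) {
        List<String> findings = new ArrayList<>();
        if (underReplicated > 0) {
            findings.add("Under-replicated partitions: " + underReplicated);
        }
        if (offline > 0) {
            findings.add("Offline partitions: " + offline);
        }
        if (networkIdle <= MIN_IDLE_FRACTION) {
            findings.add("Network processors idle time is low: " + networkIdle);
        }
        if (handlerIdle <= MIN_IDLE_FRACTION) {
            findings.add("Request handler threads idle time is low: " + handlerIdle);
        }
        // Sum of "Is controller" across all brokers; exactly one is expected.
        if (activeControllers != 1) {
            findings.add("Active controllers in cluster: " + activeControllers);
        }
        return findings;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<String> findings = evaluate(2, 0, 0.1, 0.7, 1);
        for (String finding : findings) {
            System.out.println(finding);
        }
    }
}
```

The input values are plain numbers, so the checks work regardless of how the metrics are collected over JMX.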

To view the main outline for this article, see Kafka as a streaming service.
