Support Article
Pega stops polling Kafka Topic in a multi-broker Kafka cluster
SA-51249
Summary
When using Pega 7.3, the application sporadically stops polling the Kafka topic in a multi-broker Kafka cluster. As a result, messages in the topic do not reach the application even though the Kafka server instances are running.
Error Messages
Not Applicable
Steps to Reproduce
- Create a multi-node Kafka cluster.
- Configure three nodes. For example:
broker.id=0 and listener port 9092
broker.id=1 and listener port 9093
broker.id=2 and listener port 9094
- Add the Kafka host nodes to the Data-Admin-Kafka rules instance in the same order and test the connectivity:
broker.id=0 and listener port 9092
broker.id=1 and listener port 9093
broker.id=2 and listener port 9094
- Create a topic on the Kafka side, which is replicated to all nodes of the cluster.
For example: replication factor of three (as in the above example).
- Create a Kafka dataset. Select the topic configured on the Kafka side.
- Create a data flow that reads from the Kafka dataset and writes to the Decision Data Store (DDS) dataset.
- Publish messages to the topic on one of the nodes on the Kafka side. The real-time data flow processes the messages if all three nodes are running.
- Kill the Kafka node that is listed first in the Data-Admin-Kafka instance on the application side.
- Publish a message to the topic on either of the other two nodes that are still running in the cluster. Sporadically, the application does not receive any message from Kafka, although the other two nodes are still alive.
If the first node is restarted, the application normally receives the messages that were published to the topic between the time the node was brought down and the time it was restarted. Sporadically, however, the application does not process these messages and retrieves only new messages that are published to the topic after the first node is restarted. At times, the data flow node configured on the Decision > Infrastructure > Data flow services landing page must be restarted and a new data flow run item created before the system fetches new messages published to the Kafka topic.
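While working through the steps above, it can help to confirm which broker listeners are actually reachable after the first node is killed. The following is a minimal sketch in Python using only the standard library. The ports come from the example configuration; the host name localhost and the helper names bootstrap_servers, is_listening, and reachable_brokers are assumptions made for illustration, not part of Pega or Kafka. An open TCP port shows only that a listener accepts connections, not that the broker is healthy.

```python
import socket

# Assumed broker addresses: the ports come from the steps above, but the
# host name "localhost" is a placeholder for your own Kafka hosts.
BROKERS = [("localhost", 9092), ("localhost", 9093), ("localhost", 9094)]


def bootstrap_servers(brokers):
    """Join (host, port) pairs into a comma-separated host:port list."""
    return ",".join(f"{host}:{port}" for host, port in brokers)


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


def reachable_brokers(brokers, probe=is_listening):
    """Return the brokers that accept connections, in configured order."""
    return [(host, port) for host, port in brokers if probe(host, port)]


if __name__ == "__main__":
    print("broker list:", bootstrap_servers(BROKERS))
    for host, port in BROKERS:
        state = "up" if is_listening(host, port) else "down"
        print(f"{host}:{port} is {state}")
```

If the first listener is reported as down while the other two are up, and the data flow still receives no messages, the behavior matches the issue described in this article.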
Root Cause
A defect in Pegasystems’ code or rules.
Resolution
Apply HFix-41127.
Published December 29, 2018 - Updated October 8, 2020