Configuring Kafka to process real-time events
Pega Marketing™ 8.3 includes the default Event Stream service to process real-time events. If required, you can also take advantage of the high performance and scalability offered by Apache Kafka by configuring Pega Marketing to switch to an external Kafka cluster.
Events provide a mechanism for responding to real-time marketing opportunities. An event is initiated by an external or internal system and can trigger the execution of a campaign. For example, when a customer who has a checking account with UPlus Bank uses one of the bank's ATMs, the Event Stream service recognizes the action and triggers the campaign to which the event is mapped. As a result, the ATM screen shows the customer an offer for a new credit card that UPlus Bank wants to promote. By default, event processing is handled by the Event Stream service.
Perform the following tasks to switch the processing to an external Kafka cluster.
Before you begin
Before you configure Kafka for events, set up the Kafka cluster. Make sure you configure the following parameters:
- Host names and port numbers
- Number of partitions: the recommended number is 50 or more for each topic.
- Replication factor: the recommended value is 3, so that each partition is stored on three nodes.
For more information, refer to the Apache Kafka documentation.
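The recommended values above can be checked before a topic is created. The following is a minimal sketch, not part of Pega Marketing: the helper names, the host name kafka1.example.com:9092, and the topic name pega-events are hypothetical. It assumes a Kafka version whose kafka-topics.sh tool accepts --bootstrap-server (Kafka 2.2 or later); older versions use --zookeeper instead.

```python
# Values taken from the recommendations in this section.
RECOMMENDED_MIN_PARTITIONS = 50
RECOMMENDED_REPLICATION_FACTOR = 3


def check_topic_settings(partitions, replication_factor):
    """Return a list of warnings for settings below the recommended values."""
    warnings = []
    if partitions < RECOMMENDED_MIN_PARTITIONS:
        warnings.append(
            f"partitions={partitions} is below the recommended minimum of "
            f"{RECOMMENDED_MIN_PARTITIONS}"
        )
    if replication_factor < RECOMMENDED_REPLICATION_FACTOR:
        warnings.append(
            f"replication factor={replication_factor} is below the recommended "
            f"value of {RECOMMENDED_REPLICATION_FACTOR}"
        )
    return warnings


def build_create_topic_command(bootstrap_server, topic, partitions, replication_factor):
    """Build the argument list for a kafka-topics.sh --create call (not executed here)."""
    return [
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap_server,
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


if __name__ == "__main__":
    # Hypothetical host and topic name: replace with your own cluster details.
    for warning in check_topic_settings(50, 3):
        print("WARNING:", warning)
    print(" ".join(build_create_topic_command("kafka1.example.com:9092", "pega-events", 50, 3)))
```

The script only prints the command. Run the printed command on a machine where the Kafka command-line tools are installed.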
Configuring Kafka for events
To configure Kafka, first create a configuration instance for the Kafka cluster, and then create a new version of the existing CDHEventSource data set, so that you do not have to edit the data flow later.
- In Pega Platform, create a Kafka configuration instance that represents the Kafka cluster. For more information, see Creating a Kafka configuration instance.
- Create a Kafka data set:
  - In Dev Studio, click Create > Data Model > Data Set.
  - Enter CDHEventSource as the data set label and identifier.
  - From the Type list, select Kafka.
  - Specify the ruleset, ruleset version, and Applies to class of the data set. For the Applies to class, enter PegaMKT-Data-Event.
  - In the Connection section, in the Kafka configuration instance field, select an existing Kafka cluster record, for example, MyKafkaInstance in the Data-Admin-Kafka class. To create a new record, for example, when no records are present, click the Open icon.
  - Check whether Pega Platform is connected to the Kafka cluster by clicking Test connectivity. A message appears if the connection is successful.
  - In the Partition Key(s) section, select the .CustomerID property to be used by the Kafka data set as the partitioning key.
  - In the Record format section, leave the format as JSON.
  - Click Save.
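To illustrate why a partitioning key matters, the following minimal sketch shows how a key such as CustomerID can map every event for the same customer to the same partition, and what a JSON-formatted record might look like. It is an illustration under stated assumptions, not how Pega Platform or Kafka implement this: Kafka's default partitioner hashes the key with murmur2, while this sketch uses CRC32 as a simple stand-in, and the EventName field and the sample values are hypothetical.

```python
import json
import zlib

NUM_PARTITIONS = 50  # hypothetical topic size, matching the recommended minimum


def to_json_record(event):
    """Serialize an event dictionary to UTF-8 encoded JSON bytes."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def choose_partition(customer_id, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash.

    CRC32 is a stand-in for illustration only; Kafka's default
    partitioner uses murmur2, so real partition numbers will differ.
    """
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # Hypothetical events: only CustomerID comes from the steps above.
    events = [
        {"CustomerID": "C-1001", "EventName": "ATMAccess"},
        {"CustomerID": "C-2002", "EventName": "ATMAccess"},
        {"CustomerID": "C-1001", "EventName": "ATMAccess"},
    ]
    for event in events:
        partition = choose_partition(event["CustomerID"])
        print(partition, to_json_record(event).decode("utf-8"))
```

Because the hash depends only on the key, both events for customer C-1001 land on the same partition, which preserves their relative order for that customer.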
Restarting the real-time data flow
By default, the real-time data flow uses the built-in CDHEventSource data set. Restart the ProcessFromEventSource data flow to ensure that it uses the new Kafka data set that you created.
- In the header of Dev Studio, click Configure > Decisioning > Decisions > Data Flows > Real-time processing.
- In the Action column for the ProcessFromEventSource data flow, click Manage.
- Select the Stop action from the drop-down menu and verify that the status changes to Stopped.
- Select the Restart action from the drop-down menu and verify that the status changes to Initializing.
- Click Refresh and verify that the status changes to In Progress.