Partition keys in Stream Data Set rules
Beginning with Pega 7.3, you can define a set of partition keys when you create a Data Set rule of type Stream. Setting partition keys in a data set is useful for analyzing data across multiple nodes because it ensures that all related records are grouped together on the same node.
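The grouping guarantee can be pictured as hash-based partitioning: each record's partition-key value is hashed to select a partition, so records that share a key value always land on the same partition. The sketch below is illustrative only, with a hypothetical `customerId` property as the partition key; it is not Pega's internal implementation.

```python
# Minimal sketch of hash-based partitioning (assumption: this mirrors the
# general technique, not Pega's actual internals).
import hashlib

def partition_for(record, key, num_partitions):
    """Map a record to a partition by hashing its partition-key value."""
    digest = hashlib.sha256(str(record[key]).encode()).hexdigest()
    return int(digest, 16) % num_partitions

records = [
    {"customerId": "C1", "amount": 10},
    {"customerId": "C2", "amount": 20},
    {"customerId": "C1", "amount": 30},
]

# Records with the same partition-key value always map to the same
# partition, so all of customer C1's records are processed together.
partitions = [partition_for(r, "customerId", 4) for r in records]
print(partitions[0] == partitions[2])  # True
```

Because the mapping depends only on the key value, adding more records for an existing customer never scatters that customer's data across nodes.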
Define partitioning only for testing purposes, that is, in application environments in which the Production level system setting is set to 1, 2, or 3. If you change the Production level setting to 4 or 5, any data set of type Stream that has at least one property defined as a partition key is no longer distributed across multiple nodes. In production-level applications (above level 3), you can distribute the processing of data from stream data sets across multiple nodes only through your own custom setup, for example, by sending load-balancing requests to the node cluster.
A change to the production level takes effect only after you restart the system.
Setting the production level in an application
You can use the properties defined in the Applies To class of the Data Set rule as partition keys. However, if the Data Flow rule that uses the stream data set as its source references an Event Strategy rule, you can define only a single partition key. That partition key must be the same as the event key that you defined in the Real-time Data shape on the Event Strategy form.
Defining partition keys for an event stream
An active data flow that references a stream data set with at least one partition key defined continues processing when the node topology changes, for example, when a node fails or is removed from the cluster. The data flow adjusts to the new number of Data Flow service nodes, but any data that was not yet processed on the failed or disconnected node is lost.
For more information, see Defining partition keys for stream data sets.
Published May 12, 2017 — Updated August 31, 2018