Your data set configuration depends on the data set type that you select.
You can create the following types of data sets:
You can create a Decision Data Store data set when the cluster has at least one Decision Data Store node.
This data set stages data for fast decisioning. You can use it to quickly access data by a particular key.
When you create an instance of this data set, you must define the keys.
To troubleshoot and optimize performance of the data set, you can trace its operations. For more information, see Tracing Decision Data Store operations.
The File data set reads data from a CSV or JSON file that you upload and stores the file content in compressed form in the pyFileSourcePreview clipboard property. This data set can be used as a source in Data Flow rules instances. You can use it to test data flows and strategies.
For configuration details, see Creating File data set.
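Before you upload a test file, it can help to check how its content parses into records. The following Python sketch is illustrative only and is not how Pega Platform reads the file. The function name parse_records, the sample field names, and the assumptions that CSV files have a header row and that JSON files hold an object or an array of objects are all hypothetical.

```python
import csv
import io
import json


def parse_records(content: str, fmt: str) -> list:
    """Parse file content into a list of records (dictionaries).

    Assumptions: CSV content starts with a header row, and JSON content
    is either a single object or an array of objects.
    """
    fmt = fmt.lower()
    if fmt == "csv":
        # DictReader uses the first row as field names; all values are strings.
        return [dict(row) for row in csv.DictReader(io.StringIO(content))]
    if fmt == "json":
        data = json.loads(content)
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported format: {fmt!r} (expected 'csv' or 'json')")


# Hypothetical sample content for a quick check.
csv_text = "CustomerID,Age\nC-1,34\nC-2,51\n"
json_text = '[{"CustomerID": "C-1", "Age": 34}]'

csv_records = parse_records(csv_text, "csv")
json_records = parse_records(json_text, "json")
```

Note that CSV values arrive as text, while JSON keeps numeric types, which can matter when a strategy compares values.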
The HBase data set reads data from and saves data to an external Apache HBase storage. This data set can be used as a source and destination in Data Flow rules instances.
For configuration details, see Creating HBase data set.
The HDFS data set reads data from and saves data to an external Apache Hadoop Distributed File System (HDFS). This data set can be used as a source and destination in Data Flow rules instances. It supports partitioning, so you can create distributed runs with data flows. Because this data set does not support the Browse by key option, you cannot use it as a joined data set.
For configuration details, see Creating HDFS data set.
The Kafka data set is a high-throughput and low-latency platform for handling real-time data feeds that you can use as input for event strategies in Pega Platform. Kafka data sets are characterized by high performance and horizontal scalability in terms of event and message queueing. Kafka data sets can be partitioned to enable load distribution across the Kafka cluster. You can use a data flow that is distributed across multiple partitions of a Kafka data set to process streaming data.
For configuration details, see Creating a Kafka configuration instance and Creating a Kafka data set.
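The idea of distributing load across partitions can be illustrated with a short Python sketch. It is a conceptual example only: it does not show how Kafka or Pega Platform assigns records, the function names partition_for and group_by_partition are hypothetical, and CRC32 is used simply because it is a stable hash.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index with a stable hash.

    CRC32 is chosen only because it is deterministic across runs;
    Kafka's own default partitioner uses a different hash function.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def group_by_partition(records, key_field, num_partitions):
    """Group records by the partition that their key hashes to."""
    groups = defaultdict(list)
    for record in records:
        index = partition_for(str(record[key_field]), num_partitions)
        groups[index].append(record)
    return dict(groups)


# Hypothetical events keyed by customer; records with the same key
# always land in the same partition.
events = [{"CustomerID": f"C-{i % 4}", "Seq": i} for i in range(12)]
groups = group_by_partition(events, "CustomerID", num_partitions=3)
```

Because the mapping depends only on the key, all events for one customer stay in one partition, which is what lets a distributed data flow process partitions independently.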
The Monte Carlo data set is a tool for generating any number of random data records for a variety of information types. When you create an instance of this data set, it is filled with varied and realistic-looking data. This data set can be used as a source in Data Flow rules instances. You can use it for testing purposes in the absence of real data.
For configuration details, see Creating Monte Carlo data set.
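The kind of output that such a data set produces can be approximated outside Pega Platform with a few lines of Python. The sketch below is not the Monte Carlo data set's generator. The function name generate_records, the field names, and the value lists are hypothetical and serve only to show seeded random record generation for tests.

```python
import random
from typing import Optional

# Hypothetical value pools for the generated fields.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Emma"]
CITIES = ["Austin", "Boston", "Denver", "Seattle"]


def generate_records(count: int, seed: Optional[int] = None) -> list:
    """Generate `count` random customer-like records.

    Passing a seed makes the output repeatable, which is useful when a
    test must produce the same data on every run.
    """
    rng = random.Random(seed)
    return [
        {
            "CustomerID": f"C-{i:05d}",
            "FirstName": rng.choice(FIRST_NAMES),
            "City": rng.choice(CITIES),
            "Age": rng.randint(18, 90),  # inclusive on both ends
        }
        for i in range(1, count + 1)
    ]


sample = generate_records(5, seed=42)
```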
The Stream data set lets you process a continuous stream of events (records).
Stream tab
The Stream tab contains details about the exposed services (REST and WebSocket). These exposed services handle a stream data set as a resource located at http://<HOST>:7003/stream/<DATA_SET_NAME>, for example: http://10.30.27.102:7003/stream/MyEventStream
You can use the Pega-provided load balancer to test how Data Flow rules that contain data sets of type Stream are distributed in multinode environments by specifying partitioning keys.
Use this feature only for testing purposes, in application environments whose Production level setting is 1, 2, or 3.
Settings tab
On the Settings tab, you can set additional options for your stream data set. After you save the rule instance, you cannot change these settings.
Authentication
The REST and WebSocket endpoints are secured by the Pega Platform common authentication scheme. Each post to the stream requires authentication with your user name and password. By default, the Enable basic authentication check box is selected.
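As a rough illustration, the following Python sketch builds an authenticated POST request for the stream resource without sending it. The URL pattern and port 7003 come from the Stream tab description, and the Authorization header follows the standard HTTP Basic scheme. The JSON payload, the Content-Type header, the credentials, and the function names stream_url and build_post are assumptions made for the example, so check the service details on the Stream tab before relying on them.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def stream_url(host: str, data_set_name: str, port: int = 7003) -> str:
    """Build the stream resource URL: http://<HOST>:7003/stream/<DATA_SET_NAME>."""
    return f"http://{host}:{port}/stream/{quote(data_set_name, safe='')}"


def build_post(host, data_set_name, user, password, record):
    """Prepare (but do not send) a POST request with HTTP Basic credentials.

    Assumption: the record is sent as a JSON object.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        stream_url(host, data_set_name),
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",  # assumed payload type
        },
        method="POST",
    )


# Hypothetical credentials and record; the host and data set name are the
# example values from the Stream tab description.
request = build_post(
    "10.30.27.102", "MyEventStream", "operator@example.com", "secret",
    {"CustomerID": "C-1"},
)
# To send it against a reachable node:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Basic authentication sends credentials in an easily decoded form, so it is safest to use it only over connections that you trust.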
In the Retention period field, you specify how long the data set keeps the records. The default value is 1 day.
In the Log file size field, you specify the size of the log files, between 10 MB and 50 MB. The default value is 10 MB.
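The constraints on these settings can be summarized in a small Python model. This is a sketch for reasoning about valid values, not a Pega Platform API: the class name StreamSettings and its field names are hypothetical, and the rule that the retention period must be at least 1 day is an assumption, because only the default is documented here. The frozen data class mirrors the fact that the settings cannot be changed after the rule instance is saved.

```python
from dataclasses import dataclass, FrozenInstanceError

MIN_LOG_FILE_MB = 10
MAX_LOG_FILE_MB = 50


@dataclass(frozen=True)
class StreamSettings:
    """Hypothetical model of the Settings tab values."""

    retention_days: int = 1          # documented default: 1 day
    log_file_size_mb: int = 10       # documented default: 10 MB
    basic_authentication: bool = True  # check box is selected by default

    def __post_init__(self):
        # Assumption: a retention period shorter than 1 day is not allowed.
        if self.retention_days < 1:
            raise ValueError("retention_days must be at least 1")
        if not MIN_LOG_FILE_MB <= self.log_file_size_mb <= MAX_LOG_FILE_MB:
            raise ValueError(
                f"log_file_size_mb must be between {MIN_LOG_FILE_MB} "
                f"and {MAX_LOG_FILE_MB}"
            )


settings = StreamSettings()

# A frozen instance rejects later changes, like a saved rule instance.
try:
    settings.log_file_size_mb = 20
    changed = True
except FrozenInstanceError:
    changed = False
```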
The Visual Business Director data set stores data that you can view in the Visual Business Director planner to assess the success of your business strategy. To save data records in the Visual Business Director data set, you can, for example, set it as a destination of a data flow.
One instance of the Visual Business Director data set called Actuals is always present in the Data-pxStrategyResults class. This data set contains all the Interaction History records. For more information on Interaction History, see the PDN article Interaction History data model.
For configuration details, see Creating Visual Business Director data set.