You must configure each instance of the HDFS data set rule before it can read data from and save data to an external Apache Hadoop Distributed File System (HDFS).
The HDFS data set is optimized to support connections to one Apache Hadoop environment. When HDFS data sets connect to different Apache Hadoop environments in a single instance of a data flow rule, the data sets cannot use authenticated connections concurrently. If you need to use authenticated and non-authenticated connections at the same time, the HDFS data sets must use one Hadoop environment.
This group of files includes the file at the original path and all files that match the pattern fileName-XXXXX, where XXXXX is a sequence number starting from 00000. The additional files are a result of data flows saving records in batches. The save operation appends data to the existing HDFS data set without overwriting it. You can use * to match multiple files in a folder (for example, /folder/part-r-*).
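The naming pattern and the wildcard can be illustrated with a short Python sketch. It is not Pega Platform or Hadoop code: the helper batch_file_names and the folder listing are hypothetical, and Python's fnmatchcase stands in for the wildcard matching, whose exact rules in HDFS may differ.

```python
from fnmatch import fnmatchcase


def batch_file_names(base_name: str, batches: int) -> list[str]:
    """Return the base file name plus fileName-XXXXX siblings, numbered from 00000."""
    return [base_name] + [f"{base_name}-{i:05d}" for i in range(batches)]


# Hypothetical folder listing after three batch saves to /folder/part-r.
files = [f"/folder/{name}" for name in batch_file_names("part-r", 3)]
files.append("/folder/other.csv")

# The * wildcard selects the numbered files, as in /folder/part-r-*.
matched = [path for path in files if fnmatchcase(path, "/folder/part-r-*")]
print(matched)
```

In this sketch, the pattern /folder/part-r-* selects only the numbered files, because the file at the original path has no -XXXXX suffix.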
 CSV
If your HDFS data set uses the CSV file format, you must specify the following properties for content parsing within the Pega Platform:
 Parquet
For data set write operations, specify the algorithm that is used for file compression in the data set:
 CSV
Property mapping for the CSV format is based on the order of columns in the CSV file. For that reason, the order of the properties in the Properties mapping section must correspond to the order of columns in the CSV file.
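The effect of order-based mapping can be shown with a minimal Python sketch. It only illustrates the principle and is not how Pega Platform parses files. The property names, the sample rows, and the function map_rows_by_position are hypothetical.

```python
import csv
import io

# Hypothetical property names, listed in the same order as the CSV columns.
PROPERTY_ORDER = ["CustomerID", "Name", "Age"]


def map_rows_by_position(csv_text: str, properties: list[str]) -> list[dict[str, str]]:
    """Pair each CSV value with the property at the same position."""
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(properties, row)) for row in reader]


rows = map_rows_by_position("42,Ada,36\n43,Grace,45\n", PROPERTY_ORDER)

# Listing the properties in a different order silently assigns values
# to the wrong properties, because only the position is used.
swapped = map_rows_by_position("42,Ada,36\n", ["Name", "CustomerID", "Age"])
print(rows[0], swapped[0])
```

Because nothing in the file identifies the columns by name, a mismatch in order produces wrong values rather than an error in this sketch.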
 JSON
In auto-mapping mode, the column names from the JSON data file are used as Pega Platform properties. This mode supports the nested JSON structures that are directly mapped to Page and Page List properties in the data model of the class that the data set applies to.
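As a rough illustration of how nested JSON relates to Page and Page List properties, the following Python sketch classifies the fields of a sample record. The assumption here is that a nested object corresponds to a Page and an array of objects corresponds to a Page List. The record, the function classify_fields, and the label "other" are hypothetical and do not reflect the internal auto-mapping logic.

```python
import json


def classify_fields(record: dict) -> dict[str, str]:
    """Label each top-level field by the kind of property it would map to (assumed)."""
    kinds = {}
    for name, value in record.items():
        if isinstance(value, dict):
            kinds[name] = "Page"  # nested object
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            kinds[name] = "Page List"  # non-empty array of objects
        else:
            kinds[name] = "other"  # scalars and anything else
    return kinds


record = json.loads(
    '{"Name": "Ada", "Address": {"City": "London"}, "Orders": [{"ID": 1}, {"ID": 2}]}'
)
kinds = classify_fields(record)
print(kinds)
```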
 Parquet
To create the mapping, the Parquet format uses properties that are defined in the data set class. You can map only the properties that are scalar and not inherited. If the property name matches a field name in the Parquet file, the property is populated with the corresponding data from the Parquet file.
You can generate properties from the Parquet file that do not exist in Pega Platform. When you generate missing properties, Pega Platform checks for unmapped columns in the data set, and creates the missing properties in the data set class for any unmapped columns.
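The name-based matching and the detection of unmapped columns can be sketched in Python as follows. This is an illustration under stated assumptions, not the Pega Platform implementation: the function match_by_name and the sample names are hypothetical, and exact, case-sensitive name comparison is assumed.

```python
def match_by_name(class_properties: set[str], parquet_columns: list[str]) -> tuple[list[str], list[str]]:
    """Split Parquet columns into those with a same-named property and those without."""
    mapped = [column for column in parquet_columns if column in class_properties]
    unmapped = [column for column in parquet_columns if column not in class_properties]
    return mapped, unmapped


# Hypothetical data set class properties and Parquet file columns.
mapped, unmapped = match_by_name(
    {"CustomerID", "Name"},
    ["CustomerID", "Name", "Region"],
)
print(mapped, unmapped)
```

In this sketch, the columns in unmapped are the candidates for which missing properties would be generated in the data set class.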
To generate missing properties: