Defining Hadoop records

You use the Hadoop record to define all connection details for an Apache Hadoop host in a single place. The record allows you to configure the connection settings for HDFS and HBase data sets. Both the HDFS and HBase services that are configured in a single Data-Admin-Hadoop instance use the same host.
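The shared-host arrangement can be pictured with a small data model. This is an illustrative sketch only, not the platform's actual Data-Admin-Hadoop class: the names HadoopRecord, HdfsSettings, HBaseSettings, and endpoints are hypothetical, and the default ports are the ones named later in this article.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HdfsSettings:
    """Hypothetical container for the HDFS part of a record."""
    user_name: str
    port: int = 8020  # default HDFS port named in this article


@dataclass
class HBaseSettings:
    """Hypothetical container for the HBase part of a record."""
    client: str = "java"  # "java" or "rest"
    port: int = 2181      # default ZooKeeper port for the Java client


@dataclass
class HadoopRecord:
    """One host, with optional HDFS and HBase sections that both use it."""
    host: str
    hdfs: Optional[HdfsSettings] = None
    hbase: Optional[HBaseSettings] = None

    def endpoints(self) -> Dict[str, str]:
        """Return host:port for each configured service."""
        result = {}
        if self.hdfs is not None:
            result["hdfs"] = f"{self.host}:{self.hdfs.port}"
        if self.hbase is not None:
            result["hbase"] = f"{self.host}:{self.hbase.port}"
        return result
```

For example, HadoopRecord(host="hadoop.example.com", hdfs=HdfsSettings(user_name="pega")) describes a record with only the HDFS section configured; the host name and user name here are placeholders.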

Prerequisites

Before you can connect to Apache HBase or HDFS data storage, you must import the relevant client JAR files into the application container that hosts the Pega 7 Platform. For more information, see JAR file dependencies for the HBase and HDFS data sets.

To enable client-server authentication based on the Kerberos authentication protocol, you must configure your environment. For more details, see the Kerberos documentation about the network authentication protocol.

Configuring Hadoop records

  1. Create an instance of the Hadoop record.
  2. Optional: Configure the HDFS connection.
  3. Optional: Configure the HBase connection.

HDFS connection details

The Hadoop Distributed File System (HDFS) stores data on commodity computers to provide a data cluster with high aggregate bandwidth. In the HDFS section of the Hadoop record, you can configure the following basic HDFS connection properties:

  • User name - The user who is authenticated in the HDFS system. This user must have access rights to the root folder "/" for the connection test to succeed.
  • Port - The HDFS connection port. The default port is 8020.

As part of advanced HDFS configuration, you can configure the following settings:

  • NameNode - The NameNode server, which keeps the directory tree of all files in the file system and tracks where the data is kept across the cluster.
  • Response timeout - The maximum time to wait for the server response (expressed in milliseconds).

Optionally, you can enable client-server authentication that is based on the Kerberos authentication protocol.

[Figure: HDFS connection configuration]

After you set up all the required properties for HDFS, you can test the connection. If the connection is set up correctly, a confirmation message is displayed.
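Before running the connection test, it can help to confirm that the HDFS port is reachable from the application server at all. The following is a minimal sketch of such a pre-check, assuming plain TCP reachability is a useful first signal; it does not reproduce what the platform's connection test does (for example, it does not authenticate the user or read the root folder). The function name can_reach and the 3000 ms default timeout are hypothetical.

```python
import socket


def can_reach(host: str, port: int = 8020, timeout_ms: int = 3000) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_ms."""
    try:
        # Python socket timeouts are in seconds; the record expresses
        # the response timeout in milliseconds, so convert here.
        with socket.create_connection((host, port), timeout=timeout_ms / 1000.0):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A call such as can_reach("hadoop.example.com", 8020, 5000) returns False when the host name does not resolve or nothing is listening on the port, which narrows the problem to networking rather than HDFS permissions.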

HBase connection details

The HBase distributed database is written in Java and provides column-oriented database management capabilities for Hadoop. The HBase section allows you to configure the following basic HBase connection properties:

  • Client - The HBase client implementation to use: Java or REST. The REST implementation requires a running HBase server.
  • Port - The HBase connection port. For the Java client, the port is for the ZooKeeper service (the default port is 2181), and for the REST client, the port is the one on which the REST gateway is set up (the default port is 20550).

As part of the advanced HBase configuration, you can configure the following settings:

  • ZooKeeper host - The custom ZooKeeper host, different from the one defined in the basic connection properties. When advanced configuration is disabled, the basic configuration host is used as the HBase ZooKeeper host.
  • Response timeout - The maximum time to wait for the server response (expressed in milliseconds).

Additionally, you can enable client-server authentication that is based on the Kerberos authentication protocol. This option is available only if you select the Java-based HBase client implementation.
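The rules described in this section (a default port that depends on the client, a ZooKeeper host that falls back to the basic host, and Kerberos limited to the Java client) can be summarized in a short sketch. It only models the behavior as stated here; the function resolve_hbase and its parameter names are hypothetical and are not part of any Pega or HBase API.

```python
from typing import Dict, Optional

# Default ports named in this article, keyed by client implementation.
DEFAULT_PORTS = {"java": 2181, "rest": 20550}


def resolve_hbase(
    host: str,
    client: str = "java",
    port: Optional[int] = None,
    zookeeper_host: Optional[str] = None,
    advanced: bool = False,
    kerberos: bool = False,
) -> Dict[str, object]:
    """Return the effective HBase connection settings for a record."""
    if client not in DEFAULT_PORTS:
        raise ValueError(f"unknown HBase client: {client!r}")
    if kerberos and client != "java":
        # Kerberos authentication is offered only with the Java client.
        raise ValueError("Kerberos requires the Java HBase client")
    # A custom ZooKeeper host applies only when advanced configuration
    # is enabled; otherwise the basic configuration host is used.
    effective_zk = zookeeper_host if (advanced and zookeeper_host) else host
    return {
        "client": client,
        "port": port if port is not None else DEFAULT_PORTS[client],
        "zookeeper_host": effective_zk,
        "kerberos": kerberos,
    }
```

For instance, resolve_hbase("hadoop.example.com", client="rest") yields port 20550, while the same call with kerberos=True raises a ValueError.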

[Figure: HBase connection configuration]

After you set up all the required properties for HBase, you can test the connection. If the connection is set up correctly, a confirmation message is displayed.

Published January 15, 2016 — Updated September 6, 2017
