If you have problems running Data Services jobs that interact with your Hadoop environment, you can download a validation script that runs a series of checks to help isolate the problem.

  • Download the validation script.
  • Copy the ZIP file to the machine where the Data Services Job Server is installed.
  • Unzip the file and give the script execute privileges.
  • Run ./validate_env.sh and press 'Y' to begin validation.
  • If any of the checks fail, re-run the script with -l log_file_name; the specified log file will contain more information about why a check failed. A typical session is sketched below.
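
A minimal sketch of the steps above; the archive name, host, target directory, and log file name are illustrative and will differ on your system:

    # Copy the downloaded archive to the Job Server machine (host and paths are assumptions)
    scp validate_env.zip dsuser@jobserver:/tmp

    # On the Job Server machine: unpack the archive and make the script executable
    cd /tmp
    unzip validate_env.zip
    chmod +x validate_env.sh

    # Run the checks; answer 'Y' when prompted
    ./validate_env.sh

    # If a check fails, re-run with a log file to capture the details
    ./validate_env.sh -l validate_env.log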

All of the checks should pass with a green 'Pass' status message. If any check fails, take the corresponding corrective action below and re-run the script.

Data Services Environment Set: Ensure you have executed the command source <Data Services installation>/bin/al_env.sh.

Hadoop Environment Set: Ensure the $HADOOP_HOME environment variable is set and that you have executed the command source <Data Services installation>/hadoop/bin/hadoop_env.sh -e.
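
As a sketch, the first two checks correspond to an environment setup along these lines; the installation and Hadoop paths are assumptions and must be adjusted for your system:

    # Data Services environment (installation path is an assumption)
    source /opt/sap/dataservices/bin/al_env.sh

    # Hadoop environment; hadoop_env.sh ships with Data Services
    export HADOOP_HOME=/opt/hadoop    # assumption: your Hadoop installation
    source /opt/sap/dataservices/hadoop/bin/hadoop_env.sh -e

    # Quick sanity check that the variables took effect
    echo "$HADOOP_HOME"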

Hadoop in Path / Pig in Path / Hive in Path: Ensure the hadoop, pig, and hive scripts are available in the $PATH.
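
One way to confirm the three scripts are resolvable from the shell that starts the Job Server:

    # Each command should print a full path; no output means the script is not in $PATH
    command -v hadoop
    command -v pig
    command -v hive

    # If one is missing, prepend the relevant bin directory (location is an assumption)
    export PATH="$HADOOP_HOME/bin:$PATH"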

HDFS Write / HDFS Read: Check your Hadoop configuration, HDFS health, and permissions.

PIG Load / PIG Store: Check your Pig configuration and the validation script log file for details.
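
To reproduce the Pig checks by hand, a small load-and-store round trip can be run with pig -e; the file and HDFS paths are illustrative:

    # Put a small test file into HDFS, then load and store it with Pig
    echo "pig check" > /tmp/pig_check.txt
    hadoop fs -put /tmp/pig_check.txt /tmp/pig_check.txt
    pig -e "A = LOAD '/tmp/pig_check.txt'; STORE A INTO '/tmp/pig_check_out';"

    # Clean up the test data (use 'hadoop fs -rm -r' on Hadoop 2.x instead of -rmr)
    hadoop fs -rmr /tmp/pig_check.txt /tmp/pig_check_out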

HDFS Delete: Check your Hadoop configuration, HDFS health, and permissions.
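
The HDFS Write, Read, and Delete checks can be reproduced manually with the hadoop command-line client; the test path is illustrative:

    # HDFS Write: copy a local file into HDFS
    echo "hdfs check" > /tmp/hdfs_check.txt
    hadoop fs -put /tmp/hdfs_check.txt /tmp/hdfs_check.txt

    # HDFS Read: read the file back
    hadoop fs -cat /tmp/hdfs_check.txt

    # HDFS Delete: remove the file again
    hadoop fs -rm /tmp/hdfs_check.txt

A permission error on any of these commands points at HDFS permissions rather than at Data Services.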

Hive Create Database / Hive Create Table / Hive Load Table / Hive Select / Hive Drop Table / Hive Drop Database: Check your Hive configuration and the validation script log file for details.
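
The six Hive checks can be approximated from the command line with hive -e; the database, table, and file names below are placeholders:

    # Create a scratch database and table
    hive -e "CREATE DATABASE ds_check; CREATE TABLE ds_check.t (c STRING);"

    # Load a local file and select it back
    echo "hive check" > /tmp/hive_check.txt
    hive -e "LOAD DATA LOCAL INPATH '/tmp/hive_check.txt' INTO TABLE ds_check.t; SELECT * FROM ds_check.t;"

    # Drop the table and database again
    hive -e "DROP TABLE ds_check.t; DROP DATABASE ds_check;"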

HDFS Library in Library Path: Ensure the directory containing the 64-bit libhdfs.so library is included in the $LD_LIBRARY_PATH environment variable.

HDFS Library Dependencies in Library Path: Ensure all dependencies of the 64-bit libhdfs.so library can also be resolved through $LD_LIBRARY_PATH; running ldd libhdfs.so will show which dependencies are missing.
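
For example, assuming the Hadoop 1.x library layout (substitute the actual location of libhdfs.so on your system):

    # List dependencies of libhdfs.so; lines containing "not found" are the gaps
    ldd "$HADOOP_HOME/c++/Linux-amd64-64/lib/libhdfs.so" | grep "not found"

    # Add the missing directories, e.g. the JVM's libjvm.so directory (paths are assumptions)
    export LD_LIBRARY_PATH="$HADOOP_HOME/c++/Linux-amd64-64/lib:$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH"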

Class Path Contains Necessary JARs: The $CLASSPATH environment variable should contain the Hadoop configuration directory as well as the Hadoop core JAR and several support JARs.

At a minimum:
$HADOOP_HOME/conf
$HADOOP_HOME/hadoop-core-xxx.jar
$HADOOP_HOME/lib/commons-logging-xxx.jar

And for some Hadoop distributions, such as Cloudera:
$HADOOP_HOME/lib/guava-xxx.jar

If any of these are located somewhere other than $HADOOP_HOME, alter the $CLASSPATH accordingly.
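
A sketch of a matching $CLASSPATH setting; the xxx version placeholders are deliberately left in place and must match the JAR files shipped with your distribution:

    # Hadoop configuration directory plus the core and support JARs
    export CLASSPATH="$HADOOP_HOME/conf:$HADOOP_HOME/hadoop-core-xxx.jar:$HADOOP_HOME/lib/commons-logging-xxx.jar:$CLASSPATH"

    # Cloudera and some other distributions also need guava
    export CLASSPATH="$HADOOP_HOME/lib/guava-xxx.jar:$CLASSPATH"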

1 Comment

  1. Anonymous

    Hi Justin,

    We're using Data Services 4.1 and want to connect to Cloudera 4.5.2. Cloudera 4 is compatible with Apache Hadoop 2.x.x, whereas the hadoop_env.sh script and the Hadoop library included in Data Services 4.1 are compatible with Apache Hadoop 1.x.x. The validation script provided here only works with Apache Hadoop 1.x.x without modification; it doesn't work with Apache Hadoop 0.20.xxx or 2.x.x.

    We are having problems connecting Data Services 4.1 to Cloudera 4 because of the version incompatibility between DS and Cloudera. Have you encountered a similar issue? If you have any suggestions to solve this problem, it would be very much appreciated.

    Regards,

    Van