The material in this wiki applies to versions of SAP Data Services earlier than 4.2 Support Pack 2. Because of product changes, the content may not be fully valid for SAP Data Services 4.2 Support Pack 2 or later.
Use the following instructions to set up Hadoop.
- Download an Apache Hadoop distribution matching the version used in your vendor’s Hadoop distribution, such as Apache Hadoop 0.20.203 or Apache Hadoop 1.0.x. See Hadoop Releases.
- Unpack the Apache Hadoop distribution in a directory we’ll label HADOOP_HOME.
- Back up the HADOOP_HOME/conf directory.
- If you are using an Apache Hadoop 2 distribution, you'll find conf replaced by HADOOP_HOME/etc/hadoop and should use this instead.
- Copy the Hadoop conf directory from a node in your Hadoop cluster to HADOOP_HOME. This enables the machine to access your Hadoop cluster without being a part of it.
- Execute the command export HADOOP_HOME=<hadoop-install-dir>.
- Change to the Data Services installation’s bin folder.
- Execute the command source ./al_env.sh to configure the Data Services environment.
- Change to the $LINK_DIR/hadoop/bin directory, where LINK_DIR has been set up by the al_env.sh script to point to the Data Services installation directory.
- Execute the command source ./hadoop_env.sh -e to configure Data Services for Hadoop.
- If you receive any errors about JAR files not being found, follow the instructions in "Cannot read or write data using the HDFS" to modify the hadoop_env.sh script, then repeat this step.
- Execute the command $HADOOP_HOME/bin/hadoop fs -ls /.
- You should see a list of directories present in your HDFS.
- If not, check whether the JAVA_HOME variable is hardcoded in the $HADOOP_HOME/conf/hadoop-env.sh file. Vendors sometimes modify this script from the Apache Hadoop version. Because JAVA_HOME has already been set by the Data Services al_env.sh script, you can try commenting out the JAVA_HOME line in that file with a # character.
- If this still doesn’t work, try commenting out the HADOOP_OPTS variable in the same file.
- Optionally, execute the command ./hadoop_env.sh -c to configure Text Data Processing for Hadoop.
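The Hadoop steps above can be sketched as a single shell session. This is only an illustration: the install locations (`/opt/hadoop-1.0.4`, `/opt/sap/dataservices`), the variables `HADOOP_INSTALL_DIR` and `DS_INSTALL_DIR`, and the helper `comment_out_var` are hypothetical names chosen here, not part of Data Services or Hadoop.

```shell
#!/bin/bash
# Hypothetical install locations; substitute your own.
HADOOP_INSTALL_DIR="${HADOOP_INSTALL_DIR:-/opt/hadoop-1.0.4}"
DS_INSTALL_DIR="${DS_INSTALL_DIR:-/opt/sap/dataservices}"

# Hypothetical helper: prefix an uncommented "VAR=" or "export VAR=" line
# in a file with '#'. A backup of the file is kept with a .bak suffix.
comment_out_var() {
    # $1 = variable name, $2 = file to edit
    sed -i.bak -E "s/^([[:space:]]*(export[[:space:]]+)?$1=)/#\1/" "$2"
}

export HADOOP_HOME="$HADOOP_INSTALL_DIR"

if [ -f "$DS_INSTALL_DIR/bin/al_env.sh" ]; then
    # Configure the Data Services environment (sets LINK_DIR, among others).
    cd "$DS_INSTALL_DIR/bin" && source ./al_env.sh
    # Configure Data Services for Hadoop.
    cd "$LINK_DIR/hadoop/bin" && source ./hadoop_env.sh -e
    # Verify access to the cluster: this should list the HDFS root.
    "$HADOOP_HOME/bin/hadoop" fs -ls /
else
    echo "al_env.sh not found under $DS_INSTALL_DIR/bin" >&2
fi
```

If the listing fails because of a hardcoded JAVA_HOME, the helper could be applied as `comment_out_var JAVA_HOME "$HADOOP_HOME/conf/hadoop-env.sh"` (use etc/hadoop instead of conf on Hadoop 2) before retrying.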
Use the following steps to set up Pig.
- Download an Apache Pig release matching the version used in your vendor’s Hadoop distribution, such as Apache Pig 0.9.2. See Pig Releases.
- Unpack the Apache Pig release in a directory.
- Execute the command export PATH=/<my-path-to-pig>/pig-n.n.n/bin:$PATH.
- Execute the command pig
- You should be presented with a grunt> prompt. Type quit and press Enter.
- If not, ensure the PATH contains the Pig bin directory.
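As a rough sketch of the PATH step for Pig, the snippet below adds the Pig bin directory only if it is not already present and then checks that the launcher can be found. The directory `/opt/pig-0.9.2/bin`, the variable `PIG_BIN_DIR`, and the function `prepend_path` are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical location of the unpacked Pig release; substitute your own.
PIG_BIN_DIR="${PIG_BIN_DIR:-/opt/pig-0.9.2/bin}"

# Hypothetical helper: put a directory at the front of PATH unless it is
# already listed, so that repeated sourcing does not grow PATH.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path "$PIG_BIN_DIR"

if command -v pig >/dev/null 2>&1; then
    echo "pig found at $(command -v pig)"
else
    echo "pig not found; check that $PIG_BIN_DIR contains the pig launcher" >&2
fi
```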
You only need to configure Hive for the Data Services Job Server if you plan to access data stored in Hive. If so, use the following steps to set up Hive.
- Download an Apache Hive release matching the version used in your vendor’s Hadoop distribution, such as Apache Hive 0.9.0. See Hive Releases.
- Unpack the Apache Hive release in a directory.
- Back up the <my-path-to-hive>/conf directory.
- Copy the Hive conf directory from a node in your Hadoop cluster to <my-path-to-hive>. This enables the machine to access your Hive metastore.
- Execute the command export PATH=/<my-path-to-hive>/hive-n.n.n/bin:$PATH.
- Execute the command hive
- You should be presented with a hive> prompt. Type show databases; and press Enter. Type quit; and press Enter.
- If not, ensure the PATH contains the Hive bin directory.
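The Hive check can also be run non-interactively. The sketch below assumes a hypothetical install directory `/opt/hive-0.9.0` (held in the illustrative variable `HIVE_DIR`) and assumes that the conf directory copied from the cluster contains a hive-site.xml file; it uses the Hive CLI's `-e` option to run a single statement.

```shell
#!/bin/bash
# Hypothetical location of the unpacked Hive release; substitute your own.
HIVE_DIR="${HIVE_DIR:-/opt/hive-0.9.0}"

# Assumption: the conf directory copied from a cluster node includes
# hive-site.xml, which is where the metastore settings normally live.
if [ ! -f "$HIVE_DIR/conf/hive-site.xml" ]; then
    echo "warning: $HIVE_DIR/conf/hive-site.xml not found" >&2
fi

export PATH="$HIVE_DIR/bin:$PATH"

if command -v hive >/dev/null 2>&1; then
    # Equivalent to typing "show databases;" at the hive> prompt.
    hive -e 'show databases;' || echo "hive started but the query failed" >&2
else
    echo "hive not found; check that $HIVE_DIR/bin is correct" >&2
fi
```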
- Copy the libhdfs* libraries from your CDH 4.x node’s /usr/lib64 directory to the /usr/lib64 directory on your clean Linux machine.
- Copy the /usr/lib/hadoop, /usr/lib/hadoop-hdfs, /usr/lib/hadoop-0.20-mapreduce, /usr/lib/pig, and /usr/lib/hive directories from one of your CDH 4.x nodes to the /usr/lib directory on your clean Linux machine.
- Export the copied /usr/lib/hadoop directory as HADOOP_HOME.
- Add $HADOOP_HOME/bin to the $PATH.
- Try to execute hadoop fs -ls /. You should be able to browse your HDFS.
- Ensure that any JARs specified in the /usr/lib/hive/conf/hive-site.xml reside on the machine at the locations specified. If, for instance, you upgraded your CDH version, this file may reference updated JARs from the upgrade. Either copy the files to this machine or adjust the path to the files to point to your $HADOOP_HOME/lib or /usr/lib/hive/lib directories.
- Add /usr/lib/pig/bin and /usr/lib/hive/bin to the $PATH.
- Install the Data Services Job Server on the clean Linux machine.
- Ensure your $LINK_DIR/hadoop/bin/hadoop_env.sh script contains the following lines, with the original classes assignment commented out and replaced by the client-0.20 one:
- #classes=`ls $HADOOP_HOME/lib/guava*.jar $HADOOP_HOME/lib/commons*.jar $HADOOP_HOME/client-0.20/*.jar`
- classes=`ls $HADOOP_HOME/client-0.20/*.jar`
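The environment and JAR-location steps for the CDH 4.x machine might look like the following sketch. The function `list_missing_jars` is a hypothetical helper written for this example; it simply extracts anything that looks like an absolute path ending in .jar from hive-site.xml, which is a simplification of how Hive itself reads that file.

```shell
#!/bin/bash
# Paths taken from the steps above.
export HADOOP_HOME=/usr/lib/hadoop
export PATH="$HADOOP_HOME/bin:/usr/lib/pig/bin:/usr/lib/hive/bin:$PATH"

HIVE_SITE="${HIVE_SITE:-/usr/lib/hive/conf/hive-site.xml}"

# Hypothetical helper: print every JAR path referenced in the given
# hive-site.xml that does not exist on this machine.
list_missing_jars() {
    # $1 = path to hive-site.xml
    grep -oE '/[^,<>"[:space:]]+\.jar' "$1" \
        | sed -E 's#^/+#/#' \
        | sort -u \
        | while read -r jar; do
              [ -f "$jar" ] || echo "$jar"
          done
}

if [ -f "$HIVE_SITE" ]; then
    missing=$(list_missing_jars "$HIVE_SITE")
    if [ -n "$missing" ]; then
        echo "JARs referenced in $HIVE_SITE but not found locally:" >&2
        echo "$missing" >&2
    fi
else
    echo "$HIVE_SITE not found; copy /usr/lib/hive from a CDH node first" >&2
fi
```

Any path it reports needs either the file copied to this machine or the reference in hive-site.xml adjusted, as described in the step above.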
Data Services Job Server
Use the $LINK_DIR/bin/svrcfg CLI to set up or restart your Data Services Job Server to ensure it picks up the Hadoop environment once you've configured the following.
NOTE: The user that starts the Job Server must have read/write access to the HDFS.
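One way to confirm the read/write requirement is to run a small probe as the user that starts the Job Server. The sketch below is an assumption-laden example: `check_hdfs_rw`, `HADOOP_CMD`, and the probe directory `/tmp` are names chosen here, and the probe relies only on the standard `hadoop fs -touchz`, `-cat`, and `-rm` commands.

```shell
#!/bin/bash
# Command used to reach HDFS; override if hadoop is not on the PATH.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Hypothetical helper: create, read, and delete an empty file in HDFS.
check_hdfs_rw() {
    # $1 = HDFS directory to probe (defaults to /tmp, an assumption)
    probe="${1:-/tmp}/ds_rw_probe_$$"
    "$HADOOP_CMD" fs -touchz "$probe" \
        || { echo "cannot write $probe" >&2; return 1; }
    "$HADOOP_CMD" fs -cat "$probe" >/dev/null \
        || { echo "cannot read $probe" >&2; return 1; }
    "$HADOOP_CMD" fs -rm "$probe" >/dev/null
}

if command -v "$HADOOP_CMD" >/dev/null 2>&1; then
    check_hdfs_rw /tmp && echo "HDFS read/write access confirmed"
else
    echo "$HADOOP_CMD not found; set up the Hadoop environment first" >&2
fi
```

Run it while logged in as the Job Server user so that the HDFS permissions being tested are the ones the Job Server will actually have.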