The material in this wiki refers to versions of SAP Data Services lower than 4.2 Support Pack 2. Due to changes in the product, the content may not be fully valid for SAP Data Services 4.2 Support Pack 2 or above.
Summary

These instructions and troubleshooting tips will guide you through setting up Data Services to work with your Hadoop distribution. They augment the Hadoop chapter from the Data Services Reference Guide.

Configuring

There are two approaches to configuring Data Services to work with your Hadoop distribution. The first is to set up Data Services on a node in your Hadoop cluster. The second is to set up a machine with Data Services and Hadoop that is not part of your Hadoop cluster. In both scenarios, Data Services must be installed on a Linux machine.

Testing

Create the following jobs in the Data Services Designer to test whether Data Services and Hadoop are set up properly. If the jobs don't succeed, consult the troubleshooting section.

Note: This assumes you have already set up a Data Services Designer on Windows that uses the same repository as your Linux Data Services Job Server, so that these jobs are executed by that Job Server, which can communicate with your Hadoop cluster.

Troubleshooting

If jobs that interact with your Hadoop cluster fail after you have set up your environment, start with the validate environment script to isolate the problem and take corrective action.
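
As a rough illustration of the kind of check such a script performs, the sketch below reports missing environment variables and command-line tools. It is not the validate environment script referred to above. The variable names `HADOOP_HOME` and `LINK_DIR`, the tool names `hadoop` and `pig`, and the function `check_environment` are assumptions chosen for the example; adjust them to your installation.

```python
import os
import shutil


def check_environment(env=None,
                      required_vars=("HADOOP_HOME", "LINK_DIR"),
                      required_tools=("hadoop", "pig"),
                      which=shutil.which):
    """Return a list of human-readable problems; an empty list means no problem was found.

    The default variable and tool names are assumptions for this sketch,
    not a list taken from the Data Services documentation.
    """
    env = os.environ if env is None else env
    problems = []
    # Each required variable must be present and non-empty.
    for name in required_vars:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    # Each required tool must be resolvable through the PATH of the given environment.
    for tool in required_tools:
        if which(tool, path=env.get("PATH", "")) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("PROBLEM:", problem)
```

Run it as the Job Server user after sourcing the Hadoop and Data Services environments, so that it sees the same variables the Job Server would.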

9 Comments

  1. Hi Justin, brilliant article. We are planning a POC for a customer and will test the Hadoop integration with Data Services, so this article will be very helpful.

    One question is still not clear to me and I couldn't find the answer in the manuals: will simple string manipulation functions be pushed down to Hadoop? Let's say we are reading a file from HDFS and immediately after the source file format we have a Query transform using some simple string manipulations.

    Thanks

    Arne

  2. Former Member

    Arne,

         Thanks for reaching out to me about using the Hadoop support in SAP Data Services. I didn't see your comment until today or I would have messaged you earlier.

         I was curious whether you have a use case for your customer that wants to do a POC around SAP Data Services interacting with Hadoop, and whether you know the distribution/version that will be used?

         You can do some string manipulations in a Query transform and have it pushed down to Hadoop.

    Thanks.

    Justin-

  3. Hi Justin

    Thanks for replying. Yes, we are planning a POC with a customer. We want to run text data processing on emails and do sentiment analysis and automatic business process recognition with these emails.

    Another requirement is to automatically anonymize sensitive data such as names. This is why a push down of string manipulations to Hadoop might be very helpful.

    We have a Hadoop cluster on Linux, but Data Services 4.1 is installed on a Windows 2008 server. As far as I understand, we would need to install Hadoop (and Pig) on the Windows server as well. Do you happen to have experience with such a setup?

    Arne

  4. Former Member

    Arne,

         I'd love to explore your requirements in more depth but exchanging comments doesn't seem that useful. How about a meeting? Please send me an email at justin.martinson@sap.com and we can sync up.

    Thanks.

    Justin-

  5. Hi Justin,

    Thanks for this article. :)

    We are working on the integration of Hadoop and Data Services 4.1 to carry out a PoC on use cases.

    We have successfully configured DS 4.1 and Hadoop 1.2.1 on RHEL 6, but we are facing a problem with the integration: the HDFS file format is not showing up in the DS 4.1 Designer.

    We have sourced the Hadoop and DS environments properly but still face the problem. Is there any other step we missed? Please guide us.

    Regards,

    Avinash

     

     

    1. Former Member

      Hi Avinash,

           I'm not sure I follow what you mean by "HDFS file format is not showing up". Why don't we have a call where we can clarify this statement and review your setup? Please reach out to me at justin.martinson@sap.com and we can schedule a meeting.

      Thanks.

      Justin-

  6. Former Member

    Hi Justin,

     

    This article is very helpful, since I'm going to meet a customer who is apparently curious about Hadoop connectivity.

    Unfortunately, I am a novice with Hadoop technology and don't have a suitable environment, so I can't try defining such a connection from DS to Hadoop on an actual system.

    If you could kindly share some screenshots of defining this connectivity and/or importing metadata from Hadoop data, it would be very helpful for both me and the customer to understand the DS capabilities with a concrete picture.

    May I ask this favor of you?

     

    Thanks,

     

    Ryo

  7. Former Member

    Hi Justin,

    Could you please share how to integrate SAP HANA with Hadoop? We have to plan a POC in my current project, and I am not familiar with setting up Hadoop with HANA. This would be very useful for me, and I hope to hear back from you.

  8. Great Information.

    Would be nice if we could confirm any differences with DS 4.2 SP2+.

    We are doing a POC that reads tables via RFC from SAP ECC and loads them into HDFS on Cloudera Hadoop.

     DS Hadoop Configuration issues/fixes

    1. Need to update the Java shipped by SAP with DS from 1.6 to 1.8 per SAP note 2275588 - How to update Data Services Adapter JRE to 1.8
    2. Need the Java "Unlimited Strength" files from the Java vendor per SAP note 1426759 - Downloading the JCE Policy Files from Oracle or IBM
    3. Need to enable Kerberos support per SAP note 2197619 - Kerberos authentication support for Hadoop/Hive
    4. The DS job server user must run kinit every time or create a keytab; a new keytab must then be generated every time the DS user password changes (we are required to change it every 3 months).
    5. Need a missing shared library per SAP note 2158337 - SAP Data Services job fails in HDP 2.2 Hadoop cluster with error: "Failed to initialize HDFS"
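
    The keytab option in item 4 can be scripted so that a ticket is obtained before each job run. The following is only a minimal sketch, assuming the MIT Kerberos `kinit` client is on the PATH of the job server user; the function names, keytab path, and principal are hypothetical placeholders.

```python
import subprocess


def build_kinit_command(keytab_path, principal):
    """Build the kinit call that obtains a ticket from a keytab (-k) stored at keytab_path (-t)."""
    return ["kinit", "-kt", keytab_path, principal]


def obtain_ticket(keytab_path, principal, run=subprocess.run):
    """Run kinit and report whether it exited successfully.

    `run` is injectable so the function can be exercised without a Kerberos client.
    """
    result = run(build_kinit_command(keytab_path, principal),
                 capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical keytab path and principal; replace with your own values.
    if not obtain_ticket("/path/to/ds_user.keytab", "ds_user@EXAMPLE.COM"):
        print("kinit failed; check the keytab and the principal")
```

    The keytab still has to be regenerated after each password change, so a failing `kinit` is the first thing to check after a password rotation.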

    So far:

    1. Can run the hdfs dfs -put command successfully from a DS job script.
    2. Can read an HDFS file using Data Preview on the HDFS file format in the job.
    3. Get an error when running a job that writes to the HDFS file format: "<HDFS Failed Connect to <NameNode host>>". SAP note 2099945 - Error loading Hadoop HDFS - Data Services did not fix it.
    4. Will open a customer incident.

    We may give up and just read from ECC, write a flat file to a Linux directory, and then use the Hadoop tools (hdfs dfs -put) to load it into Hadoop.

    Another company had a performance issue, so they use SAP DS from SAP ECC to Oracle, then use Hadoop tools from Oracle to Hadoop.

    We have not tried to configure SAP DS for Hive; we may give that a quick test and update this comment with the results.
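
The fallback mentioned in the last comment (write a flat file to a Linux directory, then load it with `hdfs dfs -put`) could be wrapped roughly as follows. This is a minimal sketch, assuming the `hdfs` client is on the PATH and the user already holds a valid Kerberos ticket where one is required; the function names and both paths are hypothetical.

```python
import subprocess


def build_put_command(local_path, hdfs_dir, overwrite=False):
    """Build the `hdfs dfs -put` call; -f overwrites an existing target file."""
    cmd = ["hdfs", "dfs", "-put"]
    if overwrite:
        cmd.append("-f")
    cmd += [local_path, hdfs_dir]
    return cmd


def load_to_hdfs(local_path, hdfs_dir, overwrite=False, run=subprocess.run):
    """Copy a local flat file into HDFS and raise if the command fails.

    `run` is injectable so the function can be exercised without a Hadoop client.
    """
    result = run(build_put_command(local_path, hdfs_dir, overwrite),
                 capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"hdfs dfs -put failed: {result.stderr}")


if __name__ == "__main__":
    # Hypothetical paths: a flat file written by a DS job and a target HDFS directory.
    load_to_hdfs("/data/export/ecc_table.csv", "/user/ds_user/staging", overwrite=True)
```

Keeping the command construction separate from its execution makes it easy to log the exact command when an incident has to be opened.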