Wondering how to get started with SAP HANA Smart Data Integration and Smart Data Quality? 

Then you are in the right place to explore information on getting started with SAP HANA Smart Data Integration (SDI) and SAP HANA Smart Data Quality (SDQ).

There are three major styles of data integration: bulk and batch, real-time replication, and data virtualization. SAP HANA Smart Data Integration supports all three. SDI provides real-time data provisioning, bulk data movement, and federation capabilities.

SDI includes two components: the Data Provisioning Server on the SAP HANA database server, and the Data Provisioning Agent as an intermediary between SAP HANA and the external sources. The Data Provisioning Agent hosts the data provisioning adapters. It enables data federation, replication, and transformation scenarios for on-premise or in-cloud deployments. Because it can be installed on a separate host from the data source, it can be used in secure enterprise network scenarios.

Replication tasks and flowgraphs let you set up batch or real-time data replication and transformation scenarios. SAP HANA Smart Data Quality also requires the script server service to be running on the SAP HANA database server.

The HANA Academy has created a number of videos for SAP HANA Smart Data Integration and SAP HANA Smart Data Quality that assist with many of the common installation and configuration tasks. These videos, along with a number of blogs, are available to help you get started with SAP HANA Smart Data Integration and Smart Data Quality.

Overview

Understand Architecture and Involved HANA Objects

SDI Component hierarchy

  • Each HANA instance has a Data Provisioning Server (DPServer)
    • The DPServer handles the communication between HANA and the Agents
  • Each DPServer has one or more Data Provisioning Agents (DPAgent) connected
    • The DPAgent is usually close to the source system
  • Each DPAgent hosts Adapters, and each Adapter is connected to the Sources
    • The Adapter acts as a bridge or translation device between the source and HANA
  • Each Source has many tables (or procedures)
    • The remote source object of HANA holds all information required to connect to a specific source system. 
  • Each source table is pointed to by virtual tables
    • The virtual table represents the remote table in HANA.  All metadata needed by HANA is copied for performance reasons, but not the data itself. 
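
The hierarchy above can be sketched as a small data model. This is only an illustration: the class names, the example object names (`ORA_SRC`, `AGENT1`, `SDI_DEMO`, `VT_EMPLOYEES`), and the remote path segments are assumptions, and the exact remote path depends on the adapter in use. The generated statement follows the general `CREATE VIRTUAL TABLE ... AT ...` form of SAP HANA SQL.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualTable:
    schema: str        # HANA schema that holds the virtual table
    name: str          # name of the virtual table in HANA
    remote_path: list  # path of the remote object below the remote source


@dataclass
class RemoteSource:
    name: str          # remote source object in HANA
    adapter: str       # adapter that talks to the source system
    agent: str         # agent that hosts the adapter
    virtual_tables: list = field(default_factory=list)


def quote(identifier: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def create_virtual_table_sql(source: RemoteSource, vt: VirtualTable) -> str:
    """Build the statement that points a virtual table at a remote table."""
    remote = ".".join(quote(part) for part in [source.name, *vt.remote_path])
    return f"CREATE VIRTUAL TABLE {quote(vt.schema)}.{quote(vt.name)} AT {remote}"


# Hypothetical example objects; only metadata is created, no data is copied.
source = RemoteSource("ORA_SRC", "OracleLogReaderAdapter", "AGENT1")
table = VirtualTable("SDI_DEMO", "VT_EMPLOYEES", ["<NULL>", "<NULL>", "HR.EMPLOYEES"])
source.virtual_tables.append(table)
print(create_virtual_table_sql(source, table))
```

The model keeps the agent and adapter names on the remote source because, as described above, the remote source is the object that holds everything needed to reach one specific source system.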

Additional objects

  • Flowgraph editor for designing data flows with transformations.
  • Replication task editor for mass replication.

Getting Started

General Steps - SDI

Regardless of whether SAP HANA Smart Data Integration is used in the cloud or on premise, some general steps must be completed to enable communication between the source system and the target HANA system. Only then can you create replication tasks or flowgraphs to move the data. The general steps are:

  • Install the DPAgent
  • Start the DPAgent
  • Connect the DPAgent to HANA
  • Register the Agent
  • Register the Adapter
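
The two registration steps are normally done with the agent configuration tool, but they correspond to the SAP HANA SQL statements `CREATE AGENT` and `CREATE ADAPTER`. The sketch below only builds those statements as strings. The agent name, host name, port, and adapter name are example values rather than values taken from this page, and the `TCP` protocol shown here is assumed for an agent that HANA can reach directly.

```python
def quote(identifier: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def create_agent_sql(agent: str, host: str, port: int) -> str:
    """Build the statement that registers a DPAgent reachable over TCP."""
    host_literal = host.replace("'", "''")  # escape quotes in the string literal
    return (
        f"CREATE AGENT {quote(agent)} PROTOCOL 'TCP' "
        f"HOST '{host_literal}' PORT {int(port)}"
    )


def create_adapter_sql(adapter: str, agent: str) -> str:
    """Build the statement that registers an adapter hosted by an agent."""
    return f"CREATE ADAPTER {quote(adapter)} AT LOCATION AGENT {quote(agent)}"


# Example values only: adjust the names, host, and port to your landscape.
print(create_agent_sql("AGENT1", "agent-host.example.com", 5050))
print(create_adapter_sql("OracleLogReaderAdapter", "AGENT1"))
```

The agent is registered before the adapter because the adapter statement refers to the agent by name.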

The video "Setting-up the Data Provisioning Agent to Connect SAP HANA On-premise to SAP HANA Cloud" goes through each of these steps using the command line configuration tool.

General Steps - SDQ

There are additional steps required for setting up SAP HANA Smart Data Quality.  These steps need to be completed in order to use the transformation nodes of Cleanse, Match, and Geocoding. The Installation and Configuration Guide contains additional information.

  • Enable the scriptserver service.
    • The SDQ nodes require the scriptserver service to run.
  • Download and deploy the smart data quality directories.
    • If the referential data is not available, the process does not fail. The Cleanse node still performs parsing, but it doesn’t perform assignment.
  • Configure the operation cache.
    • The operation cache holds operation instances for Global Address Cleanse, Universal Data Cleanse, Geocode, and Type Identifier (TID), which are initialized and ready for use during task plan execution.
  • Set monitoring alerts to inform you when the directories expire.
    • Without monitoring, it is possible for the referential data to expire and processing to change from assignment to parsing without the users being aware.
    • Best practice is always to download and install the referential data for the countries being processed as soon as it becomes available to ensure the best possible assignment. 
    • 2281775 - What is the release schedule for Address Directories & Reference Data?
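
The first step in the list above, enabling the scriptserver service, can be sketched as follows. This assumes a multitenant system in which the statement is run from the system database against a tenant; the tenant name `DB1` is a placeholder, and other system layouts enable the service differently, so treat the Installation and Configuration Guide as the authority. The helper only builds SQL strings and does not connect to a database.

```python
import re

# Query to check afterwards whether the scriptserver service is running.
SCRIPTSERVER_STATUS_SQL = (
    "SELECT HOST, SERVICE_NAME, ACTIVE_STATUS "
    "FROM SYS.M_SERVICES WHERE SERVICE_NAME = 'scriptserver'"
)


def add_scriptserver_sql(tenant: str) -> str:
    """Build the statement that adds the scriptserver service to a tenant.

    The tenant name is emitted unquoted, so only simple identifiers
    (letters, digits, underscore) are accepted.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", tenant):
        raise ValueError(f"unexpected tenant database name: {tenant!r}")
    return f"ALTER DATABASE {tenant} ADD 'scriptserver'"


# Placeholder tenant name; run the result from the system database.
print(add_scriptserver_sql("DB1"))
print(SCRIPTSERVER_STATUS_SQL)
```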

Next Step

Replicate Data

  • Create a Replication task to set up batch or real-time data replication
    • Replication tasks can be configured for Initial load, Initial + Realtime, or Realtime only, with or without data transfer and with or without schema changes, depending on your business needs.
  • Create a Flowgraph to transform the data.
    • SDI flowgraphs provide a full set of transformation nodes for transforming your data.
