Wondering how to get started with SAP HANA Smart Data Integration and Smart Data Quality?
Then you are in the right place to explore information on getting started with SAP HANA Smart Data Integration (SDI) and SAP HANA Smart Data Quality (SDQ).
There are three major styles of data integration: bulk and batch, real-time replication, and data virtualization. SAP HANA Smart Data Integration supports all three, providing bulk data movement, real-time data provisioning, and federation capabilities. It includes two components: the Data Provisioning Server on the SAP HANA database server, and the Data Provisioning Agent, which acts as an intermediary between SAP HANA and the external sources. The Data Provisioning Agent hosts data provisioning adapters and enables data federation, replication, and transformation scenarios for on-premise or in-cloud deployments. Because it can be installed on a separate host from the data source, it can be used in secure enterprise network scenarios. Replication tasks and flowgraphs let you set up batch or real-time data replication and transformation scenarios. SAP HANA Smart Data Quality additionally requires the script server service to be running on the SAP HANA database server.
The SAP HANA Academy has created a number of videos for SAP HANA Smart Data Integration and SAP HANA Smart Data Quality that assist with many of the common installation and configuration tasks. These videos, along with a number of blogs, are available to help you get started with both products.
Understand Architecture and Involved HANA Objects
SDI Component hierarchy
- Each HANA instance has a Data Provisioning Server (DPServer).
  - The DPServer handles the communication between HANA and the agents.
- Each DPServer has a Data Provisioning Agent (DPAgent) connected.
  - The DPAgent is usually located close to the source system.
- Each adapter is connected to the sources.
  - The adapter acts as a bridge or translation device between the source and HANA.
- Each source has many tables (or procedures).
  - The remote source object in HANA holds all information required to connect to a specific source system.
- Each source table is pointed to by virtual tables.
  - The virtual table represents the remote table in HANA. All metadata needed by HANA is copied for performance reasons, but not the data itself.
- The flowgraph editor is used for designing dataflows with transformations.
- The replication task editor is used for mass replication.
- Check the Product Availability Matrix (PAM).
  - It contains the supported versions of the DPAgent for the HANA versions, as well as the shipped adapters and their requirements.
- Check the SAP HANA Smart Data Integration and SAP HANA Smart Data Quality Installation and Configuration guide.
- Check the SAP HANA Smart Data Integration playlist in the SAP HANA Academy.
- Check the SAP HANA Smart Data Integration Best Practices and Sizing Guide.
- Check SAP Note 2688382 - SAP HANA Smart Data Integration Memory Sizing Guideline.
  - When loading and transforming data, many factors affect memory consumption and need to be considered to ensure there are enough resources at runtime.
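To make the component hierarchy described above more concrete, here is a minimal Python sketch that models part of it as nested objects and derives the statement that exposes a source table as a virtual table. The class names and the example agent, adapter, source, and table names are all hypothetical. Only the general `CREATE VIRTUAL TABLE ... AT` form is SAP HANA SQL, and the exact remote path (including the `<NULL>` placeholders assumed here for unused database and schema levels) depends on the adapter, so treat this as an illustration rather than a reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RemoteSource:
    """HANA object holding what is needed to reach one source system."""
    name: str
    adapter: str  # adapter that bridges the source and HANA
    tables: List[str] = field(default_factory=list)  # tables the source exposes


@dataclass
class Agent:
    """Data Provisioning Agent, usually installed close to the source."""
    name: str
    remote_sources: List[RemoteSource] = field(default_factory=list)


def quote(identifier: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def virtual_table_sql(schema: str, virtual_name: str,
                      source: RemoteSource, remote_table: str) -> str:
    """Build the statement that points a virtual table at a source table."""
    if remote_table not in source.tables:
        raise ValueError(f"{remote_table!r} is not a table of {source.name!r}")
    # Assumption: the adapter does not use the database and schema levels,
    # so both are filled with the "<NULL>" placeholder.
    remote_path = ".".join(
        quote(part) for part in (source.name, "<NULL>", "<NULL>", remote_table)
    )
    return (f"CREATE VIRTUAL TABLE {quote(schema)}.{quote(virtual_name)} "
            f"AT {remote_path}")


# Hypothetical landscape: one agent, one remote source, two source tables.
source = RemoteSource("EXAMPLE_SOURCE", "ExampleAdapter", ["ORDERS", "CUSTOMERS"])
agent = Agent("EXAMPLE_AGENT", [source])
print(virtual_table_sql("STAGING", "VT_ORDERS", source, "ORDERS"))
```

The virtual table carries only metadata, which is why the sketch needs nothing more than names: the data itself stays in the source until it is queried or replicated.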
General Steps - SDI
Whether SAP HANA Smart Data Integration is used in the cloud or on premise, some general steps must be completed to enable communication between the source system and the target HANA system before you can create replication tasks or flowgraphs to move the data. The general steps are:
- Install the DPAgent
- Start the DPAgent
- Connect the DPAgent to HANA
- Register the agent
- Register the adapter
The video "Setting-up the Data Provisioning Agent to Connect SAP HANA On-premise to SAP HANA Cloud" goes through each of these steps using the command-line configuration tool.
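The two registration steps are normally performed through the agent configuration tool, but they correspond to SQL statements on the HANA side. The sketch below only builds those statements as strings so the shape of each step is visible; it does not connect to a database. The helper functions are hypothetical, and the agent name, host, port, and adapter name are placeholders. Check the statement options against the Installation and Configuration guide for your version before running anything.

```python
from typing import Optional


def quote(identifier: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def sql_literal(value: str) -> str:
    """Single-quote an SQL string literal, doubling any embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def register_agent_sql(agent: str, host: Optional[str] = None,
                       port: Optional[int] = None) -> str:
    """Build a CREATE AGENT statement.

    With host and port, HANA reaches the agent over TCP. Without them,
    the agent is registered for HTTP, where the agent opens the connection
    to HANA (assumed here to be the usual choice for cloud targets).
    """
    if host is None:
        return f"CREATE AGENT {quote(agent)} PROTOCOL 'HTTP'"
    if port is None:
        raise ValueError("a TCP agent needs both host and port")
    return (f"CREATE AGENT {quote(agent)} PROTOCOL 'TCP' "
            f"HOST {sql_literal(host)} PORT {int(port)}")


def register_adapter_sql(adapter: str, agent: str) -> str:
    """Build a CREATE ADAPTER statement tying an adapter to its agent."""
    return f"CREATE ADAPTER {quote(adapter)} AT LOCATION AGENT {quote(agent)}"


# Placeholder names; replace with the values from your own landscape.
for statement in (
    register_agent_sql("EXAMPLE_AGENT", "agent-host.example.com", 5050),
    register_adapter_sql("ExampleAdapter", "EXAMPLE_AGENT"),
):
    print(statement + ";")
```

Keeping the agent and the adapter as separate statements mirrors the step list: an adapter can only be registered against an agent that already exists.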
General Steps - SDQ
There are additional steps required for setting up SAP HANA Smart Data Quality. These steps must be completed before you can use the Cleanse, Match, and Geocode transformation nodes. The Installation and Configuration Guide contains additional information.
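The first of these steps, enabling the scriptserver service, typically comes down to one SQL statement. As a rough sketch, the helper below builds that statement for two common cases: a tenant database, where the service is added from the system database, and a single-container system, where the instance count is set in `daemon.ini`. The function name and the example database name are hypothetical, and the statements should be checked against the Installation and Configuration guide for your HANA version.

```python
import re
from typing import Optional


def enable_scriptserver_sql(tenant_db: Optional[str] = None) -> str:
    """Build the statement that enables the scriptserver service.

    tenant_db: name of the tenant database, or None for a
    single-container system.
    """
    if tenant_db is not None:
        # Only accept a plain identifier, since it is placed unquoted
        # into the statement.
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", tenant_db):
            raise ValueError(f"unexpected database name: {tenant_db!r}")
        # Intended to be run while connected to the system database.
        return f"ALTER DATABASE {tenant_db} ADD 'scriptserver'"
    # Single-container system: start one scriptserver instance.
    return ("ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') "
            "SET ('scriptserver', 'instances') = '1' WITH RECONFIGURE")


print(enable_scriptserver_sql("EXAMPLE_TENANT") + ";")
print(enable_scriptserver_sql() + ";")
```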
- Enable the scriptserver service.
  - The SDQ nodes require the scriptserver service to run.
- Download and deploy the smart data quality directories.
  - If the referential data is not available, the process does not fail; the Cleanse node performs parsing but does not perform assignment.
- Configure the operation cache.
  - The operation cache holds operation instances for Global Address Cleanse, Universal Data Cleanse, Geocode, and Type Identifier (TID), which are initialized and ready for use during task plan execution.
- Set monitoring alerts to inform you when the directories expire.
  - Without monitoring, it is possible for the referential data to expire and for processing to change from assignment to parsing without users being aware.
  - Best practice is always to download and install the referential data for the countries being processed as soon as it becomes available, to ensure the best possible assignment.
  - 2281775 - What is the release schedule for Address Directories & Reference Data?
- Create a replication task to set up batch or real-time data replication.
  - Replication tasks can be configured for Initial load, Initial + Realtime, or Realtime only, with or without data transfer and with or without schema changes, depending on your business needs.
- Create a flowgraph to transform the data.
  - SDI provides a full set of transformation nodes for transforming your data.
- SAP HANA Smart Data Integration - Adapters
- SAP HANA Smart Data Integration - Replicating Data
- SAP HANA Smart Data Integration - Cloud
- SAP HANA Smart Data Integration - SAP Commissions
- SAP HANA Smart Data Integration - Enterprise Semantic Services
- SAP HANA Smart Data Integration and SAP HANA Smart Data Quality documentation pages.
- New features in SAP HANA Smart Data Integration and SAP HANA Smart Data Quality.
- Roadmap for SAP HANA Smart Data Integration and SAP HANA Smart Data Quality.
- The SAP HANA Academy contains a playlist of videos to help you get started with SAP HANA Smart Data Integration. There is also a playlist specific to using SDI with SAP Cloud Platform HANA Service.
- The Data Replication and Data Virtualization series, which covers the different ways of virtualizing and replicating data from and to your SAP HANA Cloud, SAP HANA database.
- Configure SAP HANA service smart data integration for SAP BTP with the SAP HANA service for SAP BTP
- 2688382 - SAP HANA Smart Data Integration Memory Sizing Guideline