Data integration (sometimes called Extract, Transform, and Load, or ETL) is concerned with bringing in data from a wide variety of sources and normalizing it. Data quality is about ensuring that the data is clean, free of missing or duplicate values, and reliable.
Because computer systems and software are constantly changing, and because a company's system landscape shifts with mergers and acquisitions, data integration and data quality present ongoing challenges for most companies.
To make their solutions accessible to data integration tools, ISVs should consider integrating via web services. The application can call a web service to send data to a tool and then make a second call to retrieve the processed data. The application can even serve as the source for web services. For more information on these web services, see the "SAP BusinessObjects Data Services Integrator's Guide".
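This round-trip pattern can be sketched as follows. Note that the endpoint paths, payload shape, and `job_id` response field below are illustrative assumptions, not the actual Data Services web service API; consult the Integrator's Guide for the real interface.

```python
import json
import urllib.request

# NOTE: the endpoint paths and payload fields below are hypothetical,
# for illustration only; the real web service API is product-specific.

def build_payload(records):
    """Wrap source records in a JSON envelope for submission."""
    return json.dumps({"rows": records}).encode("utf-8")

def submit_and_fetch(base_url, records):
    """First call sends the data; a second call retrieves the processed result."""
    req = urllib.request.Request(
        base_url + "/submit",                      # hypothetical submit endpoint
        data=build_payload(records),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        job_id = json.load(resp)["job_id"]         # hypothetical response field
    # Second call: retrieve the cleansed data for the returned job.
    with urllib.request.urlopen(f"{base_url}/result/{job_id}") as resp:
        return json.load(resp)["rows"]
```

The two-call structure mirrors the text: one request to hand the data over, a second to pull the results back into the application.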
A concrete example is an ISV application that uses SAP BusinessObjects Data Services to extract data for master data management and persist it back into the application. SAP BusinessObjects Data Services features could be used to normalize address data, eliminate duplicates, clean up the data using predefined rules, and send it back to the ISV application. Such proactive use of data quality and data integration techniques helps ensure that new data is clean before it is stored in the application.
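The normalize-and-deduplicate step can be illustrated with a minimal sketch. This is not how Data Services implements it; the abbreviation table and record shape here are invented for the example.

```python
import re

# A tiny, illustrative abbreviation table; a real cleansing rule set
# would be far larger and locale-aware.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

def normalize_address(addr):
    """Lowercase, strip punctuation, and expand common abbreviations."""
    words = re.sub(r"[^\w\s]", "", addr.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def deduplicate(records):
    """Keep the first record seen for each normalized address."""
    seen = {}
    for rec in records:
        key = normalize_address(rec["address"])
        seen.setdefault(key, rec)
    return list(seen.values())

batch = [
    {"id": 1, "address": "12 Main St."},
    {"id": 2, "address": "12 main street"},  # duplicate after normalization
    {"id": 3, "address": "9 Oak Ave"},
]
print([r["id"] for r in deduplicate(batch)])  # → [1, 3]
```

Normalizing before comparing is what lets "12 Main St." and "12 main street" collapse into a single record.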
Data profiling is another common task within data integration. Data profiling means analyzing data to uncover inconsistencies. When data comes in through a single interface, the program can validate data as it is entered. In many other cases, however, data arrives far less consistently, with improperly formatted social security numbers, invalid values for certain fields, missing data, and the like.
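A minimal profiling pass over a batch of records might look like the sketch below; the field names and the report format are assumptions made for the example.

```python
import re

# U.S. social security number format: three digits, two digits, four digits.
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def profile_records(records, required_fields=("name", "ssn")):
    """Count records with missing required fields or malformed SSNs."""
    report = {"total": len(records), "missing": 0, "bad_ssn": 0}
    for rec in records:
        if any(not rec.get(f) for f in required_fields):
            report["missing"] += 1
        ssn = rec.get("ssn") or ""
        if ssn and not SSN_RE.match(ssn):
            report["bad_ssn"] += 1
    return report

records = [
    {"name": "Ada", "ssn": "123-45-6789"},
    {"name": "Bob", "ssn": "123456789"},   # improperly formatted SSN
    {"name": "",    "ssn": "987-65-4321"}, # missing name
]
print(profile_records(records))  # → {'total': 3, 'missing': 1, 'bad_ssn': 1}
```

A report like this is the typical output of a profiling step: it does not fix the data, it quantifies how dirty it is so that cleansing rules can be designed.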
Perhaps the best-known use for data integration tools is integrating data from multiple sources, both structured and unstructured, into a data warehouse.
Technology comparisons
SAP recommends using the Data Quality features of SAP BusinessObjects Data Services to support customers in improving the quality of existing and incoming data.
You can move data between application systems in a variety of ways. Remote function calls (RFCs) are a fast but SAP-proprietary method for moving data and are not applicable to third-party systems. Enterprise Service Bus (ESB) functionality can also be used (Table 6-2 compares SAP BusinessObjects Data Services with ESBs). Web services are good for moving small amounts of data and provide flexibility, standardization, and agility, since they follow a service-oriented architecture (SOA) approach. SAP BusinessObjects Data Services is suited for high-speed transfer of data between systems in scenarios where:
- Data must be moved multiple times per day
- Complex transformations are required
- Data has to be checked and possibly corrected before being integrated into the target system
Table 6-2: Extract, Transform, and Load versus Enterprise Service Bus

| | Extract, Transform, and Load (ETL) | Enterprise Service Bus (ESB) |
|---|---|---|
| Core Use Case | Data replication and data synchronization in real time | Event-driven, message-based integration in synchronous and asynchronous scenarios; guaranteed delivery |
| Data Characteristics | Bulk data | Single messages with small, medium, and large payloads |
| Transformation Characteristics | Lightweight to sophisticated transformations | Lightweight to sophisticated transformations |
| Integration Process Characteristics | Single-step stateless integration processes only | Multistep stateful integration processes requiring a workflow/business process management (BPM) engine to handle process status |
| SAP Product | SAP BusinessObjects Data Services | SAP NetWeaver Process Integration |