Registration

Dear SAP Community Member,
In order to fully benefit from what the SAP Community has to offer, please register at:
http://scn.sap.com
Thank you,
The SAP Community team.
Skip to end of metadata
Go to start of metadata

Motivation

The CDC application is used to compare data sources with a complex structure or hierarchy across different systems. By doing so, you check whether the data between the source and target system is consistent, for example, whether updates in the source system have been correctly replicated to the target system. Examples of complex data sources are sales orders with several items or a master data records distributed across several tables. 

In earlier releases, when you wanted to compare hierarchical data, such as sales order headers with their sales order items, you always had to decide on the granularity and the number of comparison keys. For instance, you could do a comparison on header level only, ignoring all sales order item information. By using both the sales order header number plus the sales order item numbers as comparison keys, the CDC would create a full extract of all order items. On one hand this allows to compare all item details, but also means an extraction and comparison of a large amount of data.

With CDC in SAP Solution Manager 7.2 you can define data models using aggregates. These allow to aggregate source data (like table rows) and only compare the result of the calculated aggregate. To stay with the sales order example, you could define a data model that compares some header information plus aggregated information of the items, e.g. a count of line items in a certain status, or even a sum of the line item amounts. If the source system (or to be more precise the CDC source type used) is able to generate SQL code for this, the aggregation happens directly in the source database, and only a small amount of extracted records need to be transferred to CDC, making the comparison run considerably faster.

And on top of that you can now even use data models that define a comparison of non-aggregated data with already aggregated data. That can be useful when comparing detailed transactional data with some BW system containing aggregated data already. So in the data model you would apply some aggregation rules for the extraction from the source system, and for the target system you directly select from the already aggregated tables. The source data aggregation “on the fly” now makes the data comparable between both systems, which was not possible before, unless working with special enhanced extractor code..

Prerequisites

Aggregates in CDC data models are available with Solution Manager 7.2 SP03.

Aggregation is possible with almost all standard CDC source types, except for the IDC (Internal Database Comparison) which does not have a use-case for it. However, there is one major difference between the source types. Some of them allow a native aggregation directly in the source database by means of generating appropriate SQL code directly. This is possible for the source types ABAP and ABDY (extraction from ABAP-based source systems using a generated or generic extractor based on OpenSQL) as well as the source types ADBC and HANA (extraction directly from a source database using native SQL queries). Other source types, such as reading from files, do now allow a native aggregation (assuming that the file contains raw single item data). For these the CDC provides some generic aggregation routine which is called between the extraction and the comparison of the source data. Obviously, in this case more data is extracted than needed, and the overall performance is not as good as with a native support.

Types of Aggregates

The following aggregate functions are supported:

  • DISTINCT – Returns the values of unique records (applicable for key columns)
  • COUNT(DISTINCT) – Returns the number of unique records (applicable for one column)
  • COUNT(*) GROUP BY – Returns the number of records (in an independent result field)
  • MIN – Returns the lowest value (minimum) of a column
  • MAX – Returns the highest value (maximum) of a column
  • AVG – Returns the arithmetic mean value (average) of a column
  • SUM – Returns the sum of all values of a column

To enable the reduction of the result set (elimination of individual data rows) due to aggregation or due to the elimination of duplicate rows from a result using DISTINCT, the SQL queries need a GROUP BY clause. This clause must contain all non-aggregated columns. Typically, these are the comparison keys. With aggregation, often you use fewer comparison keys than the source table primary keys, and then the result of the aggregate gets grouped by the remaining comparison keys. So, it is not possible to have additional classic data columns or display-only fields, because their content would have an undefined value after aggregation. Data fields or display-only fields in an aggregated format are possible of course. For this, you can select an aggregate for the field in the data model on the field details popup.

With our example of sales order items, the primary key of the item table would be the header number plus the item number. For an aggregation to a header level, we would only use the header number as comparison key. All item information gets aggregated, grouped by the header key. So possible aggregates could be the count of items, the count of distinct item units, the maximum item quantity, the sum of amounts, and so on. Using the header key, it is of course still possible to join the header table. With that, you can compare individual header attributes (like the order type or the sold-to-party) and aggregated item attributes at once. As a result of such an extraction, the returned data set contains just one row per sales order.

Aggregates of type DISTINCT

This does only make sense for mapped comparison keys. When the result gets aggregated by the comparison keys, other non-key columns have an undefined state. So, it is not possible to allow further data columns or display-only fields. This use case can only be used to do an existence check based on distinct comparison keys. Note that in case there are multiple table keys as a primary key, they are all distinct by definition. So, the comparison key must be reduced to less than the available primary keys to get an effect. In that sense, the comparison key can be any column of such a table in fact. As a disadvantage, we only get the distinct values, but not how often each of it occurs.

Aggregates of type COUNT(DISTINCT)

In contrast to a standard COUNT, which returns the number of all rows in a table, a COUNT(DISTINCT) would return the number of unique values within one specified column. When applied for a full table, we surely would get a number that we could compare with the number of distinct records in the other source system, but this is a trivial use case, similar of doing total counts only. It makes more sense to group the distinct counts to at least one more (key) field, for example counting the number of male and female employees (so the grouping is done using the gender), which is covered by the third aggregate type explained below. There, additional data fields are supported, but only in form of additional aggregates, e.g. MAX(age), AVG(salary), and so on, not as concrete values.

Aggregates of type COUNT(*) GROUP BY

This use-case is a combination of the first two aggregation types. It allows to extract distinct values of a certain table column (enforced by a GROUP BY) and counts the number of occurrences for each distinct value. This is possible for key fields and non-key fields. In addition, it is possible to select further columns, but only in combination with own aggregates, because individual entries would be ambiguous.

JOINs to COUNT items, grouped by HEADER keys

A typical use case is to count entries in some item table, grouped by their header keys. Typically, in the item table, there is a header key plus the item number. The simple option is just to use the item table and aggregate the distinct header keys and count their occurrences. However, in many cases we need to filter on attributes of the header table, therefore a JOIN is needed. We can combine the JOIN and the counting in the item table. This is exactly what we need for our initial sales order example.

Aggregates of type MIN, MAX, AVG, SUM

These are the standard aggregates to find extrema or to do arithmetic calculations in table columns. They can be used stand-alone, or in conjunction with additional grouping criteria, which can either be visible or invisible.

Supported Data Types for Aggregates

The supported aggregation functions depend on the aggregate type and the used data source (i.e. a database system). Aggregates like COUNT or DISTINCT can be applied to all data types. AVG and SUM work for numeric data types only, but MIN and MAX may also work for other non-numeric data types like character and date fields.

Result of COUNTs

In case you use aggregates that return a COUNT, the corresponding value is always an integer value. Often you want to compare this count value between the sources, grouped by some other field. Of course, the grouping field keeps it original data type. But for mapping the count value, you may need an additional field in the data model. For that, CDC allows you to extend the existing data model with some artificial dummy count field, that will be filled by the count aggregates, and can be used to map and compare these values between the two sources. During the data modelling, simply do a right mouse click in the graphical editor and choose “Create COUNT GROUP BY”.

Example without Aggregates

Let us look at another example with some technical tables. We are using the ABAP Dictionary’s tables DD02L and DD03L, containing meta data definitions for Tables and Fields. This has the advantage that you can try out this example in all NetWeaver systems, including the Solution Manager itself. For simplicity, we just use a 1:1 model, that means both source systems are defined identically, so you can choose any two RFC destinations to compare the Dictionary of two NetWeaver systems.

As a data model, we use DD02L (key TABNAME) as the header table, mainly for filtering on TABCLASS=TRANSP (transparent tables only). The joined table DD03L (keys TABNAME + FIELDNAME) contains some attributes we would like to compare, including position, key flag, data type and length of each single field in each table.

This is how the classic comparison model looks like:

Note that we have two comparison keys and four data fields for detail value comparison.

Here is some sample comparison result (actually it was a comparison between a Solution Manager and an S4/HANA system, so only a small share of common tables exactly match between them):

As expected, we get a missing entry (“Object Exist Only in…”) for each combination of Table Name and Field Name. For example, in Sol Man the table A2G_ATTR000000 shows five fields that do not exist in S/4HANA.

For the “Objects Exist with Differences” we can see the concrete values in both systems. As an example, table ACMHIERARDATATAB field TAB_NAME shows a difference in two of the four compared data fields (POSITION and LENGTH).

Example with using Aggregates

Now let’s change this example to make use of aggregates.

 

Note the difference! Now we just have the Table Name as the only comparison key. Normally this would result in the infamous “Duplicate Key” exception, because table DD03L has several identical TABNAMEs in its first key. This automatically gets resolved by defining an aggregate for the second key:

 

This leads to a generated SQL query that contains a “GROUP BY DD03L~TABNAME”.

In addition, for each data field we want to do a detail comparison, we can no longer extract the concrete single values, because they would get ambiguous in relation to their reduced key. Instead we define aggregates for them as well, for example POSITION and LENG get a MAX aggregation:

Now let’s have a look at the result:

For the missing entries, we now see a list consisting of one comparison key only. For example, the Sol Man table A2G_ATTR000000 appears only once, because the existence check was done on header level only.

For the differences, we now see the result of the aggregates on the three mapped data columns, which are MAX(POSITION), MAX(LENG) and COUNT DISTINCT FIELDNAME.

For each table with a difference, we get a maximum of three detail rows here, no matter how many individual fields of this table have a difference.

 

 

  • No labels