Skip to end of metadata
Go to start of metadata
Summary

A nested structure is actually nothing else than a column of datatype table. And hence such column could get quite large.

Imagine a Query with columns like this

 

How much memory does it require? Assuming that no operation requires to cache all rows, SAP Data Services would receive such a row, process it and send it downstream to the next transform. So DS  will require up to 1051 bytes memory to hold that row given the maximum length of each datatype (the 1051 bytes is not the exact number, there are some more rules to consider)
If we change the datatype of a column and fill it up with data, obviously the memory requirement is bigger. If we change the ITEMS_CHAR to varchar(10000) and it contains such a long text, the row requires 11041 bytes of memory. If the datatype is LONG and its containing text is 10000 bytes long, again we will need that much memory. And of we change the datatype to SCHEMA and it contains data with 10000 bytes, the memory requirement will be equal.
The problem however is the user expectation. If you read thousands of images files each 100MB in size from source through Data Integrator into the target, everybody understands that this will take a while. But for a deeply nested structure it is not that obvious.

A simple test proofs above. We create a dataflow with two Row_Generation transforms as source for each schema.

 

The Row_Gen_Header is in the from clause for the Query level - it controls how many rows go through the query. And the Row_Gen_Detail controls how many rows go into the schema named ITEMS - how "long" the data of this column is.

 

In the first run we have 100'000 Header rows, and the ITEMS are just 10 rows long. So the memory requirement is the 1041 bytes (HEADER_ID, SHIP_TO,...) + 10*41 bytes (LINE_ID, QTY,...) so about 1500 bytes. Executing that shows the following CPU and memory utilization:

 

The XML File written by this dataflow is 185MB big.

Now we change the characteristics. Before we had many rows but each row had just a small ITEM table, no we make the Row_Gen_Header 10 rows big and the Row_Gen_Detail should be 100'000 rows.

 

It requires more memory although the resulting XML is actually slightly smaller, just 164MB.
The memory consumption might not look like much but imagine you have a query with a sub schema with 1000 rows which itself has a schema with 1000 rows that has 1000 rows.... you get the point.

 

 

  • No labels