The Initial Extraction is typically done once for every defined template and uploads all data returned by the query. The term mainly refers to the Initial Extraction of Business Objects. The Initial Extraction of Technical Objects is a comparatively trivial implementation: it is basically an SQL read on the underlying table, based on the information retrieved from modeling. There is no framework available for extracting authorization data; this has to be done by the respective application team.
As a precondition, the Load requires key information; in brief, the main process looks as follows:
- Request keys. This can be done by a query: we store the complete result in a database table and fetch each package (usually 10,000 items) from this table. This approach is only advisable when the amount of data definitely stays below 500,000 entities; otherwise the data has to be fetched via an implementation provided on the application side.
- Assemble the query and retrieve the data using the GenIL.
- Examine the returned data and write it into the exporting parameter.
The second and third steps are parallelized when parallel mode is used.
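The steps above can be sketched as follows. This is a minimal Python stand-in for the ABAP/GenIL implementation; all function names, the thread-based parallelism, and the tiny package size are illustrative assumptions, not the actual framework API.

```python
from concurrent.futures import ThreadPoolExecutor

PACKAGE_SIZE = 3  # illustration only; Enterprise Search typically uses 10,000

def fetch_key_packages(all_keys, package_size):
    """Step 1 (sequential): return the stored key list in packages."""
    for i in range(0, len(all_keys), package_size):
        yield all_keys[i:i + package_size]

def retrieve_data(key_package):
    """Step 2: assemble the query and read the objects (GenIL read stand-in)."""
    return [{"key": k, "payload": f"data-{k}"} for k in key_package]

def examine(records):
    """Step 3: examine the result and shape it for the exporting parameter."""
    return [(r["key"], r["payload"]) for r in records]

def initial_load(all_keys, parallel=True):
    packages = list(fetch_key_packages(sorted(all_keys), PACKAGE_SIZE))
    if parallel:
        # steps 2 and 3 run in parallel tasks; step 1 stays sequential
        with ThreadPoolExecutor(max_workers=4) as pool:
            results = pool.map(lambda p: examine(retrieve_data(p)), packages)
            return [row for chunk in results for row in chunk]
    return [row for p in packages for row in examine(retrieve_data(p))]
```

Note how only the per-package work (steps 2 and 3) is handed to parallel tasks, matching the restriction that key retrieval itself stays sequential.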
The architecture diagram for the Initial Extraction is shown below. The colors of the swim lanes indicate the responsibility for each object: the "CRM Enterprise Search" objects are displayed in red (the modeling part in a darker shade), while light blue entities fall under the responsibility of the Enterprise Search team.
The keys are retrieved using the method approach if it is implemented; otherwise they are retrieved using the maintained query.
The model has to provide a query name (including the required parameters) which is used to fetch the complete data. It must be possible to determine the key (and the key structure) from the returned data. The result is stored in an intermediate table, from which the data is returned, sorted by key, in chunks of the package size (as defined in the Enterprise Search, usually 10,000).
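The intermediate-table paging can be illustrated with a small sketch, assuming an SQLite table as a stand-in for the database staging table; the table and function names are hypothetical.

```python
import sqlite3

PACKAGE_SIZE = 10000  # Enterprise Search default package size

def stage_query_result(conn, keys):
    """Store the keys derived from the complete query result in an
    intermediate (staging) table."""
    conn.execute("CREATE TABLE staging (obj_key TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO staging VALUES (?)", [(k,) for k in keys])

def read_package(conn, last_key, package_size=PACKAGE_SIZE):
    """Return the next package after last_key, sorted by key."""
    cur = conn.execute(
        "SELECT obj_key FROM staging WHERE obj_key > ? ORDER BY obj_key LIMIT ?",
        (last_key, package_size))
    return [k for (k,) in cur]
```

Sorting by key in the staging table is what allows each package request to resume deterministically after the last key of the previous package.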
If the data size exceeds certain limits (~400,000 entities), it is recommended to implement a method returning the data in chunks. The data has to be returned in a sorted way, but not necessarily sorted by the provided key. For example, the Account template uses the GUID as key but returns the data sorted by BP number. This means that application development is fully responsible for performing the mapping and sorting correctly.
The method may return more data sets than requested. It must not return fewer as long as there are still data sets left to be read.
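The contract of such a chunked provider can be sketched as follows. The data set mirrors the Account example above (keyed by GUID, served sorted by BP number); all values and names are made up for illustration, and the assertion encodes the "never fewer than requested while data remains" rule.

```python
# Hypothetical accounts: keyed by GUID, but sorted and served by BP number.
ACCOUNTS = [
    {"guid": "g-07", "bp_number": "0001"},
    {"guid": "g-03", "bp_number": "0002"},
    {"guid": "g-11", "bp_number": "0003"},
    {"guid": "g-01", "bp_number": "0004"},
]

def get_keys_chunked(start_index, requested):
    """Chunked key provider: returning more rows than requested is allowed,
    returning fewer is not, unless the data is exhausted."""
    chunk = ACCOUNTS[start_index:start_index + requested]
    remaining = len(ACCOUNTS) - start_index
    assert len(chunk) >= min(requested, remaining)  # contract check
    # The application is responsible for mapping the BP-sorted rows back
    # to the GUID keys expected by the framework.
    return [a["guid"] for a in chunk]
```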
The method GET_OBJECT_KEYS, which has to be implemented following the interface IF_GENIL_APPL_INTLAY, has the following signature:
The Initial Load can be run in several work processes. We distinguish three modes: 'P' fully parallelized; 'H' (hybrid) Initial Load parallelized, Delta Load sequential; 'S' sequential.
Some parts of the process, such as the key retrieval, cannot run in parallel. The same applies to some work on the result, due to technical restrictions. Since most of the time is spent reading the data itself, this should not be a problem.
For the number of tasks to be spawned (see customizing) we can hardly give recommendations, since this depends heavily on the machine landscape and the system load caused by other processes. Note that the Initial Loads of different objects may also run in parallel; this is an independent feature. As for the parallel package size (not to be confused with the ES package size, which is usually 10,000), we have had good experience with values between 50 and 100.
Please note that the parallel mode is capable of sorting out corrupted objects that cause system dumps. As long as the number of dumps does not exceed certain limits, the dumping objects are encircled, identified, and excluded. If a dump occurs in sequential mode, the process is halted.
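One way to "encircle" a dumping object is to bisect the failing package until the corrupt object is isolated. The sketch below is an assumption about the mechanism, not the actual framework code; `process` stands in for indexing one object, with exceptions playing the role of short dumps.

```python
def process(obj):
    """Stand-in for indexing one object; 'corrupt' objects raise (dump)."""
    if obj.startswith("corrupt"):
        raise RuntimeError(f"short dump while processing {obj}")
    return obj

def process_package(package, excluded):
    """Process a package; on a dump, split the package to encircle and
    exclude the corrupted objects instead of halting the whole load."""
    try:
        return [process(o) for o in package]
    except RuntimeError:
        if len(package) == 1:          # single object identified as corrupt
            excluded.append(package[0])
            return []
        mid = len(package) // 2        # bisect and retry each half
        return (process_package(package[:mid], excluded) +
                process_package(package[mid:], excluded))
```

In sequential mode there is no such retry envelope, which is why a single dump halts the whole process there.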
For data retrieval, a single GenIL read operation is issued. The complete data is read in one step: we hand over the list containing the keys of all root objects as well as the list of the required relations.
The data is examined recursively. In the sequential approach the node tables are filled directly during this operation. In the parallel approach this is not possible, since the node tables (which are constructed dynamically) cannot be passed to the parallel processes. Instead, we fill tables based on the original BOL definition and fill the node tables only after parallelization.
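This two-phase approach can be sketched as follows, with nested dictionaries standing in for the BOL result objects; the structure and names are assumptions made for illustration. Workers only collect rows per BOL object name, and the dynamic node tables are assembled sequentially afterwards.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def examine_recursive(node, out):
    """Recursively walk one result object and collect its rows per BOL
    object name; the worker never touches the dynamic node tables."""
    out[node["object"]].append(node["attributes"])
    for child in node.get("children", []):
        examine_recursive(child, out)
    return out

def parallel_examine(results):
    """Phase 1: workers return plain per-object rows. Phase 2: the node
    tables are filled sequentially, after parallelization."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(
            lambda r: examine_recursive(r, defaultdict(list)), results))
    node_tables = defaultdict(list)    # built only in the main process
    for part in partials:
        for name, rows in part.items():
            node_tables[name].extend(rows)
    return node_tables
```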