The Extractor Framework (EFWK) is the central infrastructure of SAP Solution Manager for data collection and distribution. It collects data for Root Cause Analysis, E2E Monitoring and Alerting, and Reporting, among others. By now, almost all scenarios in Solution Manager use extractors from the extractor framework.
The picture below shows the high-level architecture of the EFWK.
On the left side you find the different data sources for the extractors. Extractors don’t only run in Solution Manager, but also in the managed systems. Usually extractors are functions in function modules (mainly shipped with ST-PI and ST-A/PI for ABAP) or part of the aglets of the diagnostics agents. There are also some extractors running locally in Solution Manager using local data sources like DBA Cockpit.
Depending on the type of data source, there is an appropriate interface on the Solution Manager side of the information flow. Data arriving in the extractor framework can be enriched with further information e.g. landscape information stored in Solution Manager.
After the enrichment, the data is stored in the Solution Manager BW. Not all data is necessarily stored in the BW; some data is stored only in DB tables. This is the case if the data is not relevant for reporting, e.g. values for metrics in E2E Monitoring and Alerting that are not marked as reporting-relevant.
The data in the BW and the DB tables is then accessed by the Solution Manager applications.
Data Flow & Data Collection
In this section we describe the data flow and the data collection in the extractor framework.
The terms below are used repeatedly in the rest of this document, so we explain them here first.
Work List Item:
- A work list item defines which metric has to be retrieved by which extractor, including all required configuration parameters (Main Extractor, Extractor, PPMS Modeling, RFC destination to call the extractor, frequency, etc.)
- All work list items can be found in table E2E_ACTIVE_WLI; the flag active is set if the extractor is actually running
Extractor:
- The smallest technical entity collecting the data, the actual data collector
- Retrieves data either directly from the managed system (e.g. via RFC call) or indirectly from an intermediate layer (e.g. Wily Introscope Enterprise Manager)
Main Extractor:
- The entity in Solution Manager responsible for the execution of the extractor
- It calls the extractor locally or in the managed system
- Communication with the EFWK API (e.g. resource release, status update, …)
Resource Manager:
- Responsible for managing the execution of all extractors in pull mode
- Handling of Resource Management
- Starts main extractors asynchronously
The following picture shows the data collection infrastructure.
The picture above shows the data and the request flow. Red dots mark push interfaces, where the data is pushed autonomously from the source to the target of the connection. Blue dots mark pull interfaces, where the request for data is triggered from the target of the data flow. Yellow dots mark interfaces where a request for configuration information can take place, e.g. between the Data Provider Connector and the MAI Config Repository to verify which of the received metrics are valid. Green dots mark triggers from the EFWK Resource Manager when it starts the Main Extractors or the Data Provider Connector for MAI.
If an extractor is a PUSH extractor, the data provider in which the extractor runs triggers the data extraction autonomously, based on its configuration, and sends the data to the EFWK via the Data Provider Connector web service API. PUSH extractors are the extractors for MAI located in the diagnostics agents or in the DPC extension for Wily Introscope EM for MAI. These MAI data sources collect and send data based on the monitoring configuration, which specifies which data to collect and at which frequency. This configuration is stored centrally in the MAI Config Repository and is replicated to the remote data source/collector. The diagnostics agent or Wily Introscope therefore does not have to request the configuration every time, but only when necessary, e.g. when an agent loses connectivity for some time and reconnects. Usually the configuration is pushed to the diagnostics agent and to Wily EM during the System Monitoring setup. The Data Provider Connector uses this configuration to decide which of the metrics it receives via Agent or Introscope PUSH are valid and are processed further.
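The validity check performed by the Data Provider Connector can be pictured with a minimal Python sketch. It is not SAP code: the class `PushedMetric`, the function `filter_valid_metrics`, and the representation of the configuration as a set of (managed object, metric) pairs are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PushedMetric:
    managed_object: str  # hypothetical identifier of the monitored system or host
    metric_name: str     # hypothetical metric identifier
    value: float


def filter_valid_metrics(pushed, configured):
    """Split pushed metrics into those known to the configuration and the rest."""
    valid, rejected = [], []
    for metric in pushed:
        key = (metric.managed_object, metric.metric_name)
        (valid if key in configured else rejected).append(metric)
    return valid, rejected


# Assumed stand-in for the replicated monitoring configuration.
configured = {("SID_A", "cpu_utilization"), ("SID_A", "dialog_response_time")}

pushed = [
    PushedMetric("SID_A", "cpu_utilization", 42.0),
    PushedMetric("SID_A", "obsolete_metric", 1.0),  # not configured, so dropped
]

valid, rejected = filter_valid_metrics(pushed, configured)
```

Only metrics found in the configuration are passed on; what happens to rejected metrics is not described in this document, so the sketch simply returns them separately.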
PULL extractors are extractors that have to be triggered to actually deliver data. They are triggered by the EFWK resource manager, based on the configuration stored in the EFWK configuration.
The job EFWK RESOURCE MANAGER calls the Resource Manager every minute. Depending on which work list items are due according to table E2E_ACTIVE_WLI, the main extractors are started asynchronously. Because the Resource Manager calls the main extractor asynchronously, a new Resource Manager run can start before the main extractor called in the previous run has finished.
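A single Resource Manager run can be sketched as follows. This is an illustration in Python, not the ABAP implementation: the `WorkListItem` fields, the `next_due` bookkeeping, and the use of a thread pool to model the asynchronous start are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class WorkListItem:
    wli_id: str          # hypothetical key of a row in E2E_ACTIVE_WLI
    period: timedelta    # how often the item becomes due
    next_due: datetime   # assumed bookkeeping field
    active: bool = True


def due_items(work_list, now):
    """Return the active work list items whose due time has been reached."""
    return [w for w in work_list if w.active and w.next_due <= now]


def run_main_extractor(item):
    # Placeholder for the main extractor; a real one would call the extractor.
    return f"{item.wli_id}: done"


def resource_manager_run(work_list, now, pool):
    """Start a main extractor for every due item without waiting for the result."""
    futures = []
    for item in due_items(work_list, now):
        item.next_due = now + item.period  # assumed rescheduling rule
        futures.append(pool.submit(run_main_extractor, item))
    return futures


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
work_list = [
    WorkListItem("WLI_1", timedelta(minutes=5), now - timedelta(minutes=1)),
    WorkListItem("WLI_2", timedelta(minutes=60), now + timedelta(minutes=30)),
    WorkListItem("WLI_3", timedelta(minutes=5), now - timedelta(minutes=1), active=False),
]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = resource_manager_run(work_list, now, pool)
    results = [f.result() for f in futures]
```

Because `resource_manager_run` returns as soon as the work is submitted, a subsequent run could begin while earlier main extractors are still busy, which mirrors the behaviour described above.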
The main extractor then calls the actual extractor in the managed system or locally in Solution Manager, based on the target RFC destination maintained in table E2E_ACTIVE_WLI. If the extractor runs locally, the RFC destination is NONE. The extractor is usually a function module that runs the extractor class to collect the data. When the main extractor receives the data from the extractor, it calls the automatic data enrichment.
When the data has been enriched, the main extractor calls a data loader, which is a function module in the BW system of Solution Manager which writes the data into data targets such as BW cubes or DB tables.
When the data loader has finished, the main extractor can call the post processor, e.g. to clean up the data source system by deleting temporary extractor data. When the post processing has finished, the main extractor updates the status record for the extractor and ends.
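The sequence of steps performed by a main extractor (extract, enrich, load, optional post processing, status update) can be summarised in a short Python sketch. The function name, the callables passed in, the dictionary layout of the work list item, and the error handling are assumptions for illustration and do not reflect the actual ABAP interfaces.

```python
def run_main_extractor(wli, extract, enrich, load, post_process=None):
    """Run one work list item through the pipeline and return a status record."""
    status = {"wli": wli["id"], "state": "running", "records": 0}
    try:
        # 1. Call the extractor (remote, or local when the destination is NONE).
        records = extract(wli["rfc_destination"])
        # 2. Enrich each record, e.g. with landscape information.
        enriched = [enrich(record) for record in records]
        # 3. Hand the enriched data to the data loader.
        load(enriched)
        # 4. Optional clean-up in the data source system.
        if post_process is not None:
            post_process(wli["rfc_destination"])
        status.update(state="ok", records=len(enriched))
    except Exception as exc:  # assumed: failures end up in the status record
        status.update(state="error", message=str(exc))
    return status


stored = []  # stands in for a BW cube or DB table
wli = {"id": "WLI_1", "rfc_destination": "NONE"}

status = run_main_extractor(
    wli,
    extract=lambda dest: [{"metric": "m1", "value": 1.0}],
    enrich=lambda rec: {**rec, "system": "SID_A"},
    load=stored.extend,
)
```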
Remark: All data transported in the EFWK is in UTC. The same applies to logging and the required records algorithm. The data is converted to the time zone of Solution Manager only when it is written to the data storage.
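The rule "UTC in transit, local time only at storage" can be illustrated with a few lines of Python. The fixed offset used for the Solution Manager time zone and the function name `to_storage_time` are assumptions; a real system would use its configured time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumed Solution Manager time zone: a fixed UTC+1 offset for illustration.
SOLMAN_TZ = timezone(timedelta(hours=1), "CET")


def to_storage_time(ts_utc, storage_tz=SOLMAN_TZ):
    """Convert a UTC timestamp to the storage time zone at write time."""
    if ts_utc.tzinfo is None or ts_utc.utcoffset() != timedelta(0):
        raise ValueError("expected a timezone-aware UTC timestamp")
    return ts_utc.astimezone(storage_tz)


collected = datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc)
stored = to_storage_time(collected)
```

The stored value denotes the same instant as the collected one; only its representation changes, which in this example also moves it to the next calendar day.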
To make sure the EFWK doesn’t overload the managed system or Solution Manager, a resource management mechanism was put in place. It restricts the RFC destinations and work processes used by the EFWK.
The Resource Manager starts the main extractor via a local RFC call (RFC destination NONE). This local RFC call blocks one dialog work process. The main extractor then makes the appropriate call to the managed system, provided the resource cap for this RFC destination has not been reached yet.
While the extractor in the managed system collects the data, the local dialog RFC might be rolled out to free the local resource for other extractors. When the extractor delivers the required data, the dialog RFC is rolled in again and the data is processed by the extractor framework.
We differentiate between local and remote resources.
- Local Resource: Number of Dialog Work Processes to be used by EFWK
- Remote Resource: RFCs per system or per Introscope to be used
The resource cap defines the maximum number of extractors that can run in parallel using the same RFC destination.
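The interplay of local and remote resources can be sketched as a small Python class. The class `ResourcePool`, its methods, and the destination name are hypothetical; the sketch only assumes that a remote call needs both a free local dialog work process and a free slot under the cap of its RFC destination.

```python
from collections import Counter


class ResourcePool:
    """Tracks local work processes and per-destination RFC usage."""

    def __init__(self, local_cap, remote_caps):
        self.local_cap = local_cap            # dialog work processes for the EFWK
        self.remote_caps = dict(remote_caps)  # resource cap per RFC destination
        self.local_in_use = 0
        self.remote_in_use = Counter()

    def try_acquire(self, destination):
        """Reserve resources for one extractor; return False if a cap is reached."""
        if self.local_in_use >= self.local_cap:
            return False
        if destination != "NONE":  # NONE means a local extractor
            cap = self.remote_caps.get(destination, 0)
            if self.remote_in_use[destination] >= cap:
                return False
            self.remote_in_use[destination] += 1
        self.local_in_use += 1
        return True

    def release(self, destination):
        """Give the resources back once the extractor has finished."""
        self.local_in_use -= 1
        if destination != "NONE":
            self.remote_in_use[destination] -= 1


# "DEST_A" is a made-up RFC destination name with a cap of one parallel extractor.
pool = ResourcePool(local_cap=3, remote_caps={"DEST_A": 1})

first = pool.try_acquire("DEST_A")    # fits under the cap
second = pool.try_acquire("DEST_A")   # cap for DEST_A already reached
local = pool.try_acquire("NONE")      # only needs a local work process
pool.release("DEST_A")
third = pool.try_acquire("DEST_A")    # slot is free again
```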
Resource Manager Injection Cycle
All work list items are stored on a global work list in table E2E_ACTIVE_WLI. In each run of the Resource Manager the due work list items are added to the current work list.
Work list items are injected from the current work list if the resource cap allows it. A work list item that cannot be injected stays on the current work list. The resource pool injection runs through several injection passes, with a wait state between passes to allow resources to become available again. It tries to inject a work list item for a defined, configurable number of injection passes.
If a work list item still could not be injected after the last injection pass, it is sent back to the global work list and its priority is raised by 1.
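The injection cycle can be sketched in Python as follows. The function `injection_cycle`, the dictionary layout of the items, and the `free_up` hook that stands in for the wait state are assumptions. Serving higher priorities first is also an assumption; the text only states that the priority is raised.

```python
def injection_cycle(current_work_list, free_slots, max_passes, free_up=None):
    """Try to inject items over several passes.

    current_work_list: list of dicts with 'id', 'destination' and 'priority'
    free_slots: dict mapping a destination to its number of free slots
    Returns (ids of injected items, items sent back to the global work list).
    """
    # Assumed ordering: higher priority first, stable for equal priorities.
    pending = sorted(current_work_list, key=lambda w: -w["priority"])
    injected = []
    for pass_no in range(max_passes):
        still_pending = []
        for item in pending:
            dest = item["destination"]
            if free_slots.get(dest, 0) > 0:
                free_slots[dest] -= 1
                injected.append(item["id"])
            else:
                still_pending.append(item)
        pending = still_pending
        if not pending:
            break
        if free_up is not None and pass_no < max_passes - 1:
            free_up(pass_no, free_slots)  # models resources freed during the wait
    for item in pending:
        item["priority"] += 1  # raised by 1 when sent back
    return injected, pending


items = [
    {"id": "A", "destination": "X", "priority": 0},
    {"id": "B", "destination": "X", "priority": 0},
    {"id": "C", "destination": "Y", "priority": 0},
]
slots = {"X": 1, "Y": 0}


def one_x_slot_returns(pass_no, free_slots):
    # During the wait state one extractor on destination X finishes.
    free_slots["X"] += 1


injected, sent_back = injection_cycle(items, slots, max_passes=2, free_up=one_x_slot_returns)
```

In this run A is injected in the first pass, B in the second after a slot on X is freed, and C goes back to the global work list with its priority raised.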
Extractors running in LUW Mode
Extractors in LUW mode are complex extractors that optimize the data collection and the data distribution in Solution Manager.
The LUW ME Controller is the extractor that is triggered by the EFWK Resource Manager on a regular basis. The ME Controller then starts the Primary extractor, which usually also performs the data collection in the managed system based on defined filter criteria. The Primary extractor stores the data in memory and returns to the ME Controller, which then distributes the collected data sequentially to several Secondary extractors. These secondary extractors then store the data in different info cubes. It can happen that the same data record is needed in different info cubes.
The LUW concept makes sure that data is collected only once and that the collection does not require several different extractors. This simplifies the infrastructure and reduces the load on the managed system.
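The collect-once, distribute-many pattern of LUW mode can be illustrated with a short Python sketch. The function `me_controller`, the callables, and the list-based "cubes" are assumptions for illustration only.

```python
def me_controller(primary, secondaries):
    """Collect data once via the primary, then feed each secondary in turn."""
    records = primary()  # single collection, held in memory
    written = {}
    for name, secondary in secondaries.items():  # sequential distribution
        written[name] = secondary(records)
    return written


calls = {"primary": 0}


def primary():
    # Stands in for the data collection in the managed system.
    calls["primary"] += 1
    return [
        {"metric": "resp_time", "value": 120},
        {"metric": "cpu", "value": 35},
    ]


cube_a, cube_b = [], []  # hypothetical data targets


def secondary_a(records):
    cube_a.extend(records)  # stores every record
    return len(records)


def secondary_b(records):
    kept = [r for r in records if r["metric"] == "cpu"]  # stores a subset
    cube_b.extend(kept)
    return len(kept)


written = me_controller(primary, {"cube_a": secondary_a, "cube_b": secondary_b})
```

The `cpu` record ends up in both targets although the primary extractor ran only once, which is the point of the LUW concept.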
Record Tracking Algorithm
The Record Tracking Algorithm determines which records the EFWK requests from the extractors. The EFWK requests records on an hourly basis and always requests the records of the last full hour.
The required records are calculated in the main extractor and can be found in table E2E_RECORDS_WLI. The relevant Main Extractors are E2E_ME_RFC_ASYNC and E2E_ME_ST_API_ASYNC.
Remark: The required records algorithm calculates the required records in the UTC time zone.
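The "last full hour in UTC" rule can be written down in a few lines of Python. The function name `last_full_hour` is hypothetical, and the sketch covers only the time window, not how the EFWK tracks which records it has already received.

```python
from datetime import datetime, timedelta, timezone


def last_full_hour(now):
    """Return (start, end) of the last completed hour, expressed in UTC."""
    if now.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    now_utc = now.astimezone(timezone.utc)
    end = now_utc.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


# A run at 14:27:13 UTC requests the records for 13:00 to 14:00 UTC.
start, end = last_full_hour(datetime(2024, 3, 5, 14, 27, 13, tzinfo=timezone.utc))

# The same instant given in a UTC+1 zone yields the same UTC window.
local_now = datetime(2024, 3, 5, 15, 27, 13, tzinfo=timezone(timedelta(hours=1)))
start_local, end_local = last_full_hour(local_now)
```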