A large share of PI scenarios deal with files (file to file, file to RFC, file to IDoc, etc.). PI has to read the file and transform it into an XML message, which is usually followed by validation (implemented as a Java mapping in PI 7.0). PI consultants will readily notice a relation between the size of the files posted to the server and the performance of the server. This article analyses the relation between file size and PI server performance and tries to find an optimum file size that improves server throughput.
Memory management by the operating system revisited
A process is defined as a program in execution. A process has a text section (the program code), a stack (temporary data), a data section (global variables) and a heap (memory allocated dynamically at run time).
The memory of a process is managed in units called pages: fixed-size blocks of the process's logical address space. Physical memory (RAM) is divided into frames of the same size. The pages are initially kept in auxiliary (secondary) storage and are brought into RAM frames as the process executes. Paging comes into play when a program tries to access a page that is not currently mapped to physical memory. This situation is known as a page fault. The operating system must then take control and handle the page fault in a manner invisible to the program. To do so, the operating system must:
- Determine the location of the data in auxiliary storage.
- Obtain an empty page frame in RAM to use as a container for the data.
- Load the requested data into the available page frame.
- Update the page table to show the new data.
- Return control to the program, transparently retrying the instruction that caused the page fault.
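The five steps above can be traced in a toy simulation. This is a minimal sketch, not how a real kernel works: the class name DemandPagingSketch is hypothetical, reading from auxiliary storage is only simulated, and the sketch deliberately stops when no free frame is left, because choosing a victim is the job of page replacement.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of demand paging; no real memory or disk is involved.
final class DemandPagingSketch {
    private final Map<Integer, Integer> pageTable = new HashMap<>(); // page number -> frame number
    private final Deque<Integer> freeFrames = new ArrayDeque<>();
    private int faults = 0;

    DemandPagingSketch(int frameCount) {
        for (int f = 0; f < frameCount; f++) freeFrames.add(f);
    }

    // Returns the frame holding `page`, servicing a page fault first if needed.
    int access(int page) {
        Integer frame = pageTable.get(page);
        if (frame != null) {
            return frame; // page is resident: no fault
        }
        faults++; // page fault: the OS takes control
        // Step 1: locate the page on auxiliary storage (simulated; nothing is read here).
        // Step 2: obtain an empty frame in RAM.
        Integer free = freeFrames.poll();
        if (free == null) {
            // A real OS would now run page replacement; this sketch stops here.
            throw new IllegalStateException("no free frame for page " + page);
        }
        // Step 3: load the data into the frame (simulated).
        // Step 4: update the page table.
        pageTable.put(page, free);
        // Step 5: return control; the caller "retries" by using the returned frame.
        return free;
    }

    int faults() { return faults; }

    public static void main(String[] args) {
        DemandPagingSketch os = new DemandPagingSketch(4);
        int[] refs = {3, 7, 3, 1, 7};
        for (int page : refs) {
            System.out.println("page " + page + " -> frame " + os.access(page));
        }
        System.out.println("page faults: " + os.faults());
    }
}
```

Only the first touch of each page faults; later accesses find the page already mapped in the page table.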
Because RAM is faster than auxiliary storage, paging to disk is avoided until there is not enough RAM to hold all the data needed. When this happens, a page in RAM is moved to auxiliary storage, freeing up a frame. Thereafter, whenever a page in secondary storage is needed, a page in RAM is evicted so that the requested page can be loaded into the frame it leaves behind. An efficient paging system chooses as the victim the page that is least likely to be needed soon, and various page replacement algorithms try to do this. Most operating systems use either some approximation of the least recently used (LRU) algorithm (exact LRU cannot be implemented efficiently on current hardware) or a working-set-based algorithm. If the page chosen for replacement has been modified (it is dirty), it must be written back to auxiliary storage; an unmodified (clean) page can simply be discarded.
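Exact LRU replacement, including the dirty-page rule, can be simulated in a few lines with the JDK's access-ordered LinkedHashMap. This is a hypothetical simulator for illustration only (the class LruPager is not from any library); real operating systems only approximate LRU.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical exact-LRU simulator that also counts write-backs of dirty victims.
final class LruPager {
    private final int frameCount;
    // page -> dirty flag; accessOrder = true makes iteration run from least to most recently used
    private final LinkedHashMap<Integer, Boolean> resident = new LinkedHashMap<>(16, 0.75f, true);
    private int faults = 0;
    private int writeBacks = 0;

    LruPager(int frameCount) {
        if (frameCount < 1) throw new IllegalArgumentException("need at least one frame");
        this.frameCount = frameCount;
    }

    void access(int page, boolean write) {
        Boolean dirty = resident.get(page); // a hit also moves the page to most recently used
        if (dirty != null) {
            if (write) resident.put(page, true); // a modified page becomes dirty
            return;
        }
        faults++;
        if (resident.size() == frameCount) {
            Iterator<Map.Entry<Integer, Boolean>> it = resident.entrySet().iterator();
            Map.Entry<Integer, Boolean> victim = it.next(); // least recently used page
            if (victim.getValue()) {
                writeBacks++; // dirty victim must be written back to auxiliary storage
            }
            it.remove(); // evict the victim; a clean one is simply discarded
        }
        resident.put(page, write);
    }

    int faults() { return faults; }
    int writeBacks() { return writeBacks; }

    public static void main(String[] args) {
        LruPager pager = new LruPager(3);
        int[] refs = {1, 2, 3, 1, 4, 2};
        boolean[] writes = {false, true, false, false, false, false};
        for (int i = 0; i < refs.length; i++) pager.access(refs[i], writes[i]);
        System.out.println("faults=" + pager.faults() + " writeBacks=" + pager.writeBacks());
    }
}
```

For the reference string 1, 2, 3, 1, 4, 2 with three frames, where only the access to page 2 is a write, the simulator reports 5 page faults and 1 write-back: page 2 is evicted while dirty, while page 3 is evicted clean and simply discarded.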
During execution, most programs reach a steady state in their demand for memory, showing locality both in the instructions fetched and in the data accessed. The memory needed in this steady state is usually much smaller than the total memory the program requires. The set of pages used in this steady state, that is, the pages most frequently accessed, is referred to as the working set.
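The working set can be made precise with a sliding window over the page reference string. The window size delta used here is an assumption borrowed from the classic working-set model (the text does not fix one), and WorkingSet is a hypothetical helper name.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: working set = distinct pages in the last `delta` references.
final class WorkingSet {
    // Working set at reference index t with window delta: the distinct pages touched
    // by references t - delta + 1 .. t (clipped at the start of the string).
    static Set<Integer> at(int[] refs, int t, int delta) {
        if (delta < 1 || t < 0 || t >= refs.length) throw new IllegalArgumentException();
        Set<Integer> pages = new LinkedHashSet<>();
        for (int i = Math.max(0, t - delta + 1); i <= t; i++) {
            pages.add(refs[i]);
        }
        return pages;
    }

    public static void main(String[] args) {
        // A loop over pages 1-3 followed by a phase that only touches page 4.
        int[] refs = {1, 2, 3, 1, 2, 3, 4, 4, 4, 4};
        int delta = 4; // assumed window size
        for (int t = delta - 1; t < refs.length; t++) {
            System.out.println("t=" + t + " working set=" + at(refs, t, delta));
        }
    }
}
```

On the sample string the working set holds three pages while the program loops over pages 1 to 3, briefly grows while both phases fall inside the window, and then shrinks to page 4 alone.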
Virtual memory systems work most efficiently when the working set is small enough, relative to the number of pages that fit in RAM, that the time spent resolving page faults is not a dominant factor in the workload's performance. A program that works with huge data structures may require a working set too large for the paging system to manage efficiently, resulting in constant page faults that drastically slow down the system. This condition is called thrashing: pages are swapped out and then accessed again almost immediately, causing frequent faults.
An interesting characteristic of thrashing is that, as the working set grows, the number of faults increases very little until a critical point, after which faults rise dramatically and the majority of the system's processing power is spent handling them. During thrashing the operating system does very little useful work; it spends most of its time swapping pages in and out of RAM, so CPU utilization drops. When CPU utilization drops, the CPU scheduler tries to raise it by bringing more waiting processes into main memory. Each new process needs pages of its own, so the page fault rate rises even further.
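The critical point can be reproduced with a simple experiment: a program that cycles through a fixed set of pages under exact LRU. This is a sketch under strong simplifying assumptions (one process, a strictly cyclic reference string, exact LRU, hypothetical class ThrashingDemo); real workloads usually degrade less abruptly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical experiment: page faults of a cyclic reference string under exact LRU.
final class ThrashingDemo {
    // Counts page faults under exact LRU for a reference string and a frame count.
    static int lruFaults(int[] refs, int frames) {
        Map<Integer, Integer> lru = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > frames; // evict the least recently used page once RAM is full
            }
        };
        int faults = 0;
        for (int page : refs) {
            if (lru.get(page) == null) { // miss: the page is not resident
                faults++;
                lru.put(page, page);
            }
        }
        return faults;
    }

    // Loops `rounds` times over pages 0..wsSize-1: a working set of exactly wsSize pages.
    static int[] loop(int wsSize, int rounds) {
        int[] refs = new int[wsSize * rounds];
        for (int i = 0; i < refs.length; i++) refs[i] = i % wsSize;
        return refs;
    }

    public static void main(String[] args) {
        int[] refs = loop(8, 100); // 800 references over 8 pages
        for (int frames = 10; frames >= 5; frames--) {
            System.out.printf("frames=%d faults=%d%n", frames, lruFaults(refs, frames));
        }
    }
}
```

With the 8-page loop repeated 100 times, 8 or more frames produce only the 8 initial faults, while 7 or fewer frames make every one of the 800 references fault: taking away a single frame below the working set size moves the program from almost no faults to thrashing.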
Let P be the fraction of time that a process spends away from the CPU, waiting for input/output. If there is one process in memory, the CPU utilization is 1 - P. If there are N processes in memory and they wait for I/O independently of each other, the probability that all N are waiting at the same time is P^N, so the CPU utilization is 1 - P^N, where N is called the multiprogramming level (MPL) or the degree of multiprogramming. As N increases, the CPU utilization increases. While this equation suggests that the CPU keeps working more efficiently as more and more processes are added, this cannot be true in practice, because the model ignores memory: once the system passes the point of optimal CPU utilization, it thrashes as explained above.
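The formula is easy to tabulate. In this minimal sketch the value P = 0.8 (each process waits on I/O 80% of the time) is an assumed example, not a measurement, and CpuUtilization is a hypothetical class name.

```java
// Hypothetical helper tabulating the multiprogramming model 1 - P^N.
final class CpuUtilization {
    // All n processes wait for I/O at once with probability p^n (independent waits assumed),
    // so the CPU is busy with probability 1 - p^n.
    static double utilization(double p, int n) {
        if (p < 0 || p > 1) throw new IllegalArgumentException("p must be in [0, 1]");
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        return 1 - Math.pow(p, n);
    }

    public static void main(String[] args) {
        double p = 0.8; // assumed I/O wait fraction
        for (int n = 1; n <= 10; n++) {
            System.out.printf("N=%d utilization=%.3f%n", n, utilization(p, n));
        }
    }
}
```

With P = 0.8, utilization climbs from 0.2 at N = 1 to 0.36 at N = 2 and about 0.89 at N = 10. Because the model has no notion of memory, it never predicts the collapse that thrashing causes.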
Scenarios involving files in PI 7.0
Although this article addresses the role of file size in PI performance for PI 7.0 (the release I mainly work with), it applies generally to all versions of PI. When a file is posted to the PI server, it is first converted into an XML message before message mapping takes over. During the conversion to XML the message grows, sometimes to more than six times the size of the original file. The process heap therefore grows, and with it the number of pages the process needs in main memory, so server performance degrades for the reasons described above.
In PI 7.0 we need to validate the XML message with Java validation code. For validation we generally use a DOM parser rather than SAX, because our validation needs the entire XML tree to be present in memory at one time, and a DOM parser by its nature builds that tree. Since a Java mapping works with both a source and a target message, two DOM trees are usually held at the same time, further increasing the memory requirement. Each entire DOM tree has to be in RAM during the Java mapping.
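To make the memory argument concrete, here is a minimal sketch of a DOM-based mapping step using only the JDK's JAXP APIs: the whole source message is parsed into one tree and a second tree is built for the target, so both live on the heap at the same time. The element names (Source, Record, Target, Item), the id check standing in for validation, and the class name are hypothetical; the PI 7.0 mapping interface that would wrap this logic is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical DOM mapping: source tree and target tree coexist in memory.
final class DomMappingSketch {
    static String map(String sourceXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Tree 1: the entire source message is parsed into memory.
        Document source = builder.parse(
                new ByteArrayInputStream(sourceXml.getBytes(StandardCharsets.UTF_8)));

        // Tree 2: the target message is built while tree 1 is still referenced.
        Document target = builder.newDocument();
        Element root = target.createElement("Target");
        target.appendChild(root);

        NodeList records = source.getElementsByTagName("Record");
        for (int i = 0; i < records.getLength(); i++) {
            Element rec = (Element) records.item(i);
            String id = rec.getAttribute("id"); // empty string if the attribute is missing
            // Hypothetical validation rule: every record must carry an id attribute.
            if (id.isEmpty()) {
                throw new IllegalArgumentException("Record " + i + " has no id");
            }
            Element item = target.createElement("Item");
            item.setAttribute("key", id);
            item.setTextContent(rec.getTextContent().trim());
            root.appendChild(item);
        }

        // Serialize the target tree without an XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(target), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String src = "<Source><Record id=\"1\">alpha</Record><Record id=\"2\">beta</Record></Source>";
        System.out.println(map(src));
    }
}
```

Both trees scale with the number of records, so the memory needed by this kind of mapping grows with message size.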
For optimum performance from PI, the files posted to the server should be between 2 and 5 MB; above 5 MB, PI performance decreases. Large files can be split with batch processes or scripts before they are posted to the server. If file content conversion is used, set the number of records per message option to a specific value so that individual messages do not grow too large. A foreground (dialog) process running in PI has a default maximum runtime of 300 seconds. If a process exceeds this limit, for example because its memory requirement slows it down, the system terminates it. To increase the limit, set the system profile parameter rdisp/max_wprun_time to a higher value using transaction RZ11.
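Splitting a large file before posting can be done with a small utility. The sketch below is a hypothetical stand-alone splitter, not a PI feature: it assumes one record per line with no header or trailer, groups whole records into parts whose UTF-8 size stays at or under a chosen limit (for example 5 MB, per the range above), and writes them next to the input as .part1, .part2 and so on.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing splitter for line-oriented flat files.
final class FileSplitter {
    // Groups records into chunks whose UTF-8 size (counting one '\n' per record) stays at or
    // below maxBytes; a record is never cut in half, and an oversized record gets its own chunk.
    static List<List<String>> split(List<String> records, long maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String record : records) {
            long size = record.getBytes(StandardCharsets.UTF_8).length + 1L;
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                chunks.add(current); // close the current part before it exceeds the limit
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += size;
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }

    // Usage: java FileSplitter.java <input-file> <max-megabytes>
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: FileSplitter <input-file> <max-megabytes>");
            return;
        }
        Path input = Paths.get(args[0]);
        long maxBytes = Long.parseLong(args[1]) * 1024 * 1024;
        List<List<String>> chunks = split(Files.readAllLines(input, StandardCharsets.UTF_8), maxBytes);
        for (int i = 0; i < chunks.size(); i++) {
            Path part = Paths.get(input + ".part" + (i + 1));
            Files.write(part, chunks.get(i), StandardCharsets.UTF_8);
            System.out.println("wrote " + part);
        }
    }
}
```

Files.readAllLines loads the whole input into memory, which keeps the sketch short; for very large inputs, a streaming BufferedReader loop applying the same size test would be the safer choice. The limit applies to the flat file, so keep in mind that each part still grows when PI converts it to XML.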
We can also limit the length of the request body that the HTTP Provider Service on the Java dispatcher accepts. The system enforces this limit by inspecting the Content-Length header of the request or, when chunked transfer encoding is used, by monitoring the chunked request body. If the request body exceeds the maximum length, the HTTP Provider Service rejects the request with a 413 “Request Entity Too Large” error response. The limit is set with the MaxRequestContentLength property of the HTTP Provider Service running on the Java dispatcher. By default, the maximum permitted request body length is 131072 KB (128 MB). You can configure the MaxRequestContentLength property using the Visual Administrator tool. Proceed as follows:
1. Go to the Properties tab of the HTTP Provider Service running on the dispatcher.
2. Choose the MaxRequestContentLength property and enter a value in the Value field. The length is specified in KB.
3. Choose Update to add it to the list of properties.
4. To apply these changes, choose Save Properties.
For large files, MaxRequestContentLength has to be set high enough to accept the largest message you expect.
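The size rule described for the HTTP Provider Service can be checked before a file is posted, for example by the script that sends it. The sketch below is a hypothetical client-side pre-check, not part of any SAP API: it assumes 1 KB = 1024 bytes, which matches the quoted default of 131072 KB = 128 MB, and it models only the Content-Length case, not chunked bodies.

```java
// Hypothetical pre-check mirroring the MaxRequestContentLength rule.
final class RequestSizeCheck {
    // Default MaxRequestContentLength quoted in the text; the property is specified in KB.
    static final long DEFAULT_MAX_REQUEST_CONTENT_LENGTH_KB = 131072;

    // Assumption: 1 KB = 1024 bytes, so 131072 KB = 134217728 bytes = 128 MB.
    static long limitInBytes(long maxKb) {
        return maxKb * 1024L;
    }

    // True if a body of this many bytes exceeds the limit and would therefore be
    // rejected with 413 "Request Entity Too Large" under the rule described above.
    static boolean wouldBeRejected(long contentLengthBytes, long maxKb) {
        return contentLengthBytes > limitInBytes(maxKb);
    }

    public static void main(String[] args) {
        long fiveMb = 5L * 1024 * 1024;
        long twoHundredMb = 200L * 1024 * 1024;
        System.out.println(wouldBeRejected(fiveMb, DEFAULT_MAX_REQUEST_CONTENT_LENGTH_KB));       // false
        System.out.println(wouldBeRejected(twoHundredMb, DEFAULT_MAX_REQUEST_CONTENT_LENGTH_KB)); // true
    }
}
```

A body exactly at the limit passes; only a length that exceeds it is rejected.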
In short, the parameters to adjust on the ABAP side are:
- rdisp/max_wprun_time
- TIMEOUT in icm/server_port_
On the Java side, the parameter to adjust is MaxRequestContentLength.
The best approach, however, is to split large files into smaller parts before processing.