Effect of garbage collection on throughput of PI server
Garbage, as the name suggests, is a collection of objects that were once useful to the program but are no longer needed. The system recycles memory by releasing the heap space occupied by objects that the program no longer references. The freed space can then be used by new objects. Before freeing an object, the garbage collector must run any finalizers it has.
SAP NetWeaver Application Server uses Virtual Machine Container technology. The aim is to isolate the various users from one another as securely as possible and to avoid any unwanted restrictions. Shared memory is used exclusively in the VM container environment in order to implement the SAP roll-in and roll-out concept. Each work process can map areas of this shared resource to its address space (stable address). Only in exceptional cases is local process memory allocated for a short time. While a request is being processed, a Java VM works exclusively for one user. A pool of VMs is dynamically assigned to the work processes. These VMs can access shared data in the shared memory.
There are the following memory types for the VM container:
● Java heap: Java objects belonging to a VM are stored on the Java heap. Each VM has its own Java heap, and other VMs cannot access them.
● VM heap: The VM heap is used to store local objects such as VM Java and service stacks. It can be used only by one VM.
● Shared pool: This can be used by all VMs and contains the shared Java objects (for example, shared closures, shared classes, shared code cache for compiled code).
Garbage collection runs periodically on the local VM Java heap and in the shared pool. As the amount of data on the heaps grows, garbage collection takes longer to free memory, and system throughput decreases.
Unix shell scripts to split large files before PI starts processing them
Unix shell scripts may be written to split files before PI picks them up for processing. To work with Unix shell scripts, we need to know the basics of the Unix operating system and some of its commands. Detailed information on Unix commands may be obtained from the links given below.
I referred to the following books to learn about the UNIX operating system:
- Your UNIX: The Ultimate Guide by Sumitabha Das
- Unix Shell programming by Yashavant Kanetkar
When splitting files before the PI server picks them up for processing, there are certain factors to consider, as listed below:
- The original large file cannot simply be split into smaller parts on the basis of size. Each smaller file has to contain a whole number of complete lines; if the division is made only on the basis of byte count, the smaller files might contain incomplete lines. Each line of a text file might contain multiple data items separated by commas or other characters. The number of data items and their format must remain intact in the smaller files obtained after splitting.
- The lines of the file to be split may be unrelated to each other. In that case splitting is simple, and any number of lines is fine in one file. But what do I mean by a relation between lines of a file? Sometimes two consecutive lines may depend on each other, so that both must be present in the same file. We have to take care that such lines do not get separated by the split. The case study discussed later deals with this scenario.
- The case study describes how to split text files; the approach cannot split files of other types such as PDF.
- After the shell script is designed, the SAP BASIS team can place it in the appropriate directory on the PI server, since PI consultants may not have access to the operating system.
- The file will be split into parts with an equal number of lines as far as possible. The last of the smaller files may therefore have very few lines. As a rough guide, expect about 17,000 lines in a file of about 5 MB (roughly 300 bytes per line); the exact figure depends on line length.
- In real business scenarios the files might contain header and trailer information. This information is important since it serves as a security check. The trailer, for example, might contain the number of lines in the original file, so if lines have been added or deleted by mistake, the trailer information will not match the actual number of lines in the file. This mismatch is reported by a validation check within the PI server. If a file has header and trailer information, then each of the split files should follow the same format. The case study included below shows the process of splitting a file that has header information.
- There are two ways in which the shell script might be called to split a large file. In the first method, the script runs in an infinite loop on the server. In this case we need to ensure that the directory in which we put the file is not the directory from which PI picks up files, otherwise the PI server might pick up the large file itself. The script first checks whether the file is eligible for splitting. If the file size is more than 5 MB, the script produces smaller files and puts them in the directory from which the PI server picks up files for processing. In the second method, we call the script as an operating system command (command line) within the communication channel parameters, as shown below. In this case the file may be put directly in the folder from which the PI server picks up files. In the case study I have followed the first method.
- The PI server processes the lines of a file sequentially, one after another. When the PI server picks up the smaller files after splitting, however, it might not follow any sequence. The server may process the split files in any order, because all of them become available at almost the same time. Generally, in real business scenarios, the order in which lines are processed does not affect the results. If the order of processing does affect your results, you need to pause the script for some time after it produces each file. The Unix command “sleep” can be used for this. I have not used the “sleep” command in the case study, since it increases the running time of the script.
- Shell scripts have many advantages over conventional programming such as easy program or file selection, quick start, and interactive debugging. A shell script can be used to provide a sequencing and decision-making linkage around existing programs, and for moderately-sized scripts the absence of a compilation step is an advantage. Interpretive running makes it easy to write debugging code into a script and re-run it to detect and fix bugs.
- The filenames of the smaller files obtained after splitting should indicate the line numbers they contain.
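Several of the points above (splitting by complete lines instead of bytes, checking the size first, repeating the header, and putting line numbers in the filenames) can be sketched in a few lines of shell. This is only a minimal illustration under assumptions, not the case-study script: the function names `needs_split` and `split_with_header`, the naming pattern `<name>_lines_<first>_<last>.txt`, and the convention that line numbers count data lines (header excluded) are all hypothetical.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not the case-study script).

# needs_split FILE LIMIT_BYTES
# Succeeds when FILE is larger than LIMIT_BYTES (e.g. 5242880 for 5 MB).
needs_split() {
    size=$(($(wc -c < "$1")))            # arithmetic strips any padding from wc
    [ "$size" -gt "$2" ]
}

# split_with_header FILE LINES OUTDIR
# Splits FILE into parts of at most LINES complete data lines each and
# repeats the first line (the header) at the top of every part.
split_with_header() {
    src=$1
    lines=$2
    outdir=$3
    base=$(basename "$src" .txt)
    mkdir -p "$outdir"
    # Number of data lines (header excluded), needed for the last filename.
    total=$(awk 'END { print NR - 1 }' "$src")
    awk -v n="$lines" -v total="$total" -v out="$outdir/$base" '
        NR == 1 { header = $0; next }    # remember the header line
        {
            d = NR - 1                   # data line number, header excluded
            if ((d - 1) % n == 0) {      # first data line of a new part
                if (file != "") close(file)
                last = d + n - 1
                if (last > total) last = total
                file = out "_lines_" d "_" last ".txt"
                print header > file
            }
            print > file                 # always write whole lines
        }
    ' "$src"
}
```

A call such as `needs_split big.txt 5242880 && split_with_header big.txt 17000 met` would then split only files larger than 5 MB. Because awk works line by line, no line is ever cut in the middle, which is the problem a purely byte-based split would have.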
Problem: Split a text file with the file extension “txt” into multiple smaller files. The file may have any number of lines. Each line of the file ends with the word “mile” or “km”. The file may be split at any point, except between a line ending in “mile” and a following line ending in “km”, or vice versa. So the script always has to check the last values of consecutive lines: only if both are the same is splitting possible between those two lines; otherwise the script searches further for a valid place to split. (This program should also work with .csv files and Excel files, but I have not tried it myself.)
Each step of the script is explained below. Explanation statements or comments are preceded by the ‘#’ symbol.
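The splitting rule in the problem statement can be sketched independently of the case-study script. The following is a minimal illustration under assumptions: the function name `split_on_safe_boundary` and the output naming `<name>_part<N>.txt` are hypothetical, the first line of the file is treated as a header, and a line's unit is taken to be a trailing “mile” or “km”.

```shell
#!/bin/sh
# Illustrative sketch (hypothetical helper, not the case-study script).
# split_on_safe_boundary FILE MIN_LINES OUTDIR
# Starts a new part once the current part holds at least MIN_LINES data
# lines, but only between two lines that end with the same unit.
split_on_safe_boundary() {
    src=$1
    min=$2
    outdir=$3
    base=$(basename "$src" .txt)
    mkdir -p "$outdir"
    awk -v min="$min" -v out="$outdir/$base" '
        # Return the unit word that ends a line: "mile", "km", or "".
        function unit(line) {
            if (line ~ /mile[ \t\r]*$/) return "mile"
            if (line ~ /km[ \t\r]*$/)   return "km"
            return ""
        }
        NR == 1 { header = $0; next }    # header is repeated in every part
        {
            cur = unit($0)
            # Open a new part at the first data line, or when the current
            # part is full AND this line ends like the previous one.
            if (part == 0 || (count >= min && cur == prev)) {
                if (part > 0) close(file)
                part++
                file = out "_part" part ".txt"
                print header > file
                count = 0
            }
            print > file
            count++
            prev = cur
        }
    ' "$src"
}
```

When a “km” line is followed by a “mile” line at the point where a part is full, the condition `cur == prev` fails, so the line is appended to the current part and the check is repeated at the next line. That part therefore ends up with more lines than the requested minimum.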
Here is the original file to be split, “sdn.txt”:
This file sdn.txt has 13 lines in it, excluding the header. I have set a criterion in the script shown above that the file be split after every 3 lines. We have to keep sdn.txt in the same folder as the script. The script runs in an infinite loop, so whenever you put any “txt” file in the folder it will be split automatically. The new files, if any, will be put in the “met” directory under the current folder. If you want to call the script through the command line feature in the communication channel settings, then you cannot have a script running in an infinite loop, so please make the necessary changes in the code. The receiver communication channel must be configured to pick up files from the "met" directory.
The new files formed after the split are shown below:
If you look carefully, you will see that sdn1_line_numbers_1_5.txt has 4 lines instead of 3 (excluding the header). This is because we cannot split between the 3rd and 4th lines of the file: by our initial condition, a “km” line followed by a “mile” line, or vice versa, cannot be a split point. The script has therefore searched further and found a suitable split point at the next line. If the lines are independent of each other, you can drop this check. The original script file (without comments) and sample "txt" files are provided in the links below.
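One way to support both calling methods is to keep the work of a single pass in its own function and wrap it in a loop only when polling is wanted. The sketch below is an assumption about how such a restructuring could look, not the original code: `split_file` is a placeholder that uses the standard `split -l` without any header or “km”/“mile” handling, and the `done` subdirectory for processed originals is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch (hypothetical names, not the case-study script).

# Placeholder splitter: plain 3-line split, no header or unit checks.
# Replace it with the real splitting logic.
split_file() {
    split -l 3 "$1" "$2/$(basename "$1" .txt)_"
}

# process_once INBOX OUTDIR
# One pass: split every .txt file in INBOX into OUTDIR, then move the
# original aside so that a later pass does not split it again.
process_once() {
    inbox=$1
    outdir=$2
    done_dir=$inbox/done
    mkdir -p "$outdir" "$done_dir"
    for f in "$inbox"/*.txt; do
        [ -f "$f" ] || continue          # the glob matched nothing
        split_file "$f" "$outdir" && mv "$f" "$done_dir/"
    done
}

# run_forever INBOX OUTDIR
# First method: poll the inbox in an infinite loop.
run_forever() {
    while :; do
        process_once "$1" "$2"
        sleep 5                          # polling interval, chosen arbitrarily
    done
}
```

For the infinite-loop method the script would end with `run_forever . met`; when called as an operating system command from the communication channel, it would end with a single `process_once . met` and then exit.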