
High CPU Utilisation

 

While using SAP HANA (for example, running reports or executing queries), you receive an alert in HANA Studio that the system has reached full CPU utilisation, or the system hangs.

Before performing any tracing, check whether Transparent HugePages (THP) are enabled on your system. THP should be disabled across your landscape until SAP recommends activating them again. See the following notes on Transparent HugePages:

TRANSPARENT HUGEPAGES

SAP Note 1944799 - SAP HANA Guidelines for SLES Operating System Installation

SAP Note 1824819 - SAP HANA DB: Recommended OS settings for SLES 11 / SLES for SAP Applications 11 SP2

SAP Note 2131662 - Transparent Huge Pages (THP) on SAP HANA Servers

SAP Note 1954788 - SAP HANA DB: Recommended OS settings for SLES 11 / SLES for SAP Applications 11 SP3
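
As a quick way to check the current THP setting, the following minimal Python sketch reads the kernel's status file and reports the active mode. It assumes the standard Linux sysfs path /sys/kernel/mm/transparent_hugepage/enabled (some distributions use a different location) and that the active mode is shown in square brackets, for example "always madvise [never]". The function names are illustrative only; the SAP Notes above remain the reference for how to disable THP.

```python
from pathlib import Path

# Assumption: standard Linux sysfs location; it may differ on some kernels.
THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) mode from text like 'always madvise [never]'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {sysfs_text!r}")


def thp_is_disabled(sysfs_text: str) -> bool:
    """THP counts as disabled only when the active mode is 'never'."""
    return active_thp_mode(sysfs_text) == "never"


if __name__ == "__main__":
    if THP_ENABLED_PATH.exists():
        text = THP_ENABLED_PATH.read_text()
        print(f"THP mode: {active_thp_mode(text)} (disabled: {thp_is_disabled(text)})")
    else:
        print(f"{THP_ENABLED_PATH} not found on this host")
```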

 

THP activity can also be checked in the runtime dumps by searching for “AnonHugePages”. While checking THP, it is also recommended to check:

SwapTotal = ??

SwapFree = ??

These values show how much swap space is configured and how much of it is still free, which indicates whether there is a reasonable amount of memory in the system.
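
The three values can be pulled out of /proc/meminfo-style text with a few lines of Python. This is only a sketch: it assumes the values appear as "Name: value kB" lines (as they do in /proc/meminfo on Linux), the helper names are hypothetical, and the numbers in SAMPLE are invented purely for illustration.

```python
import re
from pathlib import Path

# Matches lines such as "SwapTotal:       2097148 kB".
MEMINFO_LINE = re.compile(r"^[ \t]*(\w+):[ \t]+(\d+)[ \t]*kB", re.MULTILINE)

FIELDS = ("AnonHugePages", "SwapTotal", "SwapFree")

# Invented sample values, for illustration only.
SAMPLE = """MemTotal:       529305204 kB
AnonHugePages:         0 kB
SwapTotal:       2097148 kB
SwapFree:        2000000 kB
"""


def meminfo_kb(text: str) -> dict:
    """Return the fields of interest in kB (None if a field is missing)."""
    values = {name: int(kb) for name, kb in MEMINFO_LINE.findall(text)}
    return {name: values.get(name) for name in FIELDS}


def swap_used_kb(text: str) -> int:
    """Swap in use = SwapTotal - SwapFree."""
    info = meminfo_kb(text)
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    text = path.read_text() if path.exists() else SAMPLE
    print(meminfo_kb(text))
```

A non-zero AnonHugePages value suggests THP is in use, and a large gap between SwapTotal and SwapFree suggests the host has been swapping.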

Next, check the global allocation limit (GAL): search for IPM in the runtime dump, check the limit, and ensure it is not lower than what the process or thread in question is trying to allocate.

 

Once you have confirmed that Transparent HugePages are disabled, the next step is to identify the action that caused the high CPU usage.

Usually it is evident what caused the high CPU usage. In many cases it is caused by executing large queries or running reports on models from HANA Studio.

In order to analyse the activities, the second step is to run a Kernel Profiler Trace along with 3-4 runtime dumps whilst the issue is occurring.

The kernel profiler is a sampling profiler built into the SAP HANA database. It can be used to analyze performance issues with systems on which third-party software cannot be installed, or parts of the database that are not accessible by the performance trace. It is inactive by default.

 

Caution

To be able to use the kernel profiler, you must have the SAP_INTERNAL_HANA_SUPPORT role. This role is intended only for SAP HANA development support.

The kernel profiler collects, for example, information about frequent and/or expensive execution paths during query processing.

It is recommended that you start kernel profiler tracing immediately before you execute the statements you want to analyze and stop it immediately after they have finished. This avoids the unnecessary recording of irrelevant statements. It is also advisable as this kind of tracing can negatively impact performance.

When you stop tracing, the results are saved to trace files that you can access on the Diagnosis Files tab of the Administration editor.

You cannot analyze these files meaningfully in the SAP HANA studio, but instead must use a tool capable of reading the configured output format, that is KCacheGrind or DOT (default format).

(http://www.graphviz.org/Download_windows.php)

You activate and configure the kernel profiler in the Administration editor on the Trace Configuration tab.

Note that you will also need the runtime dumps. The Kernel Profiler Trace results are read in conjunction with the runtime dumps to pick out the relevant stacks and thread numbers.

For full information on Kernel Profiler Traces, see Note 1804811 or follow the steps below:

 

 

Connect to your HANA database server as user <sid>adm (for example via PuTTY) and start HDBCONS by typing the command "hdbcons".
To do a Kernel Profiler Trace of your query, please follow these steps:

1. "profiler clear" - Resets all information to a clear state

2. "profiler start" - Starts collecting information.

3. Execute the affected query.

4. "profiler stop" - Stops collecting information.

5. "profiler print -o /path/on/disk/cpu.dot;/path/on/disk/wait.dot" - writes the collected information into two dot files which can be sent to SAP.
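
The command sequence above can also be prepared in a small script. The Python sketch below only builds and prints the command strings from steps 1 to 5. The run_hdbcons helper is hypothetical and assumes that hdbcons accepts a command as a single argument (for example hdbcons "profiler start"); verify this on your system before relying on it. The output paths are the placeholders from step 5.

```python
import subprocess


def profiler_commands(cpu_dot: str, wait_dot: str) -> dict:
    """Return the hdbcons profiler commands to issue before and after the query."""
    return {
        "before": ["profiler clear", "profiler start"],
        "after": ["profiler stop", f"profiler print -o {cpu_dot};{wait_dot}"],
    }


def run_hdbcons(command: str) -> None:
    """Hypothetical helper: pass one command to hdbcons as a single argument.

    Assumption: hdbcons is on the PATH of the <sid>adm user and accepts the
    command this way. Not called by default in this sketch.
    """
    subprocess.run(["hdbcons", command], check=True)


if __name__ == "__main__":
    cmds = profiler_commands("/path/on/disk/cpu.dot", "/path/on/disk/wait.dot")
    for cmd in cmds["before"]:
        print(f"hdbcons> {cmd}")
    print("... execute the affected query now ...")
    for cmd in cmds["after"]:
        print(f"hdbcons> {cmd}")
```

Keeping the window between "profiler start" and "profiler stop" as short as possible follows the recommendation above to avoid recording irrelevant statements.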

 

 

Once you have this information you will see two dot files called

1: cpu.dot

2: wait.dot.

To read these .dot files you will need to download GVEdit. You can download this at the following:

http://www.graphviz.org/Download_windows.php

 

Once you open the program it will look something similar to this:


The wait.dot file can be used to analyse a situation where a process is running very slowly for no apparent reason. In such cases, a wait graph can help to identify whether the process is waiting for an IndexHandle, I/O, a savepoint lock, etc.

Once you have opened the Graphviz tool, open the cpu.dot file: File > Open > select the dot file > Open. This opens the file:

Once you open this file you will see a screen such as

  

The graph might already be open without being visible because of the zoom level. Use the horizontal and vertical scroll bars to find it.

From there on, what you look for depends on the issue you are investigating.

Normally you will be looking for the process/step with the highest value for

E= …

Where "E" means Exclusive

There is also:

I=…

Where "I" means Inclusive

The exclusive value is of more interest because it applies to that particular process or step alone, and so indicates whether more memory/CPU is used in that step. In this example, __memcmp_sse4_1 shows I = 16.399% and E = 16.399%. By following the red colouring we can see where most of the utilisation occurs and trace the activity. This leads to the matching stack in the runtime dump, which also contains the thread number we are looking for.
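
If the graph is too large to scan by eye, the nodes with the highest exclusive value can be listed with a short script. The following Python sketch rests on assumptions: that each node in cpu.dot carries a label="..." attribute containing the function name followed by "I=...%" and "E=...%" values, and that line breaks inside labels are written as a literal \n. The exact label layout written by the kernel profiler may differ, so adjust the patterns to your file. SAMPLE_DOT and its function names are invented for illustration.

```python
import re

# Assumed label layout: name, then "I=<pct>%" and "E=<pct>%" inside label="...".
LABEL = re.compile(r'label\s*=\s*"((?:[^"\\]|\\.)*)"')
INCLUSIVE = re.compile(r"\bI\s*=\s*([\d.]+)\s*%")
EXCLUSIVE = re.compile(r"\bE\s*=\s*([\d.]+)\s*%")

# Invented sample graph, for illustration only.
SAMPLE_DOT = r'''digraph G {
n1 [label="hot_function\nI=16.399%\nE=16.399%"];
n2 [label="caller_function\nI=20.5%\nE=1.25%"];
n2 -> n1 [label="x"];
}'''


def top_exclusive(dot_text: str, limit: int = 5) -> list:
    """Return up to `limit` (name, exclusive, inclusive) tuples, highest E first."""
    rows = []
    for label in LABEL.findall(dot_text):
        # Turn DOT line-break escapes into spaces so the patterns can match.
        text = label.replace("\\n", " ").replace("\\l", " ")
        inc = INCLUSIVE.search(text)
        exc = EXCLUSIVE.search(text)
        if inc and exc:
            name = text[: min(inc.start(), exc.start())].strip()
            rows.append((float(exc.group(1)), float(inc.group(1)), name))
    rows.sort(reverse=True)
    return [(name, exc, inc) for exc, inc, name in rows[:limit]]


if __name__ == "__main__":
    for name, exc, inc in top_exclusive(SAMPLE_DOT):
        print(f"E={exc:7.3f}%  I={inc:7.3f}%  {name}")
```

The top entries are the functions to look for when matching the red path against the stacks in the runtime dump.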

 

 

 

By viewing cpu.dot you have now traced the red trail to the source with the highest exclusive value. Now open the runtime dump (RTE). Working from the bottom up, you can get an idea of what the stack will look like in the runtime dump.

 

 

 

By comparing the red path, you can see that it matches this stack from the runtime dump exactly. The stack also has the thread number at the top.

You have now found the thread in which the query was executed. By searching for this thread number in the runtime dump, you can find the parent of the thread and the children related to that parent. The thread number can then be linked back to the query within the runtime dumps, which gives you the exact query and the user that executed it.

 

For more information or queries on HANA CPU please visit Note 2100040 - FAQ: SAP HANA CPU.

BR

Michael Healy

 
