
Purpose

The purpose of this Wiki page is to provide general information on the Snapshot MONitoring utility in report /SDF/SMON (hereafter referred to as SMON).

General

SMON is a monitoring utility that periodically collects data related to the system's health.

SMON is the successor to /SDF/MON; it collects additional data and offers features that /SDF/MON lacks.

It is recommended to use SMON instead of /SDF/MON.

SMON works by starting long-running collectors, one per application server. These collectors sleep (WAIT UP TO ...) between snapshots and are started as RFC calls in an "async w/o result" connection.

Collected snapshots are stored in Snapshot Collections.

Customers sometimes worry that these long-running collector tasks indicate a performance problem. This behavior is expected and is not a sign of a performance problem; see SAP Note 2229255 - Long RFC response times when you use /SDF/MON.

Installing SMON

SMON was first made available in SAP NETWEAVER 7.4.

The system must have Software Component ST-PI Release 740 with SP-Level 0002 or higher.



Refer to SAP Note 539977 for steps on installing & updating ST-PI.

In lower SAP NETWEAVER releases where SMON is not available, /SDF/MON can be used as an alternative (SAP Note 2383809).

How to Schedule SMON

See SAP Note 2651881 - How to configure SMON for performance monitoring and analysis.

Navigating SMON

To access SMON, execute transaction /n/SDF/SMON (a leading "/n" is required due to the namespace) or run report /SDF/SMON in transaction SE38.

If no snapshot collections exist yet in the system, the home screen of SMON will be the Start Snapshot Monitoring screen:

 

If collections do exist in the system, the home screen will be the Snapshot Monitor screen, listing these collections:

 

Double clicking a snapshot collection (or selecting a collection and clicking the Display Overview Data button) will display the Filter dialog screen:

This allows initial filtering before viewing the snapshots, which is useful if you know the timeframe in which the problem occurred or if the issue was isolated to a particular application server, user, WP type, etc.

Because the default interval of SMON is 1 second (and since most content is only collected every 10th or 60th interval) the default restriction is to display every 30th snapshot.

If you would like to see every snapshot, change the Display only every nth snapshot value to 1.

There is an issue when using the Display only every nth snapshot restriction.

When trying to display the additional content (e.g. Memory per mode) in the Monitoring Data overview screen, SMON will sometimes raise a "No detailed information available" error message.

This can occur when the snapshots in which the content was collected (every nth snapshot) are not the same snapshots that are being displayed via the Display only every nth snapshot option.

For example, if we are only viewing snapshots ending in :13 (hh:mm:13; display only every set to 60) and the Top CPU Processes (collected every 60th snapshot) were collected on snapshots ending in :00 (hh:mm:00) then SMON may raise the error previously mentioned when trying to display that content.
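The mismatch in this example can be illustrated with a small sketch. It assumes a 1-second interval, so that snapshot indices map directly to seconds, and it assumes the display restriction simply keeps every nth snapshot starting from some offset. The function names and the offset model are illustrative assumptions, not part of SMON.

```python
def collected_indices(total, every_n, offset=0):
    """Snapshot indices at which 'every n times' content is collected (assumed model)."""
    return set(range(offset, total, every_n))


def displayed_indices(total, display_every, offset=0):
    """Snapshot indices kept by a 'display only every nth snapshot' filter (assumed model)."""
    return set(range(offset, total, display_every))


# One hour of 1-second snapshots.
total = 3600
top_cpu = collected_indices(total, every_n=60, offset=0)       # collected at hh:mm:00
shown = displayed_indices(total, display_every=60, offset=13)  # displayed at hh:mm:13

# None of the displayed snapshots is one in which Top CPU Processes was collected.
overlap = top_cpu & shown
print(len(overlap))
```

With the display value set to 1, every collecting snapshot is among the displayed ones, which is why switching to 1 is a simple way around the error.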

 

Clicking Execute on the snapshot filter dialog screen will take you to the overview data for the snapshot collection. Each row in the Monitoring Data table corresponds to a single snapshot.

For content which is collected Every n times (e.g. CPU and paging activity) the information is repeated in the rows following the snapshot in which it was collected until the next nth snapshot.
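The repetition described above behaves like a forward fill. The following is a minimal sketch of that idea only; the row structure and the field name `cpu_user_pct` are hypothetical and do not reflect SMON's internal data model.

```python
def forward_fill(rows, key):
    """Repeat the last collected value of `key` in the following rows
    until a row with a newly collected value appears."""
    last = None
    filled = []
    for row in rows:
        if row.get(key) is not None:
            last = row[key]
        filled.append({**row, key: last})
    return filled


# Hypothetical snapshots: the value is only collected in the first and last row.
snapshots = [
    {"time": "10:00:00", "cpu_user_pct": 40},
    {"time": "10:00:01", "cpu_user_pct": None},
    {"time": "10:00:02", "cpu_user_pct": None},
    {"time": "10:01:00", "cpu_user_pct": 55},
]

for row in forward_fill(snapshots, "cpu_user_pct"):
    print(row["time"], row["cpu_user_pct"])
```

When reading the Monitoring Data table, this means a repeated value in consecutive rows does not necessarily indicate a stable measurement; it may simply not have been collected again yet.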

 

Timeframe, Interval, & Server

When configuring snapshot monitoring, the timeframe, interval, and servers options define the scope and collection rate of the content.

Do NOT configure a timeframe greater than 24 hours.

SMON is not designed to gracefully handle multiple days of data in a single snapshot collection.

If more than 24 hours of monitoring is needed, use Daily Monitoring instead.

If for some reason there is a snapshot collection which covers a timeframe greater than 24 hours, see deleting large snapshot collections for steps to remove them safely.

Timeframe

Determines when the SMON collectors should start and end. The option [Do not delete before] determines the expiration date of the snapshot collection. When this date is reached the collection is automatically deleted.

When configuring Daily Monitoring, only the start and end time may be defined and the expiration date is defined by the age of the collection rather than a specific date:

Servers

Defines the application servers which the collectors will be started on. If left blank, collectors will be started on all application servers.

Interval and Every nth time

[Interval in seconds] defines the base interval for how often the collectors roll-in to generate a snapshot.

[Every n times] defines how often that content will be collected as a multiple of [Interval in seconds].

In the screenshot above, "List of workprocesses (SM50)" is collected every 5 seconds, but "Dispatcher queue" data are only collected once every 30 seconds.

    • [List of workprocesses (SM50) interval] = [Interval in seconds] * [List of workprocesses (SM50) Every n times] = 5 seconds * 1 = 5 seconds
    • [Dispatcher queue interval] = [Interval in seconds] * [Dispatcher queue Every n times] = 5 seconds * 6 = 30 seconds

As [Interval in seconds] is increased, [Every n times] should be reduced by a similar factor to avoid large gaps between snapshots.
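The arithmetic above can be written as a short sketch. The helper `rescale_every_n` is a hypothetical convenience for picking a new [Every n times] value after changing [Interval in seconds]; it is not an SMON function, and rounding to the nearest whole number is an assumption.

```python
def effective_interval(interval_seconds, every_n):
    """Effective collection interval of a content type in seconds."""
    return interval_seconds * every_n


def rescale_every_n(old_interval, old_every_n, new_interval):
    """Suggest an 'every n times' value that keeps the effective interval
    roughly unchanged after the base interval changes (never below 1)."""
    target = old_interval * old_every_n
    return max(1, round(target / new_interval))


print(effective_interval(5, 1))   # List of workprocesses (SM50): 5 seconds
print(effective_interval(5, 6))   # Dispatcher queue: 30 seconds
print(rescale_every_n(5, 6, 10))  # base interval doubled to 10 s: every 3rd time
```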


Content Collected by SMON

Whether configuring one-time monitoring or Daily Monitoring, the content options are the same:

 

When browsing the collected data, the content switches above correspond to the columns below:

 

Some information must be accessed through the Goto menu or by double-clicking particular columns.

 

1. Time since last measurement in ms

The time elapsed since the last snapshot on that application server.

The color of the snapshot will be highlighted:

    • Yellow when: [time since last measurement] - [Interval in seconds] > 1 second

    • Red when: [time since last measurement] - [Interval in seconds] > 5 seconds
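The two highlighting rules can be expressed as a small function. This sketch assumes the measured value is in milliseconds (as the column name suggests) and converts it to seconds before comparing; the function name and return values are illustrative only.

```python
def highlight(time_since_last_ms, interval_seconds):
    """Return the assumed highlight color for a snapshot, or None."""
    delay_seconds = time_since_last_ms / 1000 - interval_seconds
    if delay_seconds > 5:
        return "red"
    if delay_seconds > 1:
        return "yellow"
    return None


print(highlight(5200, 5))    # 0.2 s late: no highlight
print(highlight(6500, 5))    # 1.5 s late: yellow
print(highlight(12000, 5))   # 7 s late: red
```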

"Time since last measurement" is useful for finding periods where request processing may have been disrupted.

Every "Interval in seconds", SMON collectors need to roll in to collect content and generate a snapshot.

If something* delays the roll-in or the collector itself, "time since last measurement" will be significantly greater than "interval in seconds".

(*Many general performance problems can cause delays in the collectors. This something could be: no free WP available, CPU bottlenecks, memory thrashing, slow/hung DB commits, and much more.)

To find snapshots that have been delayed, use the "Min. Delay in seconds" field on the Filter screen:


2. List of workprocesses (SM50)

The Work Process activity on each application server; similar to the information found in SM50 and with the dpmon tool.

In addition to collecting the Work Process activity, SMON contains tools for analyzing the collected WP data:

    • Work Process List - Displays all WP samples in the selected snapshots/rows.
    • Filtered WP List - As above, will display WP samples from all selected snapshots with the option of including filter criteria
    • WP List Aggregation - Allows you to aggregate WP samples by common criteria. Optionally allows you to apply filter criteria at the same time.
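Conceptually, the filtered list and the aggregation work like the sketch below. The sample records and field names (`server`, `wp_type`, `action`, `table`) are hypothetical stand-ins for WP sample attributes; this illustrates the idea of filtering and grouping, not SMON's implementation.

```python
from collections import Counter


def filter_samples(samples, **criteria):
    """Keep only the WP samples that match every given field value."""
    return [s for s in samples if all(s.get(k) == v for k, v in criteria.items())]


def aggregate(samples, *fields):
    """Count WP samples per combination of the given fields."""
    return Counter(tuple(s.get(f) for f in fields) for s in samples)


# Hypothetical WP samples taken from several snapshots.
samples = [
    {"server": "app01", "wp_type": "DIA", "action": "Sequential Read", "table": "VBAK"},
    {"server": "app01", "wp_type": "DIA", "action": "Commit", "table": ""},
    {"server": "app02", "wp_type": "BTC", "action": "Sequential Read", "table": "VBAK"},
    {"server": "app02", "wp_type": "DIA", "action": "Sequential Read", "table": "MARA"},
]

dia_samples = filter_samples(samples, wp_type="DIA")
for key, count in aggregate(dia_samples, "action").most_common():
    print(key, count)
```

Because each sample represents one WP at one point in time, the counts approximate how much time was spent in each state, provided the snapshot interval was constant.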

3. Create Callstack

When enabled, the collectors will attempt to obtain the ABAP callstacks of running WPs.

Depending on the current activity and state of the WP, a callstack cannot always be obtained.

When a callstack for a WP sample is available, there is a field in the WP list called Stack Info Available that will contain a call stack icon:

Double clicking the call stack icon will display the call stack:

4. Dispatcher Queue

The queue length information for DIA, UPD, and ENQ task types on each application server.

Similar to the information found in SM51 → Goto → Information → Queue Information.

Helpful for finding periods with high workload/wait times.

5. Extended and Heap Memory

The roll+heap memory use from NetWeaver's perspective.

6. Number of logins/sessions

Useful for estimating workload.

7. Memory per modes

Similar to the information found in SM04. Information about each logon and the memory used by each user mode/session.

8. CPU and paging activity

The same information found in ST06 (OS monitor). ST06 only provides current and hourly values, whereas SMON collects this information at more granular time intervals (down to 60 seconds).

Useful for ensuring the SAP system is provided enough memory and is receiving enough resources from the hypervisor.

9. Top CPU Processes

Same information as found in ST06 → Snapshot → Top 40 CPU processes.

Useful for checking if non-SAP processes are competing with the NetWeaver system for CPU resources.

10. Enqueue entries ‡

Same information as in SM12 → List.

11. Enqueue statistics

Same information as in SM12 → Extras → statistics.

12. Inbound queues ‡

The same information which can be browsed in SMQ2 (Inbound Queue monitor).

This information is of limited use because it is difficult, and in some cases impossible, to link a blocked/long-running queue to a particular action/WP in the snapshots.

The same problem exists when investigating a qRFC problem live in the system. As a result, root cause analysis of historical qRFC issues (which are not attributed to general performance) is not possible in some cases.

13. Outbound queues ‡

The same information which can be browsed in SMQ1 (Outbound Queue monitor).

As mentioned above, this information is not very useful and historical qRFC investigation is limited.

 

‡ : May cause a high workload; see Overhead and Space Requirements

Differences Between SMON and /SDF/MON

Listed below are some of the key differences between SMON and /SDF/MON and the reasons why SMON should be chosen over /SDF/MON.

CPU and paging activity

SMON includes much more information from the OS collector (ST06 data). In some cases this information is critical for understanding resource consumption during the time frame in question (including KPIs of virtualized servers), and it is not recorded anywhere else at that granularity.

SMON:

 

/SDF/MON:

 

It's easy to see how much more valuable SMON is by simply including virtualization KPIs such as CPUs consumed, CPU Ready, and Steal time. Note how misleading the Free Memory column of /SDF/MON is when the file system cache is not considered!

Inclusion of Logon handles and Session IDs

SMON includes the back-end session key used to uniquely identify each session (user mode) in the system; /SDF/MON does not. This allows us to profile and track dialog steps across roll-outs and roll-ins when the user session does not roll back into the same WP (i.e. the WP number changes).

In the screenshot below is the WP list of a snapshot and another screen of SM04 → User → Technical Information.

The back-end session key is included in the form of the Logon Key, Logon ID, and Back-end Session Handle, and is also included in the Memory per modes (SM04) snapshots.

Additional features in WP lists

When viewing a WP list, there are additional functions/buttons in SMON which are not present in /SDF/MON:

1. Statistical record (F9)

Selecting a row from the WP table and clicking this button takes you directly to transaction STAD and displays the STAT record (if it still exists) of the dialog step corresponding to the selected WP sample.

2. Show entries of the same dialog step

Selecting a row from the WP table and clicking this button will display all WP samples from that dialog step:

3. ABAP Workbench

Selecting a row from the WP table and clicking this button will display the program captured in the sample in the ABAP Workbench. Additionally, if a call stack is present for the selected sample the Workbench will open on the line of code recorded in the call stack.

Call Stack generation

Put simply, /SDF/MON does not generate call stacks for WP samples.

Storage method

SMON stores the majority of its snapshot data transparently; contrast this with /SDF/MON which stores snapshot data as data clusters.

This allows us to query much of the data directly with SQL when more granular control over filtering or aggregation is required.
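As a rough illustration of what such a query might look like, the sketch below runs a grouped SQL query against an in-memory SQLite stand-in table. The table name `wpinfo_demo` and all of its columns are invented for the example; the real structure of the SMON tables should be checked in SE11 before writing any query against them.

```python
import sqlite3

# In-memory stand-in for a directly stored snapshot table (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wpinfo_demo (snapshot_time TEXT, server TEXT, wp_no INTEGER, action TEXT)"
)
rows = [
    ("10:00:00", "app01", 0, "Commit"),
    ("10:00:00", "app01", 1, "Sequential Read"),
    ("10:00:01", "app01", 0, "Commit"),
    ("10:00:01", "app02", 3, "Commit"),
    ("10:05:00", "app01", 0, "Commit"),
]
conn.executemany("INSERT INTO wpinfo_demo VALUES (?, ?, ?, ?)", rows)

# Count WP samples per server and action within a chosen time window.
query = """
    SELECT server, action, COUNT(*) AS samples
    FROM wpinfo_demo
    WHERE snapshot_time BETWEEN ? AND ?
    GROUP BY server, action
    ORDER BY samples DESC, server, action
"""
result = conn.execute(query, ("10:00:00", "10:00:59")).fetchall()
for server, action, count in result:
    print(server, action, count)
```

This kind of grouping is not possible for tables whose storage method is a data cluster, since their payload is not visible to SQL as individual columns.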

WP sample aggregation

SMON contains functions for aggregating WP samples. This allows us to identify trends in WP behavior (e.g. estimating what percent of runtime was spent waiting for DB commits).

While /SDF/MON appears to contain the same functionality, it does not seem to be implemented or is bugged; I have never been successful in using the WP aggregation function in /SDF/MON.

 

Tables Used by SMON

SMON uses many tables. Some tables contain administration/configuration data needed to run SMON; other tables store the snapshot data, aka SMON's "payload".

All tables are transparent-type tables per the ABAP dictionary (SE11), but some tables contain data clusters.

When the Storage Method is Direct, the data can be queried directly with SQL.

When the Storage Method is Data cluster then data in that table is stored as a data cluster and cannot be queried directly with SQL.

Table Name | Storage Method | Data stored is... | Comments
/SDF/SMON | Data cluster | Snapshot | Data such as Top 40 CPU Processes and Enqueue entries
/SDF/SMON_CLUST | Data cluster | Admin/Config | Clustered/compressed snapshot data to reduce storage size
/SDF/SMON_CALL | Direct | Admin/Config | Information needed to start the collectors via RFC
/SDF/SMON_CALLPA | Data cluster | Admin/Config | Information needed to start the collectors via RFC
/SDF/SMON_HEADER | Direct | Snapshot | Header data for each snapshot as seen in the Monitoring Data screen. E.g. CPU & Memory Consumption, Active WPs, EM allocated/free
/SDF/SMON_LAYOUT | Data cluster | Admin/Config | ALV grid/table layouts for viewing SMON data (in other words, what columns you see in the Monitoring Data table)
/SDF/SMON_RUN | Direct | Admin/Config | Header information for each snapshot collection such as the content that is being collected and if the collection is currently clustered/compressed
/SDF/SMON_SESS | Direct | Snapshot | Contains session information (SM04); "Memory per modes" content type
/SDF/SMON_STACK, /SDF/SMON_STACKD, /SDF/SMON_STACKH | Direct | Snapshot | Collected stack trace samples
/SDF/SMON_WPINFO | Direct | Snapshot | Snapshots of WP samples

 

SMON Overhead and Space Requirements

The CPU and memory requirements of SMON are mostly negligible and SMON can safely be run in production environments.

The only content types whose collection may place a non-negligible load on the system are:

    • Enqueue entries
    • Inbound queues
    • Outbound queues

However, when the above content is enabled, there will be an explicit warning.

For example, turning on "Enqueue entries" will display the following message: 

For day to day monitoring, Enqueue entries, Inbound queues, and Outbound queues are not needed.


Space requirements are usually not a concern given the storage capacity of modern systems.

If storage space truly is a concern, the tables mentioned in the previous section can be monitored and if the size on disk of these tables is unacceptable, the [Interval in seconds] can be increased to reduce the volume of data collected.

Keep in mind that larger [Interval in seconds] will make analysis of short-lived performance problems more difficult.
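To get a feel for the trade-off, the number of snapshots per application server can be estimated from the timeframe and the interval. The sketch below only counts snapshots; it deliberately makes no assumption about the size of a snapshot on disk, and the function names are illustrative.

```python
def snapshots_per_server(timeframe_seconds, interval_seconds):
    """Number of snapshots one collector generates in the timeframe."""
    return timeframe_seconds // interval_seconds


def collections_of_content(timeframe_seconds, interval_seconds, every_n):
    """How often a content type with an 'every n times' setting is collected."""
    return timeframe_seconds // (interval_seconds * every_n)


day = 24 * 60 * 60  # the recommended maximum timeframe of 24 hours

for interval in (1, 5, 10):
    print(interval, snapshots_per_server(day, interval))

# Content collected every 60th time at a 1-second interval.
print(collections_of_content(day, 1, 60))
```

Multiplying the result by the number of application servers gives the total number of snapshots in the collection.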

Locks Used by SMON

It's normal to see long-lasting locks in SM12/SMENQ created by SMON.

When SMON collectors are started, they obtain locks on the /SDF/SMON_CALL_KEY table. There will be 1 lock per application server (and one additional lock if one of the Global Content switches is enabled).

The SMON watchdog (job /SDF/SMON_WATCHDOG) which runs every 5 minutes checks the enqueue table for these locks.

If the locks are missing, it usually means the collectors are no longer running and the watchdog will restart the collectors.
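The watchdog logic described above can be sketched as follows. This is a conceptual model only: the function names, the representation of locks as a list of server names, and the `restart` callback are all assumptions and do not reflect how the job /SDF/SMON_WATCHDOG is actually implemented.

```python
def servers_missing_locks(expected_servers, lock_owners):
    """Return the servers for which no collector lock was found."""
    return sorted(set(expected_servers) - set(lock_owners))


def watchdog_pass(expected_servers, lock_owners, restart):
    """One watchdog run: restart the collector on every server without a lock."""
    missing = servers_missing_locks(expected_servers, lock_owners)
    for server in missing:
        restart(server)
    return missing


# Hypothetical state: app03 no longer holds its lock.
expected = ["app01", "app02", "app03"]
locks_found = ["app01", "app02"]

restarted = []
watchdog_pass(expected, locks_found, restart=restarted.append)
print(restarted)
```

Under this model, manually deleting an SMON lock in SM12 while the collector is still running could lead the watchdog to start a collector again on its next run, which is one reason to leave these locks alone.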

For more information about these locks, see SAP Note 2689689 - /sdf/mon or /sdf/smon related locks remain in SM12

Deleting Large Snapshot Collections (Timeframe > 24 hours)

If for some reason a single snapshot collection covers more than 24 hours, do NOT immediately use the Delete button to remove it.

SMON tries to delete all records in /SDF/SMON_WPINFO that belong to a particular snapshot collection in a single DB transaction. SMON does not periodically commit during the delete.

As a result, if a particular snapshot collection contains an extremely large amount of data, trying to delete it directly may result in the DB's log buffers filling completely and preventing any transactions from being processed.


To avoid this problem:

    1. Truncate /SDF/SMON_WPINFO and /SDF/SMON_HEADER (the data in these tables are not system-critical)
    2. Then use the Delete button to remove the snapshot collection

All snapshots from all collections will be lost, but this prevents a possible log buffer full situation.
