Purpose
The purpose of this page is to clarify how SAP ASE uses spinlocks and what the effects on overall CPU usage may be.
Overview
Often high CPU in SAP ASE can be traced to spinlock usage. This page will show how to identify that condition and suggest ways to tune ASE.
What is a Spinlock?
In a multi-engine server synchronization mechanisms are needed to protect shared resources
ASE uses spinlocks as one of its synchronization mechanisms
A spinlock is a data structure with a field that can only be updated atomically (that is, only one engine at a time can make changes to it).
When a task modifies a shared data item, it must first hold a spinlock
Shared items are such things as run queues, data cache page lists, lock structures, etc.
The spinlock prevents another task from modifying the value at the same time
This of course assumes that the other task also performs its access under the protection of a spinlock
A task needing a spinlock will “spin” (busy-wait on the engine rather than yielding it) until the lock is granted
When multiple engines are spinning at the same time CPU usage can rise substantially.
Spinlocks must be as fast and efficient as possible
In order to reduce contention, a process which loops typically acquires and releases its spinlock on each pass through the loop rather than holding it across iterations.
To keep the acquire/release path as fast as possible, the spinlock code is written in platform-specific assembly language.
Comparison of Spinlocks to Other Synchronization Mechanisms
Type | Complexity | CPU overhead | Wait time |
Spinlock | Low | High | Very low |
Latch | Moderate | Low | Should be small |
Table/page/row/address Lock | High | Low | Can vary considerably |
Spinlocks and CPU Usage
Spids trying to get a spinlock will never yield the engine until they have it.
So one spid, waiting on a spinlock, will cause 100% user busy on one engine until it gets the spinlock.
Spinlock contention percentage is measured as waits/grabs
Example: 10,000 grabs with 3,000 waits = 30% contention
When looking at performance issues, focus on total spins, not the contention percentage
Example: Assume two spinlocks
One had 100 grabs with 40 waits and 200 spins = 40% contention
Second had 100,000 grabs with 400 waits and 20,000 spins = 4% contention
The second used up many more CPU cycles spinning, even though its contention was lower.
We should then look at tuning for the second example, not the first.
As more engines spin on the same spinlock, the wait time and number of spins increases; sometimes geometrically
Troubleshooting Spinlocks
Spinlock contention/spinning is one of the major causes of high CPU
Step 1 is determining if, in fact, the high cpu is being caused by spinlock usage.
Step 2 is determining which spinlock or spinlocks are causing the condition.
Step 3 is determining what tuning to use to help reduce the problem.
*Note* You will never get to 0% spinlock contention unless you only run with one engine. That is, do not think that spinlock contention can be eliminated. It can only possibly be reduced.
Step 1 - Checking for spinlock contention/spinning
Using sp_sysmon to determine if high cpu is due to spinlocks
Check “CPU Busy” (or “User Busy” in 15.7 Threaded Mode).
If engines are not showing high busy% then spinlocks are not a big issue.
Check “Total Cache Hits” in the “Data Cache Management” section.
If cache hits per second are high and rise along with CPU busy %, then you are likely looking at table scans/query plan issues rather than spinlocks.
In general, if cpu usage increases but measurements of throughput such as committed xacts, cache hits, lock requests, scans, etc. go down then it is very possible that spinlock usage is an issue
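For example, a one-minute sample covering both checks above can be taken as follows (the interval is illustrative):
-- one-minute sp_sysmon sample; review "Engine Busy Utilization"
-- (CPU Busy / User Busy) and "Data Cache Management" (Total Cache Hits)
sp_sysmon "00:01:00"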
Step 2 - which spinlock or spinlocks are causing the contention?
Using sp_sysmon
There are several spinlocks listed, but only contention % is shown
Object Manager Spinlock Contention
Object Spinlock Contention
Index Spinlock Contention
Index Hash Spinlock Contention
Partition Spinlock Contention
Partition Hash Spinlock Contention
Lock Hashtables Spinlock Contention
Data Caches Spinlock Contention
High contention on any of these may indicate a problem
But, you may have contention on other spinlocks not reported in sp_sysmon
Using MDA table monSpinlockActivity
This table was added in 15.7 ESD#2
Query using standard SQL.
One possible query showing the top 10 spinlocks by number of spins over a one-minute interval
select * into #t1 from monSpinlockActivity
waitfor delay "00:01:00"
select * into #t2 from monSpinlockActivity
select top 10 convert(char(30),a.SpinlockName) as SpinlockName,
(b.Grabs - a.Grabs) as Grabs, (b.Spins - a.Spins) as Spins,
(b.Waits - a.Waits) as Waits,
case when a.Grabs = b.Grabs then 0.00 else convert (numeric(5,2),(100.0 * (b.Waits - a.Waits))/(b.Grabs - a.Grabs))
end as Contention
from #t1 a, #t2 b where a.SpinlockName = b.SpinlockName
order by 3 desc
Possible Issues with monSpinlockActivity
Spinlocks with multiple instances will get aggregated
For example, all default data cache partition spinlocks will show up as one line
This can make it impossible to see if just one cache partition is causing the problem
You must set the 'enable spinlock monitoring' configuration parameter (see the example after this list)
Tests show that this adds about a 1 percent overhead to a busy server.
monSpinlockActivity does show the current and last owner KPIDs. This can be useful to check if certain processes are the ones heavily hitting certain spinlocks.
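A minimal sketch of turning this collection on (the option should take effect without a restart, but verify in your version):
-- populate monSpinlockActivity; adds roughly 1% overhead on a busy server
sp_configure "enable spinlock monitoring", 1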
Step 3 - what tuning can be done to help reduce the problem
This is going to depend a great deal on which spinlock(s) the high spins are on.
Note as well that it is quite possible to reduce contention on one spinlock only to have it increase on another
Some of the more common spinlocks and possible remedies
Object Manager Spinlock (Resource->rdesmgr_spin)
Make sure that a sufficient ‘number of open objects’ has been configured.
Identify ‘hot’ objects by using monOpenObjectActivity. The Operations column counts the number of accesses (opens) of a table.
Use dbcc tune (des_bind) to bind the hot objects to the DES cache.
The reason this works is that the spinlock is used to protect the DES keep count in order to make sure an in-use DES does not get scavenged. When the DES is bound that whole process gets skipped.
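A hedged sketch of these steps (the dbid 4 and the table name passed to dbcc tune are hypothetical placeholders; dbcc tune settings generally do not persist across restarts):
-- list the most frequently opened ("hot") objects
select top 10 DBName, ObjectName, Operations
from master..monOpenObjectActivity
order by Operations desc
go
-- bind the descriptor of a hot object so its DES is never scavenged
dbcc tune(des_bind, 4, "hot_table")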
Data Cache spinlocks
The best single method to reduce data cache spinlock usage is to increase the number of partitions in the data cache.
Note that if a cache can be set to ‘relaxed LRU’ the spinlock usage may be decreased dramatically. This is because the relaxed LRU cache does not maintain the LRU->MRU chain, and so does not need to grab the spinlock to move pages to the MRU side.
There are definite requirements for this to help (a cache that has high turnover is a very poor candidate for relaxed LRU).
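A minimal sketch, assuming contention is on the default data cache and that 8 partitions suits the engine count (the cache names and partition count are illustrative; the replacement-policy syntax and any restart requirement should be verified for your version):
-- spread the cache spinlock load across more cache partitions
sp_cacheconfig "default data cache", "cache_partition=8"
go
-- switch a low-turnover named cache to relaxed LRU replacement
sp_cacheconfig "my_lookup_cache", "relaxed"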
Procedure Cache Spinlock (Resource->rproccache_spin)
This spinlock is used when allocating or freeing pages from the global procedure cache memory pool (this includes statement cache).
Some possible causes include
Proc cache too small – procs and statements being frequently removed/replaced.
Procedure recompilations
Large scale allocations
To reduce pressure on the spinlock
Eliminate the cause(s) for procedure recompilations (maybe TF 299)
If you are running a version prior to ASE 15.7 ESD#4, upgrade; ASE 15.7 ESD#4 and ESD#4.2 contain fixes that hold the spinlock for less time
Trace flags 753 and 757 can help reduce large-scale allocations
In ASE versions past 15.7 SP100, use the configuration option "enable large chunk elc".
Use dbcc proc_cache(free_unused) as temporary help to reduce spinlock/cpu usage.
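A hedged sketch of some of these remedies (the "enable large chunk elc" option applies only to the versions noted above):
-- check how close procedure cache usage has come to its configured size
sp_monitorconfig "procedure cache size"
go
-- route large allocations through the engine-local cache (15.7 SP100 and later)
sp_configure "enable large chunk elc", 1
go
-- temporary relief: free unused memory held in procedure cache
dbcc proc_cache(free_unused)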
Procedure Cache Manager Spinlock (Resource->rprocmgr_spin)
This spinlock is used whenever moving procedures and dynamic SQL into or out of procedure cache.
This spinlock was also used prior to ASE 15.7 ESD#1 when updating the memory accounting structures (pmctrl).
Due to contention a separate spinlock was created.
Causes of high contention include:
Heavy use of dynamic SQL
Procedure cache sized too small
Possible remedies are the same as for rproccache_spin
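Since the named causes are heavy dynamic SQL and an undersized procedure cache, a quick sizing review might look like this (the new value is purely illustrative; both options are sized in 2K pages):
-- review current procedure cache and statement cache sizing
sp_configure "procedure cache size"
go
sp_configure "statement cache size"
go
-- illustrative example only: raise procedure cache to ~512MB (262144 x 2K pages)
sp_configure "procedure cache size", 262144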
Lock Manager spinlocks (fglockspins, addrlockspins, tablockspins)
These spinlocks are used to protect the lock manager hashtables.
If the lock promotion HWMs are set too high, that means more individual locks are held and more contention on the lock hashtables
Configuration tunables are the primary way to address this; an example of setting them follows the list below
lock spinlock ratio
lock address spinlock ratio
lock table spinlock ratio
lock hashtable size
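A hedged sketch of adjusting these (the values are illustrative, not recommendations; a lower ratio means more spinlocks, so fewer hash buckets share each one, and some of these parameters are static and require a restart):
-- lower ratios = fewer hash buckets protected by each spinlock
sp_configure "lock spinlock ratio", 50
go
sp_configure "lock address spinlock ratio", 50
go
sp_configure "lock table spinlock ratio", 10
go
-- spread locks across more hash buckets (use a power of 2)
sp_configure "lock hashtable size", 8192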
What not to do
Resist the urge to add more engines because cpu is high
Adding additional engines when the high CPU busy is caused by spinlock contention will only make matters worse
Adding more "spinners" will simply increase the amount of time it takes each spid to obtain the spinlock, slowing things down even more.
__________________________________________________________________________________________________________
17 Comments
Rao Bheemarasetty
Nice article on spinlocks Dave, very intuitive.
Former Member
Excellent information. Really helped us identify contention on procedure cache.
Is there a document listing all the spinlocks? We've also seen high contention on
Kernel->kaspinlock
Resource->rpssmgr_spin
SSQLCACHE_SPIN
Thanks
Rao Bheemarasetty
Hello,
Kernel->kaspinlock
This spinlock is associated with the kernel alarm structure. Contention on this spinlock is rare.
Resource->rpssmgr_spin
PSS (Process Status Structure) is the primary structure associated with a process when SAP ASE receives a connection from a client. Since the PSS is a structure that handles memory for the process, the spinlock associated with it is called pss_spin. Since multiple processes are involved, a control mechanism called the PSS Manager is required, and its associated locking mechanism (spinlock) is called rpssmgr_spin.
SSQLCACHE_SPIN
As the name suggests, this locking mechanism is associated with the statement cache. It is required to avoid contention among similarly profiled statements from various processes.
Former Member
Thanks - that helps a lot.
It's a shame this information isn't better documented - but thanks to Dave for producing this.
Artem Maystrenko
Hi,
Does anyone know what Resource->rpmctrl_spin is?
I am seeing 70-90% on this spinlock in my sysmon output,
but I can't find any information about this spinlock.
Rao Bheemarasetty
Hello,
rpmctrl is a structure used to track memory allocated from procedure cache. If spins on this structure show high values, then there is contention in procedure cache. There may be a number of reasons for such high values, but at a high level, running dbcc proc_cache('free_unused') may help reduce or eliminate the spins associated with this structure.
Artem Maystrenko
Thanks a lot!
This information is very useful to me.
Is there a document where all the possible spinlocks are described?
Rao Bheemarasetty
I'm sorry to say that there is no standard document explaining the various spinlocks and their associated structures. Hence David (this document's original writer) made a good effort to collect and catalog them. I also scoured the code and came up with that explanation.
H.T.H.
Former Member
Another question I have is: at what point should we be concerned about the contention?
I appreciate that this will depend on the number of engines, speed of engines, etc.
eg I have this
SpinlockName Spinlock SlotID Grabs Spins Waits Contention
------------------------------ -------------- -------------------- -------------------- -------------------- ----------
Resource->rproccache_spin 1401 13780565 330466267 3492329 25.34
Is 25% contention with 3,492,329 waits considered high over a 20-second period?
We have 20 engines so we're getting 8730 waits per engine per second.
If each spinlock wait is 0.1ms then that equates to 0.8s per engine per second - which isn't too bad but high for a 20s period (4% of the time).
But if a spinlock wait is 1ms then that equates to 8s per engine per second, which is very high.
I appreciate the time will vary with many factors, but what sort of figure is the point at which we should be concerned?
How long is the "very low" wait time? (I know we can't say for sure as it depends on the machine)
Former Member
If this really is a 20 second sample, then 330,466,267 spins is a very real problem. I suggest increasing procedure cache by, say, 25% and opening a case with support to check what options are available in your version (depending on version this may be a trace flag or may be a configuration parameter).
Cheers,
/Stefan
Former Member
Stefan,
Yes, we logged a case.
We "fixed" it by setting trace flag 753, increasing the procedure cache and setting the ELC percent to 80%.
The more useful option was trace flag 753.
There's a lot more detail here ...
http://scn.sap.com/thread/3815278
What I don't understand is that TF 753 forces 2k pages to be grabbed. Why isn't there an option to always grab 16k pages as well?
2k is such a small amount of memory.
Surely we want to grab 8k, 16k or 32k at a time. We've upgraded our server to use 16k pages, so why are we still using 2k pages for procedure cache?
This is 2015, after all, not 1990. We have servers with 0.5TB and 1TB of RAM.
Former Member
2kB has nothing to do with your data page size. It is the size of a memory page.
Former Member
Yes - I'm aware 2k of page size is different from 2k of memory page.
We've upgraded our server to have 16k page sizes, and without trace flag 753 the procedure cache is grabbed in blocks of 2k, 4k or 16k (is there an 8k?)
TF753 will then force the procedure cache to grab all its memory in 2k chunks. It would be useful to be able to force all the grabs to, say, 4k or 16k. Why only 2k?
The point I was making is that both of these sizes are very small in terms of today's computers.
Rao Bheemarasetty
Mike,
I understand where you are coming from. You mean you want a 16 KiB page instead of a 2 KiB page when you configure ASE with a 16 KiB page size.
The design of ASE cache pages uses only 2 KiB. There are many technical reasons behind this decision; one main reason is cache latency with larger page sizes.
Even though the pages are 2 KiB, allocations scale in multiples of 2 KiB.
Hope this helps.
Former Member
Yes, I appreciate the increased latency of getting a 16k cache page versus a 2k cache page.
However, any process requiring more than 2k will therefore have to grab a 2k page multiple times, and ASE has approached this issue by allowing grabs of more than 2k at a time (which TF 753 turns off). This makes sense. However, grabbing different sizes seems to cause fragmentation (at least for us), so we have to turn this performance improvement off.
If (and I mean if) most of our procs need a minimum of, say, 32k to run, then forcing ASE to only allow 2k page grabs means most procs need to grab pages 16 times. If we force the grab size to be, say, 4k, then the smaller procs will have a small amount of wastage but the larger, more intense procs will reduce the number of grabs.
I have no idea how the performance would vary, but since SAP have implemented grabbing pages of different sizes, I'd guess it's beneficial - or have SAP got this wrong?
So in the same way that ASE allows different size disk pages, why not allow the setting of the page size? We could try 2k, 4k and 16k and choose what's best for our application.
Although the Linux memory page size is 4k, so I would have expected the optimum ASE page size to match that, in much the same way that we get optimum disk performance when the ASE page size matches the SAN page size (does any disk have 2k blocks these days?). But obviously there's a lot of information I don't know about the ASE internals so perhaps there's a reason for this.
Former Member
In Sybase 16 - it seems trace flag 753 has no impact and you need to use the configuration setting "enable large chunk elc" - set this to zero to have the same impact as 753
Eisen Wang
Great article!
Thank you very much.