
Purpose

The purpose of this page is to explain the ml_database lock warning within an SUP context, and how to move beyond this warning to start looking for the root cause of your RBS synchronization problem.

Overview

The contents cover the behaviour, the cause of the warning, and the initial steps to identify the root cause.

Relevant versions

All versions of SUP/SMP 2.x.x using native client applications and the RBS communication protocol over port 2480 on the SUP server.

Understanding the warning

You may see the following warnings in your SUP server log or mlsrv_err.log when clients cannot synchronize successfully or sync performance has degraded.

 

WARN Mobilink Thread-xxx [com.sybase.ml.sup.Logger] [10106] Unable to lock the remote ID '<SOME-GUID-VALUE>', will try again
WARN Mobilink Thread-xxx [com.sybase.ml.sup.Logger] [10050] ODBC: [Sybase][ODBC Driver][SQL Anywhere]User 'DBA' has the row in 'ml_database' locked (ODBC State = 40001, Native error code = -210)

  

Details of the errors

Warning 10106: Unable to lock the remote ID '%1', will try again

“The MobiLink server needs to lock the remote ID in order to detect concurrent synchronizations with the same remote ID. The server was not able to lock the given remote ID, but it will try to again.”

Warning 10050: A placeholder used to return the underlying native error. In this case, native error code -210

Native error code -210: User '%1' has the row in '%2' locked

“The operation failed because another connection holds a lock on the same object you are attempting to modify”

 

First let’s explain some background and the reason for the warning, then we can discuss the implications.

  • For the sake of this explanation, a synchronization request and a client connection are the same thing, and each is treated as unique.

  • Remember that a connection persists between the client and SUP for the duration of the sync request.

  • In this context, a native application or client means the UltraLite (UL) database (*.udb, or *.ulj for Java).

When a Native application is first registered with the SUP server, it’s uniquely identified across the mobile landscape by a GUID value that is stored locally. This GUID is the remote ID referred to in the warning message and is used in all synchronization communication between the client and SUP. These same GUID values are also stored in the ‘ml_database’ table in the cache database (default.db) of the SUP server.
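
For reference, the registered remote IDs can be viewed directly in the cache database. The query below is a sketch run from DBISQL against the CDB; it assumes the default MobiLink system table layout in which ml_database exposes the remote_id column.

SELECT remote_id FROM ml_database;
// lists the GUID remote IDs currently registered in the cache database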

SUP is designed to handle large numbers of concurrent sync requests across the mobile landscape, but it does not allow concurrent sync requests from the SAME native application (UL database). To manage this across potentially multiple server nodes, SUP takes a row-level lock in the ml_database table against the matching GUID when it starts to process a sync request. This lock is held for the duration of that individual sync request (upload and download) and is released once the download is ready to be sent to the client.

The warning occurs under the following circumstances:

A client has successfully connected to SUP and the synchronization request is being processed (the row is locked for the duration of the upload and download). If during that time the client loses the connection and makes a new connection, the second sync request also tries to take a lock against the same row in the ml_database table. This results in the native -210 lock error and the MobiLink warning about being unable to get the required lock. SUP has detected two concurrent requests from the same UL client, so it retries the lock for the second connection until the first connection releases it, at which point the second request can proceed.
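
To confirm which connection is holding the ml_database row lock at the time the warning is logged, a query along the lines of the sketch below can be run in DBISQL against the cache database. It uses the sa_locks() system procedure covered in more detail later in this document, and should be treated as an example rather than a definitive diagnostic.

SELECT * FROM sa_locks() WHERE table_name = 'ml_database';
// any rows returned identify the connection(s) currently holding locks on ml_database rows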

Side note: If a connection is found to be closed by the time SUP is ready to deliver the download to that client, SUP simply discards the download. On the next connection from the same client, the two parties agree on the last successful sync and the new sync begins from there.

Implications:

1)

In an environment where longer sync times are expected, this warning can be more common across a mobile workforce. Consider a sync time of 3 minutes. (We’ll assume for this example that it’s due to the data volume of the sync and the available coverage, and that the SUP/EIS processes have been fully optimised to keep the times to a minimum.)

 

The user is on a moving train when the sync is started. Shortly afterwards, the signal is lost and the connection to SUP along with it. The user sees this and restarts the sync request once the signal returns; two minutes have elapsed in this time. The first request is still being processed when the second request is received by SUP, resulting in the warning. In this case, users should be advised to wait for a steady signal or remain stationary during the sync, or to wait a reasonable length of time before retrying the sync, say 5 minutes in this example.

2)

In practice, sync times are often much faster, as quick as a few seconds, particularly once the initial sync is complete. The first request is likely to be finished before the second request has been received. When the warning is seen in these kinds of environments, synchronization problems will often be reported by multiple users or by all of them. In these cases, something has held up the first request of the client(s) and all subsequent requests are being blocked. Ultimately there is a bottleneck somewhere along the line from the SUP server to the EIS backend system, and it needs to be found and corrected. The next section deals with some of the more common causes.

Finding the bottleneck

1) Use DBISQL to call the system procedure sa_conn_info() to check if a connection is blocking multiple synchronization requests.

SELECT * FROM sa_conn_info() WHERE number <> connection_property('number');
// the WHERE clause excludes the DBISQL connection used to run this query and is optional

Look for any rows that have a ‘BlockedOn’ value that is not zero. The BlockedOn value is the connection ‘Number’ of the record causing the block; Number is the first column returned. Blocks may also be layered, e.g. connection 3 is blocked on connection 2, which is blocked on connection 1.
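
To narrow the output down to just the blocked connections and the columns examined in the next step, a query along these lines can be used. It is a sketch built only from the standard sa_conn_info() properties referenced in this section:

SELECT Number, BlockedOn, LastReqTime, ReqType, UncommitOps FROM sa_conn_info() WHERE BlockedOn <> 0;
// returns only the connections that are currently blocked, together with the connection number blocking them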

Once the root connection causing the block is identified, examine its row to get some idea of what it is doing. What is the ‘LastReqTime’? What is the ‘ReqType’? Are there any ‘UncommitOps’?
Is the last request time unusually old? Does it have zero uncommitted operations? Is its request type EXEC_ANY_IMM? These are indicators of a connection that is needlessly holding locks.
Check the locks that this connection is holding to see if SUP table locks are being held by this connection number.

SELECT * FROM sa_locks() WHERE conn_id = 123456;
// where 123456 is the Number of the problem connection from sa_conn_info()

Look at the ‘table_name’ column of the locks being held. If these are SUP MBO objects, this lock could be blocking the synchronization requests. Use the DROP CONNECTION statement to drop this connection and see if the sync requests begin to process normally.

DROP CONNECTION 123456;

COMMIT;
// ALWAYS issue a COMMIT (or ROLLBACK) after any statements in DBISQL. An open DBISQL connection holding locks on an SUP table can itself cause a block. If in doubt, disconnect DBISQL or Sybase Central from the SUP CDB.

2) Use MobiLink Monitor to identify poorly performing synchronizations

MobiLink Monitor is a graphical utility that is useful for identifying performance bottlenecks in the synchronization process. If you have not already connected MobiLink Monitor to SUP, see this quick tutorial on how to do so.

Connect to SUP Using MobiLink Monitor

NOTE: In a clustered SUP environment, the Monitor connects to only one node at a time; monitoring multiple nodes requires multiple instances of the Monitor. The sync requests captured for each node correspond to that node’s own server logs.

Run the Monitor while the synchronizations are performing badly and identify the requests that are taking too long. Click on an individual request to drill down into the detail. Most commonly the time is taken in the ‘fetch_download’ phase of the sync process, but this can also lead to other indicators, including longer times waiting for worker threads.

- If longer-than-expected fetch_downloads are visible, further investigation of the Dataservices component in the SUP server log is needed.
- If fetches are normal but connections are waiting for available worker threads, consider increasing the Replication Thread Count in the SCC to provide more worker threads. The default is 10.
- If the time is spent in the receive_upload and/or send_download phases, look at the network communication.

3) Analysing Dataservices performance in the SUP server log

- If fetch_download is the cause of the poor performance, increase the log level for Dataservices to DEBUG in the SCC and capture the sync performance problem.
- Use the Monitor output to identify a problem sync and then isolate that request in the SUP log. Use the sync request time, remote_id value and user ID to identify the Dataservices thread ID that handled the request.
- Grep out the Dataservices requests for that thread. The same thread ID is reused multiple times within the SUP server, so make sure that the timespan, remote ID and user ID match.
- The timeline of these requests will match the total time taken by the fetch_download phase. Look for big jumps in the timeline; the last line before the jump is the action that consumed the time.

This is often a call to the backend system, such as a BAPI, stored procedure or web service call. Investigation must then move to this area and the performance analysed in the backend system using its available tools.

Occasionally the time is spent updating the SUP cache database. A similar performance investigation must then be carried out in the CDB, using its tracing tools to find out which activities are causing the problem.
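
As a starting point for tracing activity in the CDB itself, SQL Anywhere request-level logging can be switched on from DBISQL. The sketch below uses the sa_server_option() system procedure; the file path is an example only, and logging should be switched off again once the problem period has been captured.

CALL sa_server_option( 'RequestLogFile', 'C:\\cdb_request_log.txt' );
CALL sa_server_option( 'RequestLogging', 'SQL' );
// ... reproduce the slow synchronization, then switch request logging off again:
CALL sa_server_option( 'RequestLogging', 'NONE' );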

Conclusion

‘ml_database’ lock messages that affect multiple users are most often caused by a block or performance bottleneck at the SUP server or in the backend EIS system. There can be many reasons for this, and this document is not intended to list them all, only to explain the cause of the warning and how to start looking for the root cause. Possible areas include CDB performance, EIS performance, the hardware infrastructure between the CDB and the EIS, and the design of the synchronization model.

Opening a support incident:

 If a support incident is needed after reading this document, please provide the following information as a starting point.

- ML Monitor output showing the sync performance
- SUP server log with the Dataservices and MMS components set to DEBUG (in the SCC)
- Run the SQL block below 5 times at 2-minute intervals during the time the ml_database lock warnings are seen.


BEGIN
    DECLARE @time_as_string varchar(128);
    DECLARE @conn_info_file varchar(128);
    DECLARE @lock_file varchar(128);

    // build a timestamp string so each run writes a new pair of output files
    SET @time_as_string = DATEFORMAT( CURRENT TIMESTAMP, 'yymmddhhnnss' );
    SET @conn_info_file = 'C:\\sa_conn_info_' || @time_as_string || '.txt';
    SET @lock_file = 'C:\\sa_locks_' || @time_as_string || '.txt';

    // unload the current connection and lock information, excluding this DBISQL connection
    UNLOAD SELECT * FROM sa_conn_info() WHERE number <> connection_property('number') TO @conn_info_file;
    UNLOAD SELECT * FROM sa_locks() WHERE conn_id <> connection_property('number') TO @lock_file;
    COMMIT;
END;


Include the saved Monitor output, the SUP logs and the text files created by this SQL block with the problem description.