Recently I moved a database used by the vCenter application to a new instance
with 2 GB of RAM. As part of the move, several SQL Server Agent jobs had to be
moved as well. Three jobs in particular started failing immediately afterwards,
and the message logged in job history was as below:
“The instance of
the SQL Server Database Engine cannot obtain a LOCK resource at this time.
Rerun your statement when there are fewer active users. Ask the database
administrator to check the lock and memory configuration for this instance,
or to check for long-running transactions. [SQLSTATE HY000] (Error 1204). The step failed.”
I had not observed this on the earlier server, and that instance had 6 GB of
memory. So I increased the RAM on the new instance to 3 GB, and after a restart
the problem seemed to be resolved. But the next day the issue was back.
I had read that SQL Server will use at most around 60% of available memory for
locks. So I checked the amount of memory used by the lock manager, using the
PERFMON counter Lock Memory (KB), and it read 1.7 GB, which was nearly 57% of
the 3 GB available to the instance. But before increasing the RAM to 6 GB, as
it was on the old server, I wanted to check the DML fired by the jobs.
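The same counter PERFMON exposes can also be read from within SQL Server via a
dynamic management view; a minimal sketch:

SELECT object_name,
       counter_name,
       cntr_value AS lock_memory_kb
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Lock Memory (KB)';

This returns the lock manager's current memory usage in KB, so it can be
polled from a query window or an Agent job rather than from PERFMON.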
I reviewed each of the jobs. They called stored procedures, all three similar,
used to purge historical data (3 jobs for 4 separate tables, all structurally
the same). The relevant section of code in the procedure is below:
BEGIN TRANSACTION

DELETE FROM VPX_HIST_STAT2 WITH (ROWLOCK)
WHERE TIME_ID IN (SELECT TOP (@batchsize) TIME_ID
                  FROM VPX_HIST_STAT2
                  WHERE NOT EXISTS (SELECT 1
                                    FROM VPX_SAMPLE_TIME2 VTD
                                    WHERE VTD.TIME_ID = VPX_HIST_STAT2.TIME_ID))

SET @deletedHistStatsCount = @@ROWCOUNT

COMMIT TRANSACTION
I checked the total number of records and found that table VPX_HIST_STAT1 had
over 10 million rows; the other jobs deleted rows from similarly structured
tables with almost as many rows: VPX_HIST_STAT2, VPX_HIST_STAT3 and
VPX_HIST_STAT4.
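Row counts on tables this large can be read from the allocation metadata
instead of scanning; a quick sketch against the catalog views:

-- Approximate row counts for the history tables, without a full scan.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       SUM(p.rows)              AS row_count
FROM sys.partitions p
WHERE p.index_id IN (0, 1)   -- heap or clustered index only
  AND OBJECT_NAME(p.object_id) LIKE 'VPX_HIST_STAT%'
GROUP BY p.object_id;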
I noticed these tables had no usable index that the DELETE's WHERE clause (on
column TIME_ID) could make use of. TIME_ID was part of the composite clustered
primary key on the table, but it was the second key column, and a seek requires
the leading key column, so the index was of no use to this query.
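The key column order of the existing indexes can be confirmed from the catalog
views; a sketch:

-- List each index on the table with its key columns in key order;
-- TIME_ID showing key_ordinal = 2 in the PK confirms it is not the
-- leading column and so cannot drive a seek on its own.
SELECT i.name        AS index_name,
       ic.key_ordinal,
       c.name        AS column_name
FROM sys.indexes i
JOIN sys.index_columns ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.VPX_HIST_STAT2')
  AND ic.key_ordinal > 0
ORDER BY i.name, ic.key_ordinal;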
I decided to add a nonclustered index on the TIME_ID column, and analyzed the
query using Database Engine Tuning Advisor. As expected, it suggested the same
index. The index definition is below:
CREATE NONCLUSTERED INDEX [NIDX_TIME_ID] ON [dbo].[VPX_HIST_STAT2]
(
    [TIME_ID] ASC
)
INCLUDE ([COUNTER_ID], [STAT_VAL])
WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF)
ON [PRIMARY]
I created the required index and re-ran the jobs; all succeeded, and no memory
increase was required as I had initially expected.
Another point worth mentioning is that each DELETE included a ROWLOCK query
hint. With a delete affecting a table of nearly 10 million rows, the hint meant
SQL Server would request that many row locks, causing a very high lock count
and hence the growth in lock memory. Normally, with no ROWLOCK hint, SQL Server
starts with row locks by default, but when it determines that a large part of
the table is being affected, it requests higher-level locks such as page or
table locks instead (known as lock escalation). Note that lock escalation can
itself be an issue in certain other circumstances. In this case, with a delete
affecting so many rows, I felt the ROWLOCK hint was not required; letting SQL
Server decide the type of lock was a much better choice.
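The effect of the hint can be observed while the delete runs; a sketch of a
lock inventory per session using a dynamic management view:

-- Count the locks currently held, per session and resource type.
-- A ROWLOCK-hinted delete over millions of rows shows up as an
-- enormous KEY-lock count for one session; without the hint,
-- escalation typically leaves a handful of PAGE or OBJECT locks.
SELECT request_session_id,
       resource_type,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
GROUP BY request_session_id, resource_type
ORDER BY lock_count DESC;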
In fact, I would like to test this scenario on the old server: reduce the
memory from 6 GB to 2 GB, skip the new index, and instead simply remove the
ROWLOCK query hints. Hopefully I can do that soon and validate my understanding
and the resolution further.
This issue is also covered in a Microsoft article at http://support.microsoft.com/kb/323630.