Re: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging

Paging is difficult in SQL Server 2000 for the following reasons:

You cannot use a variable with TOP
You cannot specify an offset with TOP

Here are a few common approaches:

1) Nested tops.

Extract 1..slice_end ORD ASC
select 1..slice_size ORD DESC
select ORD ASC

This results in sorting the data 3x, and requires dynamic generation.

2) select the top @slice_size, that aren't in the set (1..@slice_size-1)

This performs badly because of the subselect, and requires dynamic generation

3) Select based on one or several sort columns.

Set ROWCOUNT to @slice_start, record the values for each sort column (must result in a unique entry)
Set ROWCOUNT to @slice_size, and set the WHERE clause to use the values recorded in the previous step

This solution performs well, but becomes difficult with dynamic sorting

4) Do a self-join and select based on the count() of rows that follow this one.

Very inefficient for large datasets

A few useful things to remember:

'SET ROWCOUNT' does allow variables.
You can use 'SET ROWCOUNT' to restrict the number of rows affected by INSERT, UPDATE, DELETE

With these things in mind I came up with the following strategies:

Strategy1:

 For start 5, slice size 10, end 15:
   Select the results from 1..15 into @paging_buffer
   Assign a sort_idx to each row using an identity column
   Select the rows with a sort_idx > @slice_start

Strategy 2

 For start 5, slice size 10, end 15:
   Select the results from 1..15 into @paging_buffer
   Set the ROWCOUNT to @slice_start -1
   Delete from @paging_buffer

Strategy 1 outperforms strategy 2 by a fair amount, so I would recommend that one - thanks to Jenda for pointing out a flaw in my benchmarking.

An important note with strategy 2 is that if the start_slice = 1 you do not want to set the ROWCOUNT to 0 before deletion, as this will purge all the records from the buffer. This is bad.

An improvement over this design is to only store the keys required to uniquely identify the rows in the paging buffer, and then join the paging buffer back to the original tables. This reduces the amount of space used in the temp db.

If your query uses multiple tables then solution 2 will require join hints to force the join order, and to force a loop join. Hash/Merge joins will break the ordering of the result set, as will a different join order.

Example implementation for strategy 1:

CREATE PROCEDURE [dbo].[paging_example]
  @slice_size int = 0, -- ROWCOUNT 0 will return all results
  @slice_start int = 1
AS

declare @slice_end int
set @slice_end   = @slice_start + @slice_size - 1

-- Create a buffer for the records from 1..@slice_end
declare @paging_buffer table(
  sort_id int not null identity(1,1) PRIMARY KEY,
  user_id int not null
);


-- SELECT from 1 til the end of the slice into @paging_buffer
SET ROWCOUNT @slice_end
insert into @paging_buffer (user_id)
SELECT 
  user_id
FROM Users
ORDER BY lastname

SET ROWCOUNT 0
select u.user_id, u.firstname, u.lastname
from @paging_buffer buffer
inner join Users u on u.user_id = buffer.user_id
where buffer.sort_id >= @slice_start 
and buffer.sort_id <= @slice_end
order by buffer.sort_id

GO
[download]

Example implementation for strategy 2:

CREATE PROCEDURE [dbo].[paging_example]
  @slice_size int = 0, -- ROWCOUNT 0 will return all results
  @slice_start int = 1
AS

-- Create a buffer for the records from 1..@slice_end
declare @paging_buffer table(
  user_id int
);


declare @slice_end int
set @slice_end   = @slice_start + @slice_size - 1

-- SELECT from 1 til the end of the slice into @paging_buffer

SET ROWCOUNT @slice_end
insert into @paging_buffer
SELECT 
  user_id
FROM Users
ORDER BY lastname

--This approach will delete the rows we don't need from the buffer
-- If we're starting at #1 then we don't need to do anything special
-- Delete will go in order, so if we delete up to the record before th
+e start of the slice then only the desired records will remain

DECLARE @discard_up_to int
SET @discard_up_to = @slice_start - 1
SET ROWCOUNT @discard_up_to

-- If the start_slice is 1, then setting the rowcount to 0 will result
+ in a complete delete - we don't want this
-- But we also do not want to prevent if with an IF clause, because wh
+en a stored procedure uses the 'IF' clause it will be recompiled befo
+re each run.
-- So instead we check "where @discard_up_to > 0" in the delete clause

DELETE FROM @paging_buffer where @discard_up_to > 0

SET ROWCOUNT 0
select u.user_id, u.firstname, u.lastname
from @paging_buffer buffer
inner join Users u on u.user_id = buffer.user_id
option (loop join, force order)

GO
[download]

Update - Jenda pointed out a flaw in my benchmark. Updated accordingly.

Update - Added implementation using strategy #1

Comment on Re: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging Select or Download Code

Replies are listed 'Best First'.

Re^2: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging
by dokkeldepper (Friar) on Sep 09, 2006 at 08:22 UTC

Use cursors! Update: BTW: These dynamic where clauses are very expensive because the optimizer cannot optimize them. Better way: Generate SQL-Statements with a fixed filter, and execute them from within a procedure. This is much more performant. There was an article in SQL - mag in the black belt sql column recently. Maybe I should meditate someday about "SQL-cursors for the unwary" or something like that.

[reply]

Re^3: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging

by Jenda (Abbot) on Sep 09, 2006 at 10:32 UTC

You mean build an SQL statement in a nvarachar variable and then execute it? Well well well ... this has two big problems. The first being that this way you force the server to generate a query plan for each execution which may be a rather expensive operation. And the second, even more important, is that this is prone to SQL injection attacks. And it's even more dangerous than SQL statements built in the pages/scripts. First because it's kinda hidden in something that's by most believed to be safe and second because the statement is executed with the permissions of the stored procedure owner (usually dbo!) instead of the permissions of the user accessing the database!

I can't say I never use this, but I do try to take this as a last resort solution. It's much better, even if not very convenient to branch out and write several SELECT statements. That way the server may generate and REMEMBER the query plans for each of the cases and there is nothing ad-hoc generated and therefore dangerous:

CREATE PROCEDURE [dbo].[dynamic_example]
  @lastname varchar(50) = NULL,
  @firstname varchar(50) = NULL
AS
BEGIN
  IF (@lastname IS NULL)
    IF (@firstname IS NULL)
      SELECT * from Users
    ELSE
      SELECT * from Users
       WHERE (users.firstname LIKE @firstname)
  ELSE
    IF (@firstname IS NULL)
      SELECT * from Users
       WHERE (users.lastname LIKE @lastname)
    ELSE
      SELECT * from Users
       WHERE (users.lastname LIKE @lastname)
         AND (users.firstname LIKE @firstname)
END
GO
[download]

Jenda
Support Denmark!
Defend the free world!

[reply]
[d/l]

Re^2: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging
by Jenda (Abbot) on Sep 09, 2006 at 10:46 UTC

Could you show me how do you assign the sort_id in your first strategy? I don't have access to an SQL Server at the moment, but the difference in the speed of those two strategies sound too big. Do you use

declare @paging_buffer table(
  sort_id int not null identity(1,1) PRIMARY KEY,
  user_id int not null,
  firstname varchar(50),
  lastname varchar(50)
);
[download]

Jenda
Support Denmark!
Defend the free world!

[reply]
[d/l]

Re^3: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging

by imp (Priest) on Sep 09, 2006 at 16:05 UTC

I'll correct the benchmark data provided and add a note about your correction to the end - thanks for catching my error.

Update - Redid benchmark

[reply]