in reply to Moving SQL from perl to Stored Procedures

Paging is difficult in SQL Server 2000 for the following reasons:
Here are a few common approaches:
1) Nested tops.
This results in sorting the data 3x, and requires dynamic generation.
2) select the top @slice_size, that aren't in the set (1..@slice_size-1)
This performs badly because of the subselect, and requires dynamic generation
3) Select based on one or several sort columns.
This solution performs well, but becomes difficult with dynamic sorting
4) Do a self-join and select based on the count() of rows that follow this one.
Very inefficient for large datasets

A few useful things to remember:
With these things in mind I came up with the following strategies:

Strategy1:

 For start 5, slice size 10, end 15:
   Select the results from 1..15 into @paging_buffer
   Assign a sort_idx to each row using an identity column
   Select the rows with a sort_idx > @slice_start

Strategy 2

 For start 5, slice size 10, end 15:
   Select the results from 1..15 into @paging_buffer
   Set the ROWCOUNT to @slice_start -1
   Delete from @paging_buffer
Strategy 1 outperforms strategy 2 by a fair amount, so I would recommend that one - thanks to Jenda for pointing out a flaw in my benchmarking.

An important note with strategy 2 is that if the start_slice = 1 you do not want to set the ROWCOUNT to 0 before deletion, as this will purge all the records from the buffer. This is bad.

An improvement over this design is to only store the keys required to uniquely identify the rows in the paging buffer, and then join the paging buffer back to the original tables. This reduces the amount of space used in the temp db.

If your query uses multiple tables then solution 2 will require join hints to force the join order, and to force a loop join. Hash/Merge joins will break the ordering of the result set, as will a different join order.

Example implementation for strategy 1:

CREATE PROCEDURE [dbo].[paging_example] @slice_size int = 0, -- ROWCOUNT 0 will return all results @slice_start int = 1 AS declare @slice_end int set @slice_end = @slice_start + @slice_size - 1 -- Create a buffer for the records from 1..@slice_end declare @paging_buffer table( sort_id int not null identity(1,1) PRIMARY KEY, user_id int not null ); -- SELECT from 1 til the end of the slice into @paging_buffer SET ROWCOUNT @slice_end insert into @paging_buffer (user_id) SELECT user_id FROM Users ORDER BY lastname SET ROWCOUNT 0 select u.user_id, u.firstname, u.lastname from @paging_buffer buffer inner join Users u on u.user_id = buffer.user_id where buffer.sort_id >= @slice_start and buffer.sort_id <= @slice_end order by buffer.sort_id GO

Example implementation for strategy 2:

CREATE PROCEDURE [dbo].[paging_example] @slice_size int = 0, -- ROWCOUNT 0 will return all results @slice_start int = 1 AS -- Create a buffer for the records from 1..@slice_end declare @paging_buffer table( user_id int ); declare @slice_end int set @slice_end = @slice_start + @slice_size - 1 -- SELECT from 1 til the end of the slice into @paging_buffer SET ROWCOUNT @slice_end insert into @paging_buffer SELECT user_id FROM Users ORDER BY lastname --This approach will delete the rows we don't need from the buffer -- If we're starting at #1 then we don't need to do anything special -- Delete will go in order, so if we delete up to the record before th +e start of the slice then only the desired records will remain DECLARE @discard_up_to int SET @discard_up_to = @slice_start - 1 SET ROWCOUNT @discard_up_to -- If the start_slice is 1, then setting the rowcount to 0 will result + in a complete delete - we don't want this -- But we also do not want to prevent if with an IF clause, because wh +en a stored procedure uses the 'IF' clause it will be recompiled befo +re each run. -- So instead we check "where @discard_up_to > 0" in the delete clause DELETE FROM @paging_buffer where @discard_up_to > 0 SET ROWCOUNT 0 select u.user_id, u.firstname, u.lastname from @paging_buffer buffer inner join Users u on u.user_id = buffer.user_id option (loop join, force order) GO

Update - Jenda pointed out a flaw in my benchmark. Updated accordingly.

Update - Added implementation using strategy #1

Replies are listed 'Best First'.
Re^2: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging
by dokkeldepper (Friar) on Sep 09, 2006 at 08:22 UTC
    Use cursors! Update: BTW: These dynamic where clauses are very expensive because the optimizer cannot optimize them. Better way: Generate SQL-Statements with a fixed filter, and execute them from within a procedure. This is much more performant. There was an article in SQL - mag in the black belt sql column recently. Maybe I should meditate someday about "SQL-cursors for the unwary" or something like that.

      You mean build an SQL statement in a nvarachar variable and then execute it? Well well well ... this has two big problems. The first being that this way you force the server to generate a query plan for each execution which may be a rather expensive operation. And the second, even more important, is that this is prone to SQL injection attacks. And it's even more dangerous than SQL statements built in the pages/scripts. First because it's kinda hidden in something that's by most believed to be safe and second because the statement is executed with the permissions of the stored procedure owner (usually dbo!) instead of the permissions of the user accessing the database!

      I can't say I never use this, but I do try to take this as a last resort solution. It's much better, even if not very convenient to branch out and write several SELECT statements. That way the server may generate and REMEMBER the query plans for each of the cases and there is nothing ad-hoc generated and therefore dangerous:

      CREATE PROCEDURE [dbo].[dynamic_example] @lastname varchar(50) = NULL, @firstname varchar(50) = NULL AS BEGIN IF (@lastname IS NULL) IF (@firstname IS NULL) SELECT * from Users ELSE SELECT * from Users WHERE (users.firstname LIKE @firstname) ELSE IF (@firstname IS NULL) SELECT * from Users WHERE (users.lastname LIKE @lastname) ELSE SELECT * from Users WHERE (users.lastname LIKE @lastname) AND (users.firstname LIKE @firstname) END GO
      I know it's inconvenient, especially as the number of conditions increases, but it's the safest and most efficient solution. It's a shame I can't use some macros in SQL :-(

Re^2: Moving SQL from perl to Stored Procedures - MSSQL tricks, dynamic paging
by Jenda (Abbot) on Sep 09, 2006 at 10:46 UTC

    Could you show me how do you assign the sort_id in your first strategy? I don't have access to an SQL Server at the moment, but the difference in the speed of those two strategies sound too big. Do you use

    declare @paging_buffer table( sort_id int not null identity(1,1) PRIMARY KEY, user_id int not null, firstname varchar(50), lastname varchar(50) );
    ?

      I suspect that I was a fool when I tested that method, and that the sort_id is a perfectly reasonable approach. The dynamic paging text is from something I wrote awhile, and I didn't retest my conclusions - my apologies.

      I'll correct the benchmark data provided and add a note about your correction to the end - thanks for catching my error.

      Update - Redid benchmark

      The sort_id approach works better. Thanks again for pointing out the error.