BBQ has asked for the wisdom of the Perl Monks concerning the following question:

Note: Please check the original question placed in Q+A, with the title How to display from X to Y records using DBI?. I am moving the topic to Seekers of Perl Wisdom because I still have a few doubts (although the question was answered very well) and Q+A doesn't really provide a threaded discussion format.

OK, now that we got that out of the way... Thanks as usual guys (chromatic and btrott)! You're saviors! :o)

Back to chromatic's reply, I was wondering if the method
# we want 0 through 20 foreach my $row (@$array_ref[0 .. 20]) { # do something }
might be an overkill... I'll explain:
What if my execute() returns more than 50,000 rows. What would DBI do with them if I used the above mentioned? Would I still be populating the @$array_ref[] with 50,000 keys? Would there still be 50,000 rows of traffic between DBD and MySQL?

cheers!

Replies are listed 'Best First'.
Re: More on selecting rows with DBI...
by btrott (Parson) on May 02, 2000 at 08:52 UTC
    If your query is going to return a lot of rows, don't use that method. Yes: you asked for *all* of the rows (50_000 or whatever, using fetchall_arrayref), so DBI is going to fetch all of the rows. So now you've sent 50_000 rows between MySQL and yourself, and you've got a lot of data stored in memory. :)

    Which isn't too good a thing, meaning that, when you can, you should use the LIMIT offset,rows technique. This way, you're only ever sending rows rows between MySQL and your script.

Re: More on selecting rows with DBI...
by perlmonkey (Hermit) on May 02, 2000 at 09:07 UTC
    I would guess that DBI would process all 50000 records reguardless of what you do with the data. I am not posivite about that though. The safest bet for database performance is to always try to increase your query performance. (i.e. if you only need 20 rows, then only retrieve 20 rows). That way DBI will not have to do all the extraneous processing.

    In sybase there is a command "set rowcount NUMBER" which will only retrieve the first NUMBER rows. There may be something similar for the database you are using.

    So if you know you are only going to use the first 20 rows of a 50000 row dataset, then you might be able to limit it in your query:
    my $sql =<<EOF; set rowcount 20 SELECT * from total_huge_table EOF my $sth = $dbh->prepare($sql); $sth->execute(); ... more code here ...
    If this kind of query restriction is available via a perl module, I would guess that it would be mentioned in the individual DBD module perldoc pages. I dont think this could be a DBI feature since it is probably really database dependent.

    I just noticed that Mysql has a way to limit in the SQL also: SELECT * from huge_table LIMIT 1, 20This will return the 1st through 20th rows. (untested, but that is what the O'Reilly mysql book says: MySQL & mSQL by Yarger, Reese, & King)

    Well I dont think I answered you exactly, but maybe some of this will help.
      Actually, I'm sorta new to MySQL...

      I've used Oracle for the past years and there are a number of ways that this can be implemented on the database side (alot harder than LIMIT, BTW) by using PL/SQL. Now that I've been forced to use other DBMSs (MySQL being one of them), I'm trying to cut down to the least common denominator of SQL in order to maintain a certain degree of portability between the systems.

      DBI has been great as far as portability, but I have found that my Oracle vices have prooven to be rather disappointing at times, since DBD::Oracle allows you to do fancy Oracle proprietary stuff (like selects within selects for instance)...

      Oh, if only the DBMS vendors could agree on ONE PLAIN FLAVOR OF SQL it would make my life sooo much easier...
        Oh, if only the DBMS vendors could agree on ONE PLAIN FLAVOR OF SQL it would make my life sooo much easier...

        I think this will happen about the same time Microsoft voluntarily open sources and GPLs Windows along with porting Office to Linux.

        It is too bad though. Standard SQL would make life a lot easier for us all.

        In the mean time maybe you can keep all the database specific code in one module, and dynamically require that library depending on which DBD you are using. So that way you should really be able to utilize the power of each database, but still maintain a some-what portable app. Then you can create a database module for each database the app will be using.
RE: More on selecting rows with DBI...
by Jonathan (Curate) on May 02, 2000 at 14:06 UTC
    Just selecting the first n rows of a query is really a meaningless concept to a relational database. If you have to process 50,000 records the best way is use a cursor, so you access them a row at a time. Use $db_handle->prepare and $db_handle->fetchrow_arrayref to loop through the dataset. See perldoc DBI.pm
Re: More on selecting rows with DBI...
by athomason (Curate) on May 02, 2000 at 13:33 UTC
    Glancing through the cheetah book, there's three ways to get data from DBI: row methods (each time you call the method, it returns the next row), atomic methods (for returning only one row, doesn't help you), and batch methods (fetchall_arrayref(), selectall_arrayref(), like chromatic mentioned). The batch methods can indeed knock a system on its behind if the result set is big; it's certainly better to prepare and execute your SELECT, roughly make sure you can handle the results, then use fetchall_arrayref rather than selectall_arrayref. If you don't use all the data, though, this can be wasteful. If you're grabbing sequential sections of the result set (which it looked like from your question), I don't see a problem with just doing atomic fetches 20 times a pop when you need the next set.

    If you're using MySQL (I haven't used any other rdbms), LIMIT is a good option, like others have said. From the manual: "mysql> select * from table LIMIT 5,10; # Retrieve rows 6-15" looks like what you need. But unless the offset parameter of LIMIT can be a bound value (anyone know?), I think you'd take a performance hit for having the DB redo the query plan each time.

    A side note that bit me: if you want to use LIMIT with UPDATE, you'll need MySQL 3.23.

      Actually, I don't think the binding of the value is that much of an issue with MySQL. According to the Appendix in the DBI book, MySQL doesn't actually support bound variables--the use of placeholders is merely emulated by DBI. Can anyone who actually has the DBI book confirm this? :)
        The MySQL book (Paul DuBois) explicitly states that queries aren't cached, in the section on placeholders.

        He also says that you might as well use them, because if you port to another database, you might get the benefit there. I'd add that they make your queries easier to read and they do quoting for you automatically. Save yourself some trouble.

Re: More on selecting rows with DBI...
by Anonymous Monk on May 02, 2000 at 11:17 UTC
    I think what you need is 'limit':

    select fields from table where (something) limit 0,20;

    That will only send the first 20 fields to the script. Then you can say "limit 20,40" for the next fields.

      I am afraid that you are mistaken. The syntax that you have just provided:
      	select <something> from <somewhere> limit n, x
      
      	where n is 20 and x is 40
      
      will NOT give you rows 20 through 40... if you look at http://www.mysql.com/Manual_chapter/manual_Reference.html#SELECT in the mysql reference you will see that it states that the limit mechanism is:
      	limit offset, rows
      
      in order to receive rows 20 through 40 the proper limit syntax would have been:
      	limit 20, 20
      
      thank you.