Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much

DBD::SQLite dilemma

by Matts (Deacon)
on Aug 11, 2003 at 09:00 UTC ( [id://282791] : perlmeditation . print w/replies, xml ) Need Help??

It's always bugged me that DBD::SQLite used the sqlite_get_table() function for data access. This function returns the entire result set as a char ** - i.e. all the data in one long array of strings that I then have to iterate over in calls to $sth->fetch. This works fine, as users of DBD::SQLite will testify, but it means that it uses up lots of memory for large result sets.

It also didn't give me back much information about the results (except the column names and number of rows). In order to get back the data types of the columns (so I could implement $sth->{TYPE}) I'd have to use the sqlite_step() function, which is more DBI-like - an iterator.

Now the downside (after I'd spent hours converting all the code over to using sqlite_step()) is that the tests no longer pass because doing it this way doesn't give me the number of rows coming back from a SELECT. I get the number of rows from an update/insert/delete just fine though.

i.e. I've broken my $rowcount = $dbh->do("SELECT * FROM TABLE");

Now the big question becomes what do I do with these changes?

  1. Back them out - we can't have backward incompatible changes for what is my most downloaded/used module
  2. Add lots of extra code to provide the option of one interface or the other
  3. Just go with it, and inform people who complain about it how to deal with the error
What do the perlmonks collective think?

Replies are listed 'Best First'.
Re: DBD::SQLite dilemma
by adrianh (Chancellor) on Aug 11, 2003 at 09:44 UTC

    I'd go for option (3) myself and call

    my $rowcount = $dbh->do("SELECT * FROM TABLE");

    undefined behaviour (mainly coz I'd like iterators:-)

    The benefit is worth the gain - and people should learn how to use COUNT().

      I second this idea (... on the other hand, maybe Matts can hack sqlite to add such functionality -- naah).

      update: `perldoc DBI' says "For SELECT statements... The execute method does not return the number of rows that will be returned by the query (because most databases can't tell in advance), it simply returns a true value.". So, since DBI users are not supposed to rely on this, they shouldn't ;)

      MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
      I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
      ** The third rule of perl club is a statement of fact: pod is sexy.

        Indeed. It's expensive to return the count of a select because you either have to count the number of responces you get back, or run a count(*) on the same querey that you're executing, which can be as expensive as running the query itself.

Re: DBD::SQLite dilemma
by dbwiz (Curate) on Aug 11, 2003 at 12:31 UTC
    i.e. I've broken my $rowcount = $dbh->do("SELECT * FROM TABLE");

    That was sort of broken from the beginning, I'm afraid. As PodMaster notes, the DBI does not guarantee that a method returns a row count for a SELECT, so using $dbh->do to get a row count doesn't seem to be kosher.

    I would go for option number 2, trying to mimic DBD::mysql approach, where you have "store_result" (when you get all the records at once) or "use_result" (when you get the records one-by-one). You may leave the old method as behavior, so existing code would not be broken, and the new method as optional for each statement handler (e.g. $sth= $dbh->prepare($query, {sqlite_use_result => 1})).

    Getting the numbers of rows in this scenario would follow the DBI standards. If you use selectall_arrayref or fetchall_arrayref, the number of rows is given by the resulting array size. Using the fetchrow_* methods, you need to count manually, the same way you do it for other drivers.

    Concerning the suggestion of integrating the SELECT call with a separate COUNT(*) statement, I would rather not do that, since it is a kind of intrusive coding that is difficult to control. For example, given this query

    SELECT field1, field2, SUM(field3) FROM table1, table2, table3 WHERE table1.id1 = table2.id1 AND table2.id2=table3.id2 GROUP BY field1, field2

    How would you modify it to get a count? You must get the count of the aggregated columns, but if this is an expensive query, the count would be almost as expensive as the original query. The same is true for WHERE clauses with heavy calculations. In this case, I would advise to leave such tricks to the users imagination.

Re: DBD::SQLite dilemma
by mpeppler (Vicar) on Aug 11, 2003 at 12:36 UTC
    I'd go with your third option, because it doesn't preclude you from providing the correct answer.

    The current version of DBD::Sybase returns -1 for

    $rowcount = $dbh->do("select * from TABLE");
    but I jusr verified that a (very) simple change will let it return the correct number of rows.

    DBD::Sybase uses it's own implementation of do() to handle multiple result sets, but if DBD::SQLite correctly handles the rows() then you could do something like this:

    sub do { my($dbh, $statement, $attr, @params) = @_; my $sth = $dbh->prepare($statement, $attr) or return undef; $sth->execute(@params) or return undef; return undef if $sth->err; while(my $dat = $sth->fetch) { return undef if $sth->err; } my $rows = $sth->rows; ($rows == 0) ? "0E0" : $rows; }
    Also note the comments by others in this thread that DBI doesn't guarantee that the rowcount is available for SELECT operations, so IMHO you're off the hook :-)


Re: DBD::SQLite dilemma
by mirod (Canon) on Aug 11, 2003 at 10:27 UTC

    Ever the conservative one, I would go for (2), with (1) being the second choice:

    As you put it nicely: we can't have backward incompatible changes. If nobody has complained so far (has anybody complained? I don't see anythink on it probably means that it is not really a problem for users.

    I see an other option BTW: (4) (possibly optionaly), internally do a SELECT count(<query>) FROM... to get the info first, then do the query (all nicely wrapped in a transaction I would assume).

    Maybe contacting the author of SQLite to get a function that would return the count, before using sqlite_step might also be a good idea, as it would move the extra step out of Perl (and out of your module) and into the core library.

Re: DBD::SQLite dilemma
by Juerd (Abbot) on Aug 11, 2003 at 11:02 UTC

    3. Just go with it, and inform people who complain about it how to deal with the error

    That one.

Re: DBD::SQLite dilemma
by dws (Chancellor) on Aug 11, 2003 at 17:36 UTC
    3. Just go with it, and inform people who complain about it how to deal with the error

    Another vote for door number three.

    Those of us who deal with different databases are unlikely to expect

    my $rowcount = $dbh->do("SELECT * FROM TABLE");
    to be portable, and will opt for COUNT(*) (or the equivalent) instead. And those who haven't yet experienced the joys of porting database applications to different RDBMs are getting set up for an unpleasant surprise when they use that construct.

Re: DBD::SQLite dilemma
by runrig (Abbot) on Aug 11, 2003 at 16:52 UTC
    From the DBI docs:
    "do" Prepare and execute a single statement. Returns the number of rows affected or "undef" on error.
    I don't think of SELECT'ed rows as being affected (isolation mode of repeatable read aside), so I wouldn't worry about the new behavior. Updates and deletes affect rows, so as long as the number of rows affected are returned there, it'd be ok by me.
Re: DBD::SQLite dilemma
by liz (Monsignor) on Aug 11, 2003 at 10:18 UTC
    4. do something special if $dbh->do is called in non-void context.

    If called in void context, keep the current iterator way. If called in a non-void context, fetch all the records in memory and return number of records fetched.

    Document that calling $dbh->do in non-void context will load all records in memory.

    This should keep the old behaviour (which was loading all records already) and provide a new, compatible way of fetching records that uses less memory.


    Another idea: recreate the query using COUNT() instead of the selected fields if $dbh->do is called in non-void context. Return the result of that query as the number of records. This would allow you to keep the current memory friendly behavior even when the number of records is requested. The risk of course is that the number of records is incorrect because of database updates between the SELECT COUNT() and the original SELECT.

Re: DBD::SQLite dilemma
by lachoy (Parson) on Aug 11, 2003 at 20:56 UTC

    Coming in late, but another vote for option 3. IIRC, the DBI has for as long as I've been using it (6 or 7 years) said that the number of rows is not returned from a SELECT execute. Anything that relied on this feature was programming by coincidence.

    M-x auto-bs-mode

Re: DBD::SQLite dilemma
by trs80 (Priest) on Aug 11, 2003 at 17:01 UTC
    Option 3

    My reasons:
    1. As others have stated the DBI framework doesn't promise the example you outline to work so you aren't breaking anything that isn't already questionable, if you can even consider that breaking.
    2. I don't advocate that syntax for a way of getting a count result. It to me is one of those "you could do that, but why?" kinda of things.
    As long as you document the change in behavior I think this is the best move at present since you seem to be solving a larger problem then the one you may have created with the changes.

Re: DBD::SQLite dilemma
by bronto (Priest) on Aug 11, 2003 at 09:41 UTC


    4. forking the development in two different modules?


    The very nature of Perl to be like natural language--inconsistant and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
    --John M. Dlugosz