baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm seeking the monks' wisdom on this one :)

OK, I'm using DBI to retrieve data from my database tables, and the procedure I'm using is:

    my $datahash = $self->_fetchall(table => 'Table1');

    open (NROUT, ">", "./bin/PhyloStr/tmp/tmp.txt") || die "$!";
    foreach (@$datahash){
        print NROUT ">G$_->{column1}T$_->{column2}\n$_->{column3}\n";
    }
    close NROUT;

    undef($datahash);   # am I actually doing anything here?

    ##################
    # sub for _fetchall:

    sub _fetchall {
        my ($self,%arg) = @_;
        my $stm  = "select * from $arg{table}";
        my $stmt = $dbh->prepare($stm) || die $dbh->errstr;
        $stmt->execute || die $dbh->errstr;
        my $array = $stmt->fetchall_arrayref({});
        $stmt->finish() || die $dbh->errstr;
        return $array;
    }
So where is the problem?

fetchall_arrayref is consuming too much of my RAM (1.5 GB) because the tables are pretty big. So my question is: should I switch to fetchrow_array(), and is that going to slow the process down a lot? Or should I fetch a reference instead, with fetchrow_arrayref()?

The reason I'm asking this question rather than testing it myself is that the process takes too long on my PC.

Thank you!!

Update:

As far as the type of database goes: I'm on SQLite because it has proven to be a very practical, small but fast, and easy-to-use DB (or should I say DB library).

I love Perl and SQLite, what can you do!!

Update:

I hope this thread is still alive.

Well, what I've done is:

    # actual script (in module one):

    $self->_connectdb(connection => "./db/db/db.db");

    open (NROUT, ">", "./bin/PhyloStr/tmp/NROUTtmp") || die "$! \n";

    $self->_doit(statement => 'begin');
    $self->_fetchrow_stmt(table => 'TABE');
    while (my $row = $self->_return_row()){
        print NROUT ">G$row->[0]T$row->[1]\n$row->[2]\n";
    }
    $self->_finish_stmt();

    close NROUT;
    $self->_disconnectdb();

    #########################
    # subs in module two:

    my $fetchrow_stmt;

    sub _fetchrow_stmt {
        my ($self,%arg) = @_;
        my $stm = "select * from $arg{table}";
        $fetchrow_stmt = $dbh->prepare($stm) || die $dbh->errstr;
        $fetchrow_stmt->execute || die $dbh->errstr;
    }

    sub _return_row {
        my ($self,%arg) = @_;
        return $fetchrow_stmt->fetchrow_arrayref();
    }

    sub _finish_stmt {
        my ($self,%arg) = @_;
        $fetchrow_stmt->finish() || die $dbh->errstr;
        undef($fetchrow_stmt);
    }
So the problem still persists: my RAM usage is jumping through the roof. Switching from fetchall to fetchrow decreased the memory used, but not as much as I expected.

It's like the DBI module stores the fetched rows somewhere.

So does anyone know how to loop through table rows without keeping the fetched data in memory? (Because I think this is the problem.)

Thank you!

Re: fetchall_arrayref DBI question ?
by runrig (Abbot) on Mar 16, 2009 at 20:14 UTC
    You could use bind_columns to fetch directly into separate variables, e.g.:
        $sth->execute();
        $sth->bind_columns(\my ($col1, $col2, $col3));
        while ($sth->fetch()) {
            print "$col1 $col2 $col3\n";
        }
    Or go ahead and use fetchrow_array.
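
    For the fetchrow_array variant, a minimal sketch (the column names are placeholders):

        while (my ($col1, $col2, $col3) = $sth->fetchrow_array) {
            print "$col1 $col2 $col3\n";
        }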
Re: fetchall_arrayref DBI question ?
by szbalint (Friar) on Mar 16, 2009 at 21:37 UTC

    Depending on the database software you're using, you could try using cursors; that is guaranteed to keep the data on the DB server side. For example, in PostgreSQL (DBD::Pg) you'd do something like this:

    $dbh->do("DECLARE csr CURSOR WITH HOLD FOR SELECT * FROM large_table o +rder by id"); while (1) { my $sth = $dbh->prepare("fetch 100000 from csr"); $sth->execute; my $res = $sth->fetchall_arrayref({}); if ($res && @{$res}) { # batch process the 100k rows } else { # error or no more rows exit; } }

    The code above would retrieve your table in 100k row chunks. If you're using Postgres and you need to deal with tables that do not fit in memory, but would like to process things reasonably fast, this is a good way to go.

Re: fetchall_arrayref DBI question ?
by bellaire (Hermit) on Mar 16, 2009 at 20:14 UTC
    It would be better to fetch one row at a time, so long as you are actually processing one row at a time. If you simply fetch one row at a time and build a large array in memory, and then use foreach, you won't be doing yourself any favors. You likely want to use a while loop around your call to fetchrow_arrayref, and operate on each row as it comes out of the database. That way your memory footprint will be minimized.
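
    A minimal sketch of that pattern, assuming the three-column table from the original post:

        my $sth = $dbh->prepare("select column1, column2, column3 from Table1");
        $sth->execute;
        while (my $row = $sth->fetchrow_arrayref) {
            # process this one row, then let it go; nothing accumulates in Perl
            print NROUT ">G$row->[0]T$row->[1]\n$row->[2]\n";
        }
        $sth->finish;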

    As to the speed, if your process is forced to page to disk because you're using huge amounts of RAM, it's likely that simply reducing the memory footprint will result in a significant improvement in speed. If you have 1.5GB of data in the DB, you can expect that perl could use three times that much (as a rough figure) to keep that in perl variables/data structures in memory. At the very least, it is worth it for you to try it for yourself and find out.
Re: fetchall_arrayref DBI question ?
by perrin (Chancellor) on Mar 17, 2009 at 03:05 UTC
    It may not be as simple as calling fetchrow instead of fetchall. Each database and driver behave differently with respect to how they ferry data between the database and your program. You say you're using SQLite, and I don't know what the recipe is for that one to tell it to move one row at a time. Maybe it's the default, although most of them buffer many rows at once for speed. In any case, I think you should check the SQLite docs for information on how to control buffering of query results. If there's a SQLite mailing list, you might ask there, or on the dbi-users list.
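
    One knob that's at least worth a try is DBI's RowCacheSize attribute. DBI documents it as only a hint, and drivers are free to ignore it (DBD::SQLite may well do so), so treat this as a sketch rather than a fix:

        # Hint to the driver: keep the local row cache small.
        # Drivers that don't support this silently ignore it.
        $dbh->{RowCacheSize} = 100;

        my $sth = $dbh->prepare("select * from Table1");
        $sth->execute;
        while (my $row = $sth->fetchrow_arrayref) {
            # handle one row at a time
        }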
Re: fetchall_arrayref DBI question ?
by Herkum (Parson) on Mar 16, 2009 at 20:12 UTC

    If you put a condition on the SQL to limit the rows you return, you won't dump such a large table, and you'll be able to troubleshoot your problem. A sketch of this idea is below.
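
    For example, chunked retrieval using SQLite's LIMIT/OFFSET (chunk size and table name are placeholders):

        my $chunk  = 10_000;
        my $offset = 0;
        my $sth = $dbh->prepare("select * from Table1 limit ? offset ?");

        while (1) {
            $sth->execute($chunk, $offset);
            my $rows = $sth->fetchall_arrayref({});
            last unless $rows && @$rows;    # no more rows
            # process up to $chunk rows here
            $offset += @$rows;
        }

    Note that OFFSET makes SQLite step over all the earlier rows on every query, so on very large tables a "where rowid > ?" condition on the last-seen rowid is usually faster.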

Re: fetchall_arrayref DBI question ?
by bradcathey (Prior) on Mar 17, 2009 at 00:09 UTC

    Required reading

    You could basically have:

        sub _fetchall {
            my ($self,%arg) = @_;
            my $stm = "select * from $arg{table}";
            return $dbh->selectall_arrayref($stm, {Slice => {}});
        }

    No need to prepare or execute. It's all done for you.

    —Brad
    "The important work of moving the world forward does not wait to be done by perfect men." George Eliot
Re: fetchall_arrayref DBI question ?
by sundialsvc4 (Abbot) on Mar 18, 2009 at 12:47 UTC

    Here is one small wisdom that you must never forget:

    • Memory is virtual.
    • Virtual memory is a disk file, and must therefore be treated accordingly.

    Virtual memory is, of course, a very cleverly buffered “disk file,” such that in small amounts and with good locality-of-reference patterns you can ignore the difference. The buffer space available is also plentiful. But if your app becomes a “thousand-pound elephant” within the virtual-memory space... “you're scroo-o-o-o'd.”

    When designing any database application, you should assume: (a) that the tables are arbitrarily large, and (b) that the database engine knows what it is doing with regard to properly handling them. Use carefully designed queries that enable the database engine to be as specific as possible in what it returns to you. Don't try to buffer those rows further yourself “in the name of efficiency,” because it won't be efficient.

    If you find that your application is running too slow, take a very close look at the queries that it is issuing ... in particular, how many. Lots of apps issue millions of queries during their lifetime.

    If there is one, single “parting shot” I would make, it is this ... remember sorting. When you know that a result-set is sorted in a certain way, a change in value(s) between one record and the next (only two records need be compared...) is significant. Sorting is an uncommonly-fast operation, and unexpectedly efficient. And you get it all with just two words:   ORDER BY.
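
    As an illustration of that control-break idea (table and column names are hypothetical):

        my $sth = $dbh->prepare("select grp, val from Table1 order by grp");
        $sth->execute;

        my $prev;
        while (my ($grp, $val) = $sth->fetchrow_array) {
            # Because the rows arrive sorted, a change in grp between
            # this row and the previous one marks a group boundary.
            if (defined $prev && $grp ne $prev) {
                # finish off the previous group here
            }
            # ... accumulate $val for the current group ...
            $prev = $grp;
        }
        # finish off the final group after the loop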