Magnanimous monks,

I'm faced with two problems that I'm hoping your insight can help me resolve.

The first problem, for which I have some workarounds, is that selecting all records from a database table can result in high, and possibly unacceptable, memory usage.

Consider this code:

my $sth = $dbh->prepare("SELECT * FROM foo");
$sth->execute;

Upon execute(), DBI requests the result set from the database (in my case, MySQL). By default, the MySQL client library retrieves and buffers the entire result set in memory, to be accessed later with calls to fetchrow_hashref() and the like. In my experience, if the table has a very large number of records, this simple bit of code can result in very high memory usage.
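If you happen to be using DBD::mysql, one low-level workaround is its mysql_use_result attribute, which switches the client from buffering the whole result set (mysql_store_result, the default) to streaming rows from the server as you fetch them. A minimal sketch, assuming an already-connected $dbh:

<code>
# mysql_use_result streams rows instead of buffering the whole
# result set client-side, at the cost of tying up the connection
# (and server-side resources) until every row has been fetched.
my $sth = $dbh->prepare("SELECT * FROM foo", { mysql_use_result => 1 });
$sth->execute;

while (my $row = $sth->fetchrow_hashref) {
    # process $row; memory stays roughly flat regardless of table size
}
$sth->finish;    # important: release the streaming result set
</code>

The tradeoff is that you can't run other statements on that connection until the result set is exhausted, so it's not a drop-in fix for every situation.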

One solution is to impose a maximum limit on the number of records selected at a time, essentially paginating the results. I created a module a while back that does just this. You feed it a query and a maximum record limit (e.g. 25,000) and it returns something akin to a statement handle that can be used to fetch records. Internally, it chops up the query, appends a LIMIT/OFFSET clause, and only selects from the database as needed.
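As a rough sketch of the chunking idea (the name fetch_in_chunks and its callback interface are illustrative, not my actual module):

<code>
sub fetch_in_chunks {
    my ($dbh, $query, $limit, $callback) = @_;
    my $offset = 0;
    while (1) {
        # NOTE: the query needs a stable ORDER BY for this to be reliable
        my $sth = $dbh->prepare("$query LIMIT $limit OFFSET $offset");
        $sth->execute;
        my $count = 0;
        while (my $row = $sth->fetchrow_hashref) {
            $callback->($row);
            $count++;
        }
        last if $count < $limit;    # a short chunk means we've hit the end
        $offset += $limit;
    }
}

# Usage:
# fetch_in_chunks($dbh, "SELECT * FROM foo ORDER BY id", 25_000,
#                 sub { my $row = shift; ... });
</code>

Only one chunk of rows is ever held in memory at a time, though each successive OFFSET does get more expensive for the server to seek to.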

Another solution is to select only the record ids, and then fetch each record individually by id. Granted, this would be much slower; I'd be trading memory usage for CPU time, but that tradeoff seems likely in any approach.
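In code, the id-based approach might look something like this sketch (assuming foo has an integer primary key named id):

<code>
# Fetch only the ids first, then pull records one at a time.
# Memory holds the id list plus one row, instead of every row.
my $ids = $dbh->selectcol_arrayref("SELECT id FROM foo");

my $sth = $dbh->prepare("SELECT * FROM foo WHERE id = ?");
for my $id (@$ids) {
    $sth->execute($id);
    my $row = $sth->fetchrow_hashref;
    # process $row
}
</code>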

The real problem for me, and the one I'm more curious about, is finding a way to abstract this concern away, so that I, and the programmers I work with, won't need to worry about bringing a server to its knees by executing a query that uses an unacceptable amount of memory. Ideally, the solution would sit at a reasonably low level, so that programmers could take advantage of it whether they're performing a manual query or running a search through our homebrew ORM.
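One way to push this below the level individual programmers see would be a statement-handle lookalike that refills itself in chunks behind the scenes. A sketch (the package name and internals are hypothetical):

<code>
package ChunkedSth;    # hypothetical name

sub new {
    my ($class, $dbh, $query, $chunk) = @_;
    return bless {
        dbh    => $dbh,
        query  => $query,          # should carry a stable ORDER BY
        chunk  => $chunk || 25_000,
        offset => 0,
        rows   => [],
        done   => 0,
    }, $class;
}

# Looks like $sth->fetchrow_hashref: refills from the database one
# LIMIT/OFFSET chunk at a time, returns undef when rows run out.
sub fetchrow_hashref {
    my $self = shift;
    if (!@{ $self->{rows} } && !$self->{done}) {
        my $sql = "$self->{query} LIMIT $self->{chunk} OFFSET $self->{offset}";
        $self->{rows} = $self->{dbh}->selectall_arrayref($sql, { Slice => {} });
        $self->{done} = 1 if @{ $self->{rows} } < $self->{chunk};
        $self->{offset} += $self->{chunk};
    }
    return shift @{ $self->{rows} };
}

1;
</code>

Callers would loop over it exactly as they would a real statement handle, so the chunking stays invisible to them.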

Speaking of ORMs, are there any out there that handle this implicitly? We would be willing to switch to, say, Class::DBI, DBIx::Class, Rose::DB::Object, or SomethingElse, if it had something in place to help control memory usage.

Part of me thinks this is a non-issue and I should just write code that suits the application, but another part of me feels that if I'm writing a script that needs to do something with all records in a given table, I shouldn't need to alter my strategy based on the number of records in the table. Am I way off-base with this? Am I being too idealistic? Is this problem in my head? :)


In reply to Of large database tables and high memory usage. by mseabrook
