Magnanimous monks,
I'm faced with two problems that I'm hoping your insight can help me resolve.
The first problem, for which I have some workarounds, is that selecting all records from a database table can result in high, possibly unacceptable, memory usage.
Consider this code:
my $sth = $dbh->prepare("SELECT * FROM foo");
$sth->execute;
Upon execute(), the driver (DBD::mysql, in my case) retrieves the entire result set and holds it in client memory, to be accessed later with calls to the fetchrow_*() methods. In my experience, if the table has a very large number of records, this simple bit of code can result in very high memory usage.
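For what it's worth, with DBD::mysql that buffering is the client library's default (mysql_store_result), and the driver exposes a mysql_use_result attribute that streams rows from the server instead. A minimal sketch, with placeholder connection details:

    use strict;
    use warnings;
    use DBI;

    # Placeholder DSN and credentials, for illustration.
    my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'secret',
        { RaiseError => 1 });

    # mysql_use_result streams rows from the server one at a time
    # instead of buffering the entire result set client-side
    # (mysql_store_result, the default).
    my $sth = $dbh->prepare('SELECT * FROM foo', { mysql_use_result => 1 });
    $sth->execute;

    while (my $row = $sth->fetchrow_hashref) {
        # process one record; client memory stays roughly flat
    }

The catch is that the connection stays busy, and the server holds resources for the table, until every row has been fetched.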
One solution is to impose a maximum limit on the number of records selected at a time, essentially paginating the results. I created a module a while back that does just this. You feed it a query and a maximum record limit (e.g., 25,000) and it returns something akin to a statement handle that can be used to fetch records. Internally, it chops up the query, appends a LIMIT/OFFSET clause, and only selects from the database as needed.
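For the curious, the guts of that approach look something like this sketch (ChunkedQuery is a hypothetical name, not my actual module, and it assumes the query carries no LIMIT clause of its own):

    package ChunkedQuery;
    use strict;
    use warnings;

    sub new {
        my ($class, $dbh, $sql, $chunk) = @_;
        return bless {
            dbh    => $dbh,
            sql    => $sql,
            chunk  => $chunk || 25_000,
            offset => 0,
            sth    => undef,
            seen   => 0,
        }, $class;
    }

    sub fetchrow_hashref {
        my $self = shift;

        # Lazily (re)issue the query with the current LIMIT/OFFSET window.
        unless ($self->{sth}) {
            my $sql = "$self->{sql} LIMIT $self->{chunk} OFFSET $self->{offset}";
            $self->{sth} = $self->{dbh}->prepare($sql);
            $self->{sth}->execute;
            $self->{seen} = 0;
        }

        if (my $row = $self->{sth}->fetchrow_hashref) {
            $self->{seen}++;
            return $row;
        }

        # Window exhausted. A short window means we've seen everything;
        # a full window means there may be more, so slide forward.
        my $full = $self->{seen} == $self->{chunk};
        $self->{sth}     = undef;
        $self->{offset} += $self->{chunk};
        return $full ? $self->fetchrow_hashref : undef;
    }

    1;

Used like an ordinary statement handle:

    my $q = ChunkedQuery->new($dbh, 'SELECT * FROM foo ORDER BY id', 25_000);
    while (my $row = $q->fetchrow_hashref) {
        # process $row
    }

One caveat with LIMIT/OFFSET pagination: without an ORDER BY, MySQL makes no guarantee that successive windows won't overlap or skip rows, and large offsets get progressively slower since the server still walks past the skipped rows.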
Another solution is to select only the record ids, and then select the records individually by id. Granted, this would be much slower; I'd be trading memory usage for CPU time and round trips, but that trade-off seems likely in any scenario.
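A rough sketch of that, assuming an integer primary key named id:

    # The id list is small compared to the full records.
    my $ids = $dbh->selectcol_arrayref('SELECT id FROM foo');

    my $sth = $dbh->prepare('SELECT * FROM foo WHERE id = ?');
    for my $id (@$ids) {
        $sth->execute($id);
        my $row = $sth->fetchrow_hashref;
        # process $row
    }

A middle ground would be selecting ranges of ids (WHERE id > ? ORDER BY id LIMIT n), which avoids both the per-row round trip and the OFFSET penalty.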
The real problem for me, and the one I'm more curious about, is finding a way to abstract this issue away, so that I, and the programmers I work with, won't need to worry about bringing a server to its knees with a query that produces an unacceptably large result set. Ideally, the solution would sit at a reasonably low level, so that programmers could take advantage of it whether they're issuing a manual query or performing a search with our homebrew ORM.
Speaking of ORMs, are there any out there that handle this implicitly? We would be willing to switch to, say, Class::DBI, DBIx::Class, Rose::DB::Object, or SomethingElse, if it had something in place to help control memory usage.
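I haven't verified this across all of them, but DBIx::Class at least encourages an iterator style, where rows are inflated one at a time as you call next(). Something like (assuming a $schema set up elsewhere):

    my $rs = $schema->resultset('Foo');
    while (my $foo = $rs->next) {
        # one row object inflated at a time
    }

That said, as I understand it the raw result set may still be buffered by the driver underneath (mysql_store_result again), so the iterator alone doesn't necessarily cap memory; the driver-level behavior still matters.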
Part of me thinks this is a non-issue and I should just write code that suits the application, but another part of me feels that if I'm writing a script that needs to do something with every record in a given table, I shouldn't have to change my strategy based on the number of records in that table. Am I way off base here? Am I being too idealistic? Is this problem all in my head? :)