in reply to memory problems parsing large csv file

It's a convoluted algorithm, that's for sure. And what else is sure is that someone has probably solved it already. There are, as mentioned, modules that can sort using the disk (Sort::External), though its documentation says it keeps everything in scalars, and you really want your rows parsed. No problem, you can reparse them. Er, that seems like a waste ;-)
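
For what it's worth, a rough sketch of that Sort::External route might look like the code below. It leans on a pile of assumptions: a file called "darkint.csv", sort columns COL9, COL32, COL33 (indexes 8, 31, 32), rows without embedded newlines, and fields without NUL bytes (NUL is used to glue the packed sort key to the raw line). The row does get reparsed after fetching, which is exactly the waste lamented above.

    use strict;
    use warnings;
    use Text::CSV_XS;
    use Sort::External;

    my $csv    = Text::CSV_XS->new ({ binary => 1 });
    my $sortex = Sort::External->new;

    open my $fh, "<", "darkint.csv" or die "darkint.csv: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $csv->parse ($line) or next;   # skip lines that fail to parse
        my @f = $csv->fields;
        # Pack the sort key (COL9, COL32, COL33 => indexes 8, 31, 32)
        # in front of the raw line, so the default lexical sort orders
        # the records; numeric columns would need zero-padding (sprintf)
        # to sort correctly as strings.
        $sortex->feed (join ("\0", @f[8, 31, 32], $line));
    }
    close $fh;

    $sortex->finish;
    while (defined (my $rec = $sortex->fetch)) {
        # Strip the packed key off again and reparse the original row.
        my $line = (split /\0/, $rec, 4)[3];
        $csv->parse ($line) or next;
        my @row = $csv->fields;
        # ... process the sorted row here ...
    }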

Personally, my preferred manner of accessing CSV files is via DBD::CSV. In this case, I'd just use "SELECT * FROM darkint ORDER BY COL9, COL32, COL33" (with a little extra setup required: making "darkint" point to "darkint.csv", setting up the column names if they aren't in the file's first line, and possibly setting the EOL character). I think those guys have solved this problem. If not, I'd use it as an excuse to populate a real DBMS and, with little change, be able to keep using DBI to get at my data ;-)
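
A minimal sketch of that setup, assuming a "darkint.csv" in the current directory, 40 unnamed columns, and a plain "\n" EOL (all of which you'd adjust to the real data):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_dir      => ".",
        RaiseError => 1,
        });

    # Point the table name "darkint" at "darkint.csv", and supply
    # column names if they are not on the file's first line (assumed
    # here; depending on your DBD::CSV version you may also need to
    # tell it not to treat that first line as a header).
    $dbh->{csv_tables}{darkint} = {
        file      => "darkint.csv",
        eol       => "\n",
        col_names => [ map { "COL$_" } 1 .. 40 ],   # 40 columns assumed
        };

    my $sth = $dbh->prepare ("SELECT * FROM darkint ORDER BY COL9, COL32, COL33");
    $sth->execute;
    while (my $row = $sth->fetchrow_arrayref) {
        # ... work with the sorted row here ...
    }
    $dbh->disconnect;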

Re^2: memory problems parsing large csv file
by Tux (Canon) on Aug 26, 2009 at 07:31 UTC

    DBD::CSV will read the complete CSV into memory, so it won't solve memory hog problems.

    As a side note, in the upcoming version of DBD::CSV you don't need that (sym)link anymore:

    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, { f_ext => ".csv/r", f_dir => ".", f_schema => undef });

    will only open files with a .csv extension and strip that extension from the table name.


    Enjoy, Have FUN! H.Merijn