I'm trying to parse some text files that are generated each month and keep a summary in a persistent multi-level hash. These text files contain around 5 GiB of information, and the resulting tied-hash .db file is expected to be around 12 GiB once all months are loaded. The trouble is that as the MLDBM .db file approaches the 2 GiB mark, I get:
lseek error at /usr/lib/perl5/site_perl/5.8.5/MLDBM.pm line 161, <IN_FILE> line 5038240.
At this point the MLDBM .db file looks like:
-rw-r--r-- 1 gomer user 2147479776 Nov 28 17:28 summaryDatabase.db
In this case, MLDBM was used like:
use MLDBM qw(GDBM_File Data::Dumper);
But I have also tried using Storable as the serializer, as well as the default SDBM_File TIEHASH. Every configuration so far has failed as soon as the resulting .db file reaches the 2 GiB mark.
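For anyone trying to reproduce this, here is a minimal sketch of the kind of tie involved. It is not my actual script: the file name summarySketch.db, the add_count helper, and the month/key record layout are made up for illustration. It only assumes that MLDBM and GDBM_File are installed.

```perl
use strict;
use warnings;
use MLDBM qw(GDBM_File Data::Dumper);   # same backend and serializer as above
use GDBM_File;                          # exports GDBM_WRCREAT

# Hypothetical file name, for illustration only.
my $db_file = 'summarySketch.db';

tie my %summary, 'MLDBM', $db_file, GDBM_WRCREAT(), 0644
    or die "Cannot tie $db_file: $!";

# MLDBM serializes each top-level value as a whole, so nested data has to be
# fetched, modified, and stored back. Assigning to $summary{k}{x} directly
# only changes a temporary copy.
sub add_count {
    my ($month, $key, $n) = @_;
    my $rec = $summary{$month} || {};   # fetch (or start a new record)
    $rec->{$key} += $n;                 # modify the in-memory copy
    $summary{$month} = $rec;            # store the whole record back
}

add_count('2005-11', 'requests', 3);
add_count('2005-11', 'requests', 4);
print "requests for 2005-11: ", $summary{'2005-11'}{requests}, "\n";

untie %summary;
```

The fetch-modify-store step is the part that matters with MLDBM. The rest is an ordinary tie, and the extra arguments are passed through to the underlying GDBM_File.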
Is there some combination of TIEHASH and/or serializer that lets one keep persistent multi-level hashes larger than 2 GiB? (Please don't tell me I should just use DBI. In this situation, that means petitioning for an Oracle installation.)
The underlying Fedora Core 3 & Perl versions:
Linux rskass_arc_2 2.6.12-1.1381_FC3smp #1 SMP Fri Oct 21 04:03:26 EDT 2005 i686 i686 i386 GNU/Linux
This is perl, v5.8.5 built for i386-linux-thread-multi
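The file size above sits just under 2^31 bytes (2147483648), which is where a signed 32-bit file offset runs out. So it may be worth checking whether this perl build has large-file support. Below is a small sketch that reads the relevant entries from the core Config module. The largefile_report helper is a name made up for illustration. It only describes how perl itself was built. The DBM library underneath (gdbm, for example) is compiled separately and, as far as I can tell, may have its own limit, so a "yes" here would not settle the question.

```perl
use strict;
use warnings;
use Config;   # core module describing how this perl was built

# Hypothetical helper: collect the Config entries related to file offsets.
sub largefile_report {
    return (
        # 'define' when perl was built with large-file support
        uselargefiles => defined $Config{uselargefiles}
                             ? $Config{uselargefiles} : 'undef',
        # size in bytes of the file-offset type; 8 means 64-bit offsets
        lseeksize     => $Config{lseeksize} || 0,
        # C type used for file offsets, e.g. off_t
        lseektype     => $Config{lseektype} || 'unknown',
    );
}

my %report = largefile_report();
printf "%-14s %s\n", $_, $report{$_} for sort keys %report;
```

The same values can be read from the shell with perl -V:uselargefiles and perl -V:lseeksize.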