Dear Colleagues, I am parsing a large undirected graph stored in a plain text file larger than 4GB. Each line of the file contains two nodes and their associated information. For example:
node1  node2  weight  status
23     34     -897    1
24     46     -10     0
It is too large to fit in RAM, so I used the following code to tie it to a hash:
    use BerkeleyDB;
    use MLDBM qw(BerkeleyDB::Hash Storable);

    my %hash;
    my @seqRelation;
    $hash{'key'} = \@seqRelation;
    my $dbFile = '/tmp/relation.db';
    tie %hash, 'MLDBM', -Filename => $dbFile, -Flags => DB_CREATE or die $!;

    # read into the file content
    my $inFile = shift;
    open(IN,"zcat $inFile | ") or die "Can not open $inFile:$!";
    while(<IN>)
    {
        chomp;
        my @fields = split "\t";
        my ($seq1,$seq2,$w,$s) = @fields;
        push @{$seqRelation[$seq1]}, join(',', $seq2, $w, $s);
        # also store it in the other direction
        push @{$seqRelation[$seq2]}, join(',', $seq1, $w, $s);
    }
    ......
Here I used @seqRelation to store the data and then assigned it to the hash under the single key 'key', because BerkeleyDB::Hash can only be tied to a hash. The above code runs without warnings.

However, I am confused by the size of the tied file /tmp/relation.db. When I checked it after the program finished, it was only 48K, but the original file is >4GB. That seems unbelievable (but maybe I am wrong, because I am not familiar with the mechanism of BerkeleyDB). Is this correct or normal? I expected a much larger file size for /tmp/relation.db, and I have no idea why it is so small. I am worried that some data was lost when tying.

By the way, I also need to change the status values in my program. Any help or ideas are appreciated.

Thank you in advance! Best regards!
Zhenguo
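One possible explanation for the small file can be checked without BerkeleyDB at all. The sketch below is only an illustration: CountingHash is a hypothetical stand-in for MLDBM, built on the core Tie::StdHash class, that counts how many writes actually pass through the tie. It assumes the same order of operations as the code above (assign the array reference, then tie, then push into the array).

```perl
use strict;
use warnings;

# Hypothetical stand-in for MLDBM: a tied hash that counts writes.
package CountingHash;
require Tie::Hash;                  # core module, provides Tie::StdHash
our @ISA = ('Tie::StdHash');
our $stores = 0;

sub STORE {
    my ($self, $key, $value) = @_;
    $stores++;                      # every write through the tie lands here
    $self->{$key} = $value;
}

package main;

my (%hash, @seqRelation);
$hash{'key'} = \@seqRelation;       # goes into the plain, not-yet-tied hash
tie %hash, 'CountingHash';          # tying hides whatever %hash held before

# Same pattern as the loop body: these touch only the in-memory array.
push @{ $seqRelation[23] }, join(',', 34, -897, 1);
push @{ $seqRelation[34] }, join(',', 23, -897, 1);

my $stores_as_posted = $CountingHash::stores;        # writes seen by the tie
my $key_visible      = exists $hash{'key'} ? 1 : 0;  # is 'key' in the tied hash?

# Alternative: assign through the tied hash, one key per node.
$hash{23} = [ join(',', 34, -897, 1) ];
$hash{34} = [ join(',', 23, -897, 1) ];
my $stores_explicit = $CountingHash::stores;

print "writes through the tie, as posted: $stores_as_posted\n";
print "'key' visible after tie: $key_visible\n";
print "writes after explicit assignment: $stores_explicit\n";
```

In this sketch no write reaches the tied hash until a value is assigned to it after the tie, which would be consistent with a database file that stays nearly empty. With MLDBM itself, nested values are serialized on assignment, so changing a stored list (for example, to update a status) generally means fetching it into a temporary variable, modifying it, and assigning it back.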

In reply to Large files tied by BerkeleyDB with MLDBM by fortunezhang
