I have two main questions: 1) How fast or slow is a lookup in a hash that has more than 1 million entries? 2) Is there a better way than a hash to store the necessary information (more on that below)? Here is the code of interest (the DBI module is used with MySQL):
my $sth = $dbh->prepare("SELECT qseqid FROM AllJoinRecip");
$sth->execute() or die("execute failed " . $sth->errstr());

my $i = 1;
while(my $seq = $sth->fetchrow()) {
    next if(exists $processed{$seq});
    my $out = `./friendoffriend.pl "$seq"`;
    my @results = split(',', $out);
    $processed{$_} = 1 foreach (@results);
    print "Cluster $i: $out\n";
    $i++;
}
The output of friendoffriend.pl is a comma-separated list of strings. Basically, if a value already appeared in the output of a previous run of friendoffriend.pl, I do not want to run friendoffriend.pl on it again. The MySQL table AllJoinRecip contains more than 1 million records, so I need an efficient way to store and search the values I have already found.

As you can see, I have been using a hash and checking whether the key exists in it. Running this program took about 3 hours 30 minutes to process all of the data. friendoffriend.pl itself runs very quickly, so I'm wondering whether using a hash to store the information is slowing the program down.

So, will this large hash significantly impact execution time? If so, what are some alternatives that could speed things up? I have MySQL at my disposal, so feel free to offer suggestions that use it. I can also modify the AllJoinRecip table as necessary, since it can easily be recreated (although that takes about 5 minutes).

Any other optimization suggestions are greatly appreciated. Thanks!

-gunr
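One way to check whether the hash is the bottleneck is to time it in isolation, separately from the database and from friendoffriend.pl. The following is only a rough sketch: the key format ("seq1", "seq2", ...) is made up for illustration, since the real qseqid values are not shown above, and the timings will vary by machine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical key format; the real qseqid values are not shown in the post.
my $n = 1_000_000;
my %processed;
$processed{"seq$_"} = 1 for 1 .. $n;

# Time one lookup per stored key, plus an equal number of misses.
my $start = time;
my $hits  = 0;
for my $i (1 .. $n) {
    $hits++ if exists $processed{"seq$i"};      # key is present
    $hits++ if exists $processed{"missing$i"};  # key is absent
}
my $elapsed = time - $start;

printf "%d lookups, %d hits, %.3f s\n", 2 * $n, $hits, $elapsed;
```

If the reported time is small compared with the 3.5 hours of the full run, the hash is probably not the problem, and the time is more likely going into the rest of the loop, for example starting a new process for each backtick call.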

In reply to Efficiency of a Hash with 1 Million Entries by gunr
