michthedrizzard has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a small CGI program in which the script reads a 'flatline' (flat-file) text DB into a hash,

e.g.
while( <FILE> )
chomp;
/^(\w+)\|/;
$hash{$1)='blah';
}

My program runs fine when there are only a few entries in the hash, but when the file/hash is very large (200ish entries?) the browser just sits on the page waiting for the script. Is there a workaround for this, or does anyone know why this happens?


Appreciate your help,
Mich

Replies are listed 'Best First'.
Re: Immense hashes
by BrowserUk (Patriarch) on May 27, 2004 at 23:51 UTC

    Your sample code isn't very convincing. Why would you be setting all the elements of your hash to a value of 'blah'?

    Also, 200ish elements in a hash is not even close to immense. In fact it's tiny. Creating hashes of 100s of 1000s of keys is routine if you have enough memory. I just ran the following code which creates a hash with 100_000 keys read from a file and extracted using a regex and it took around 2 seconds to run.

    open I, '<', 'data/junk';
    m[^(\d+)$] and $h{$1} = undef while <I>;
    print scalar keys %h;

    100000

    Essentially, if you are experiencing long delays in creating your hash of 200 keys, then there is some cause that is not identified in your code. Posting the actual code that is giving the problem (or a salient subset of it) is likely to get a much better answer.
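
    For anyone who wants to check the timing claim without a data file on disk, here is a minimal sketch. It assumes an in-memory filehandle standing in for 'data/junk' and uses Time::HiRes for the measurement; the variable names are illustrative and the elapsed time will vary by machine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build 100_000 numeric lines in memory instead of reading 'data/junk'
# (an assumption, so the sketch runs without any file on disk).
my $data = join '', map { "$_\n" } 1 .. 100_000;

open my $in, '<', \$data or die "cannot open in-memory file: $!";

my %h;
my $start = time;
while ( my $line = <$in> ) {
    # Only store a key when the line really is all digits.
    $h{$1} = undef if $line =~ m/^(\d+)$/;
}
my $elapsed = time - $start;
close $in;

printf "%d keys loaded in %.3f seconds\n", scalar( keys %h ), $elapsed;
```

    If a loop like this finishes quickly on your machine, the hang in the CGI script is unlikely to come from filling the hash itself.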


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
Re: Immense hashes
by TomDLux (Vicar) on May 28, 2004 at 01:40 UTC

    This is impossible code, because you have no opening curly brace; it will not compile, so it won't take any time at all.

    How many lines is your file? How many bytes is it?

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Immense hashes
by tkil (Monk) on May 28, 2004 at 07:34 UTC

    I agree with most of the other posters so far: this is very incomplete (and thus very confusing) code.

    I think you want to read a flat file in, use the first word on each line as a key, then assign something as the value in a hash. If so, maybe you're looking for something like this?

    my %hash;
    while ( <FILE> ) {
        my ( $key ) = ( m/^(\w+)/ ) or next;
        $hash{$key} = 'whatever';
    }

    If you are doing something with the rest of the line, maybe you want to do a limited split on whitespace instead ... ah, no, you want a literal vertical bar:

    my %hash;
    while ( <FILE> ) {
        my ( $key, $value ) = split /\|/, $_, 2;
        next unless $key;
        $hash{$key} = $value;
    }
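
    To make the effect of the limit argument concrete, here is a small sketch comparing split with and without a limit of 2. The sample record and its field layout are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical record; the field layout is an assumption for illustration.
my $line = "alice|admin|alice\@example.com";

# Limit of 2: split at the first bar only, keep the rest of the line intact.
my ( $key, $value ) = split /\|/, $line, 2;

# No limit: every bar separates a field.
my @fields = split /\|/, $line;

print "key:    $key\n";                    # key:    alice
print "value:  $value\n";                  # value:  admin|alice@example.com
print "fields: ", scalar(@fields), "\n";   # fields: 3
```

    With the limit, any further bars stay inside the value, which is usually what you want when the rest of the line is parsed later.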

    If neither of the above is what you're trying to do, consider restating your original question. (For the record, I consider hashes with hundreds of thousands of elements to be "immense" on current machines; a few hundred should be trivial.)

    Finally, never ever ever use $1 and friends without first checking for the success of a match. In your code, you do:

    /^(\w+)\|/;
    $hash{$1)='blah';

    It is entirely possible that the regex might not match, in which case $1 will be undef. Make the assignment conditional:

    if ( /^(\w+)\|/ ) { $hash{$1} = 'blah' }
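
    A slightly fuller sketch of the guarded form, run against made-up in-memory data with one malformed line, shows that lines failing the match never create a key. The sample data and the @skipped bookkeeping are assumptions added for illustration, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for the flat file;
# the second line has no "word|" prefix and must not create a key.
my $data = "alice|1\n!!! corrupt line\nbob|2\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my ( %hash, @skipped );
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^(\w+)\|/ ) {
        $hash{$1} = 'blah';    # $1 is only read after a successful match
    }
    else {
        push @skipped, $.;     # remember the line number for reporting
    }
}
close $fh;

print "keys: ", join( ',', sort keys %hash ), "\n";   # keys: alice,bob
print "skipped lines: @skipped\n";                    # skipped lines: 2
```

    Recording the skipped line numbers is optional, but it makes malformed records visible instead of silently ignoring them.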
      It is entirely possible that the regex might not match, in which case $1 will be undef.

      Or worse, the $1 variable will have the value of the previous time it matched and it will overwrite the existing entry in the hash. In this case it will overwrite it with the same value it already had, as all values of the hash are set to "blah" (which is utterly strange).

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

        Or worse, the $1 variable will have the value of the previous time it matched and it will overwrite the existing entry in the hash.

        Hm. For some reason, I was under the impression that $1 and friends would be undef after any unsuccessful match. Oh, wow, it doesn't; gross:

        $ perl -lwe '$_="foo"; /(\w+)/ && print "match: $1"; /(\W+)/ || print "no match: $1";'
        match: foo
        no match: foo

        In this case it will overwrite it with the same value it already had, as all values of the hash are set to "blah" (which is utterly strange).

        Well, I just assumed that the original poster was using that "blah" as a placeholder for something else, for the purposes of posting. The presence of quite a few other syntactic errors makes it clear that this was not the code that actually ran...