
Thanks for your rapid reply. My test scripts are as follows:

script 1:

while(my $l=<IN>){ }

script 2:

while(my $l=<IN>){ my $id=substr($l,0, 33); $hash{$id}=1; }

Script 1 takes around 15 seconds, whereas script 2 takes around 20 seconds.

Re^3: How do I measure my bottle ?
by RazorbladeBidet (Friar) on Mar 25, 2005 at 13:50 UTC
    Then your hash insert is only taking 5 seconds (all other things being equal).

    There is also the memory consideration (as stated below).

    Is this 20M records totalling 1GB or 1 TeraByte? (You mention 1,000 GB in your original post).

    Is there a reason you are using a hash? (in your example it looks like you could use an array, but I understand it is merely a "test")

    If you have many files (and it sounds like you do), you could slurp in each file whole, one file at a time, and then do the inserts. This will increase your memory usage but decrease CPU time. See File::Slurp.
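
For what it's worth, the slurping idea can be sketched with core Perl alone, without File::Slurp. This is only a rough illustration: the helper name `index_ids_slurped`, the temporary demo file, and its contents are all made up here, and it assumes each file fits comfortably in memory, which may not hold for the sizes mentioned in the original post.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read one file in a single gulp, then record the
# first $len characters of every line as a key in the hash %$seen.
sub index_ids_slurped {
    my ($path, $len, $seen) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $data = do { local $/; <$in> };    # undef $/ reads to end of file
    close($in);
    for my $line (split /^/, $data) {     # split into lines, newlines kept
        $seen->{ substr($line, 0, $len) } = 1;
    }
    return scalar keys %$seen;
}

# Demo on a small temporary file: three lines, two distinct 33-char ids.
my ($fh, $path) = tempfile(UNLINK => 1);
my $id_a = 'A' x 33;
my $id_b = 'B' x 33;
print {$fh} "$id_a first\n", "$id_b second\n", "$id_a third\n";
close($fh);

my %hash;
my $count = index_ids_slurped($path, 33, \%hash);
print "distinct ids: $count\n";
```

Whether this actually beats the line-by-line loop depends on the data and the machine, so it is worth timing both before committing to either.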
    --------------
    "But what of all those sweet words you spoke in private?"
    "Oh that's just what we call pillow talk, baby, that's all."
Re^3: How do I measure my bottle ?
by tlm (Prior) on Mar 25, 2005 at 14:57 UTC

    If you want a handle on the cost of hash inserts, you may as well make the comparison more precise by making script 1 do something like:

    while(my $l=<IN>){ my $id=substr($l,0,33); $hash=1; }
    (Assuming, of course, that the compiler doesn't optimize any of this away.)
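
One way to run that comparison, sketched with the core Benchmark module. The synthetic data (50,000 lines with a zero-padded 33-character id), the iteration count, and the labels `substr_only` and `substr_and_hash` are assumptions for illustration. Reading from an in-memory filehandle takes disk I/O out of the picture, so the numbers will not match the 15 and 20 second timings above; the point is only to compare the two loop bodies.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Synthetic input (an assumption): each line starts with a 33-character id.
my $data = join '', map { sprintf("%033d rest of the record\n", $_) } 1 .. 50_000;

my %hash;      # filled by the hash-insert variant
my $scalar;    # plain scalar target for the baseline variant

my $results = timethese(5, {
    # Baseline: substr plus a scalar assignment, no hash insert.
    substr_only => sub {
        open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";
        while (my $l = <$in>) { my $id = substr($l, 0, 33); $scalar = 1; }
        close($in);
    },
    # Same loop, but the assignment goes into the hash.
    substr_and_hash => sub {
        %hash = ();
        open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";
        while (my $l = <$in>) { my $id = substr($l, 0, 33); $hash{$id} = 1; }
        close($in);
    },
});
```

The difference between the two reported times is then a rough estimate of what the hash inserts cost on this data, with the memory effects of a much larger hash left out.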

    the lowliest monk