in reply to Unique uniq implementation

Would've been nice of me to elaborate, sorry :-)

OK - right now I have about 10 MB, maybe more. I don't
know what I'll have down the road. Maybe 100 MB next time
around?

And the data: each line is a single number (a count) plus
a text descriptor. The descriptor may contain whitespace,
but I can count on exactly one space between the number
and the descriptor.

Now, what I need to do is take all the lines that share a
descriptor, add their counts, and print the totals.

i.e., something like

my ($freq, $word) = split ' ', $_, 2; # limit to 2 fields so the descriptor keeps its internal whitespace
$freq_hash{$word} += $freq;

does the job, but in a rather unscalable way, since the
whole hash has to live in memory.
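
For what it's worth, fleshed out the in-memory version looks
roughly like this (just a sketch; it assumes input arrives on
STDIN or as files on the command line):

use strict;
use warnings;

my %freq_hash;

while (<>) {
    # One count, a single space, then a descriptor that may
    # itself contain whitespace.
    my ($freq, $word) = split ' ', $_, 2;
    next unless defined $word;   # skip blank/malformed lines
    chomp $word;
    $freq_hash{$word} += $freq;
}

# Print each total followed by its descriptor.
print "$freq_hash{$_} $_\n" for keys %freq_hash;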

Make sense?

Thanks again,
felonious
--

Re: Re: Unique uniq implementation
by simon.proctor (Vicar) on Mar 07, 2002 at 00:02 UTC
    If you are concerned about size and memory consumption, then I'd suggest looking at DB_File, MLDBM, and Storable. You can then tie your hash to a file on disk. That just shifts your memory problem to a disk problem, but generally speaking HDD space is cheaper than memory.
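
    For instance, a minimal sketch of the tie approach with
    DB_File (the filename freq.db is made up for illustration;
    the accumulation loop is the same as the in-memory version):

        use strict;
        use warnings;
        use Fcntl;     # for O_RDWR and O_CREAT
        use DB_File;   # exports $DB_HASH

        # Tie %freq_hash to a Berkeley DB file so the counts
        # live on disk rather than in memory.
        tie my %freq_hash, 'DB_File', 'freq.db', O_RDWR|O_CREAT, 0644, $DB_HASH
            or die "Cannot tie freq.db: $!";

        while (<>) {
            my ($freq, $word) = split ' ', $_, 2;
            next unless defined $word;
            chomp $word;
            $freq_hash{$word} += $freq;
        }

        print "$freq_hash{$_} $_\n" for keys %freq_hash;

        untie %freq_hash;

    The tie is the only change; every hash access then goes
    through DB_File behind the scenes, which is why it trades
    memory for disk I/O.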

    However, as has been mentioned above, Perl can handle very large sets of data quite easily.
      I do that in another implementation, but it seems to be
      very slow. I want the world! But actually, I've decided
      to stick with the in-memory hash until the data gets
      too big; then I'll probably switch to a DB_File hash.

      Thanks for the help everyone!
      -felonious
      --