in reply to Unique uniq implementation

Would've been nice of me to elaborate, sorry :-)

OK - right now I have about 10 MB, maybe more. I don't
know what I'll have down the road. Maybe 100 MB next time
around?

And the data: each line is a single number (a count) plus
a text descriptor. The descriptor may contain whitespace,
but I can count on exactly one space between the number
and the descriptor.

Now, what I need to do is take all the lines that share a
descriptor, add their counts, and print the totals.

i.e., something like

my ($freq, $word) = split ' ', $_, 2; # limit to 2 fields so the descriptor keeps its internal whitespace
$freq_hash{$word} += $freq;

does the job, but in a rather unscalable way, since the
whole hash has to live in memory.
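
For what it's worth, fleshed out the in-memory version looks
roughly like this (just a sketch; it assumes input arrives on
STDIN or as files on the command line):

use strict;
use warnings;

my %freq_hash;

while (<>) {
    # One count, a single space, then a descriptor that may
    # itself contain whitespace.
    my ($freq, $word) = split ' ', $_, 2;
    next unless defined $word;   # skip blank/malformed lines
    chomp $word;
    $freq_hash{$word} += $freq;
}

# Print each total followed by its descriptor.
print "$freq_hash{$_} $_\n" for keys %freq_hash;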

Make sense?

Thanks again,
felonious
--

Re: Re: Unique uniq implementation
by simon.proctor (Vicar) on Mar 07, 2002 at 00:02 UTC
    If you are concerned about size and memory consumption, then I'd suggest looking at DB_File, MLDBM, and Storable. You can then tie your hash to a file on disk. That just shifts your memory problem to a disk problem, but generally speaking HDD space is cheaper than memory.
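
    For instance, a minimal sketch of the tie approach with
    DB_File (the filename freq.db is made up for illustration;
    the accumulation loop is the same as the in-memory version):

        use strict;
        use warnings;
        use Fcntl;     # for O_RDWR and O_CREAT
        use DB_File;   # exports $DB_HASH

        # Tie %freq_hash to a Berkeley DB file so the counts
        # live on disk rather than in memory.
        tie my %freq_hash, 'DB_File', 'freq.db', O_RDWR|O_CREAT, 0644, $DB_HASH
            or die "Cannot tie freq.db: $!";

        while (<>) {
            my ($freq, $word) = split ' ', $_, 2;
            next unless defined $word;
            chomp $word;
            $freq_hash{$word} += $freq;
        }

        print "$freq_hash{$_} $_\n" for keys %freq_hash;

        untie %freq_hash;

    The tie is the only change; every hash access then goes
    through DB_File behind the scenes, which is why it trades
    memory for disk I/O.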

    However, as has been mentioned above, Perl can handle very large sets of data quite easily.
      I do that in another implementation, but it seems to be
      very slow. I want the world! But actually, I've decided
      to stick with the in-memory hash until the data gets
      too big; then I'll probably switch to a DB_File hash.

      Thanks for the help everyone!
      -felonious
      --