Re: Creating Dictionaries

by salva (Canon)
on Dec 16, 2005 at 14:19 UTC (#517243)

in reply to Creating Dictionaries

IMHO, the problem is not that the input is sorted but that all the entries are unique, so the hash keeps growing until it eats all the memory. In ordinary text files, most words are repetitions of words already seen, so they don't make the hash grow.

There are several ways to solve that problem. For instance, you can try using an on-disk tree with DB_File.
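To give an idea of the on-disk approach, here is a rough sketch. It is not DB_File itself: DB_File is a Perl module that ties a hash to a Berkeley DB file, and this is only a Python analogue that uses the standard sqlite3 module, whose primary key index is likewise a B-tree kept on disk. The table name and the function names (open_word_store, add_words, sorted_words) are made up for the example.

```python
import sqlite3


def open_word_store(path):
    """Open (or create) an on-disk store of unique words (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # The primary key is a B-tree on disk, so uniqueness and ordering
    # are handled by the database instead of an in-memory hash.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY) WITHOUT ROWID"
    )
    return conn


def add_words(conn, words):
    """Insert words, silently skipping the ones that are already stored."""
    with conn:  # one transaction per batch
        conn.executemany(
            "INSERT OR IGNORE INTO words (word) VALUES (?)",
            ((w,) for w in words),
        )


def sorted_words(conn):
    """Yield the stored words in sorted order, read straight from the index."""
    for (word,) in conn.execute("SELECT word FROM words ORDER BY word"):
        yield word
```

Memory use stays roughly constant because only the database cache lives in RAM. The price is speed: every insert touches the on-disk tree, so this is usually much slower than a plain hash while the data still fits in memory.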

Another way is to flush the words found so far to temporary files on disk every time their number goes over some limit and, at the end, perform a merge sort and eliminate duplicates.
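The flush-and-merge idea could look something like the sketch below, written in Python purely for illustration. The function name unique_sorted_words and the limit parameter are hypothetical, and it assumes that words never contain a newline, since each temporary file stores one word per line.

```python
import heapq
import os
import tempfile


def unique_sorted_words(words, limit=100_000, tmpdir=None):
    """Yield the distinct words in sorted order, spilling batches to disk."""
    run_paths = []
    seen = set()

    def flush():
        # Write the current batch as one sorted run, one word per line.
        fd, path = tempfile.mkstemp(suffix=".run", dir=tmpdir, text=True)
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            out.writelines(w + "\n" for w in sorted(seen))
        run_paths.append(path)
        seen.clear()

    for word in words:
        seen.add(word)
        if len(seen) >= limit:  # too many words in memory: spill to disk
            flush()
    if seen:
        flush()

    files = [open(p, encoding="utf-8") for p in run_paths]
    try:
        # Each run is already sorted, so a k-way merge yields a sorted stream.
        streams = [(line.rstrip("\n") for line in f) for f in files]
        previous = None
        for word in heapq.merge(*streams):
            if word != previous:  # duplicates are adjacent after the merge
                yield word
                previous = word
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Only one batch of at most limit words is held in memory at a time, and the merge reads one line per run file, so the peak memory depends on the limit rather than on the total number of distinct words. With a very large number of runs you may hit the open-files limit, in which case the runs would have to be merged in several passes.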
