in reply to term frequency and mutual info
Since that hash will be quite large, use a database to store it. A very popular solution for a disk-based hash is DBM::Deep: easy to use, fast, and well tested.
If the hash fits into memory, you could accumulate it in memory first and then store it to disk. If not, the initial creation of the hash will take somewhat longer, but not much, thanks to disk caches. It is a price you pay only once anyway.
After that, finding the lines where 'un' occurred is just a simple hash access and a split, practically instantaneous.
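To make the "hash access and a split" concrete, here is a minimal in-memory sketch. It assumes each hash value is a comma-separated string of line numbers; that format, the helper names build_index and lines_for, and the sample lines are made up for illustration and are not taken from the original question.

```perl
use strict;
use warnings;

# Build an index: word => comma-separated list of line numbers.
# The comma-separated value format is an assumption for illustration.
sub build_index {
    my (@lines) = @_;
    my %index;
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        my %seen;
        for my $word (grep { length } split /\W+/, lc $line) {
            next if $seen{$word}++;    # record each line only once per word
            $index{$word} = defined $index{$word}
                ? "$index{$word},$line_no"
                : $line_no;
        }
    }
    return \%index;
}

# Look up a word: one hash access plus one split.
sub lines_for {
    my ($index, $word) = @_;
    return () unless defined $index->{$word};
    return split /,/, $index->{$word};
}

my $index = build_index(
    'un deux trois',
    'deux trois',
    'un quatre',
);
print join(' ', lines_for($index, 'un')), "\n";    # prints: 1 3
```

For the disk-based variant, the lookup should work unchanged if $index is instead the object returned by DBM::Deep->new('index.db') (file name hypothetical), since that object can be used like a hash reference.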
Re^2: term frequency and mutual info
by perl_lover_always (Acolyte) on Oct 22, 2010 at 08:26 UTC