in reply to performance issues

You haven't told us the most important thing: what do you need the data for? Depending on how you use it, there are several ways to speed up the program. If you do direct lookups, a hash might be a good idea. There are trees for other kinds of lookups, and a database with an index can also help, provided the file doesn't change too often.
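
To make the hash suggestion concrete, here is a minimal sketch. It assumes the file holds one "key<TAB>value" pair per line; the sample lines and the names %index and @hits are made up for illustration and are not taken from the original program.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real file:
# one "key<TAB>value" pair per line.
my @lines = (
    "apple\tred fruit",
    "pear\tgreen fruit",
    "apple\tcider base",
);

# Build the index once: each key maps to an array of all its values.
my %index;
for my $line (@lines) {
    my ($key, $value) = split /\t/, $line, 2;
    next unless defined $value;    # skip lines without a tab
    push @{ $index{$key} }, $value;
}

# A direct lookup is now a single hash access instead of a scan over every line.
my @hits = @{ $index{apple} || [] };
print "$_\n" for @hits;
```

This only pays off for exact-match lookups on the whole key. If you need substring or prefix matches, a plain hash does not help and you are back to scanning or to one of the other structures mentioned above.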

Re^2: performance issues
by perlcat (Novice) on Jan 27, 2009 at 19:04 UTC

    I use the data to generate ngrams. Basically, whenever I find a given string in the first part of a line (before the \t), I push the second part onto an array. After going through all the lines, I generate ngrams from the array.
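
A rough sketch of the process described above, under a few assumptions: "find a given string" is taken to mean a substring match on the first field, the ngrams are built over the collected second fields in file order, and the helper names collect_matches and ngrams as well as the sample lines are hypothetical. The real program may differ on any of these points.

```perl
use strict;
use warnings;

# Collect the second field of every line whose first field contains $needle.
sub collect_matches {
    my ($needle, @lines) = @_;
    my @collected;
    for my $line (@lines) {
        my ($first, $second) = split /\t/, $line, 2;
        next unless defined $second;    # skip lines without a tab
        push @collected, $second if index($first, $needle) >= 0;
    }
    return @collected;
}

# Return every run of $n consecutive items as an array reference.
# Yields an empty list when there are fewer than $n items.
sub ngrams {
    my ($n, @items) = @_;
    return map { [ @items[ $_ .. $_ + $n - 1 ] ] } 0 .. $#items - $n + 1;
}

# Hypothetical sample lines; a real file would be read and chomped line by line.
my @lines = ("the cat\tsat", "a dog\tran", "cat nap\tdown", "fat cat\tslowly");

my @words = collect_matches('cat', @lines);
print join(' ', @$_), "\n" for ngrams(2, @words);
# prints:
# sat down
# down slowly
```

Note that this scans every line once per search string. If the same file is searched for many different strings, that repeated scan is the likely cost to attack first.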

      Is this a case where using DBM::Deep to store the resulting data structure for later use would make sense?

      --MidLifeXis