in reply to Best way to find patterns in csv file?

If your 35,000 patterns clump (that is, many of them share the same leading conditions), you might want to sort them into a tree. Then you sweep through the 1.5 million lines only once, walking the tree to find each incremental match. Along the way you hold partial matches and immediately discard those that don't meet the essential criteria.
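
A rough sketch of the tree idea, in Python for brevity. It assumes each pattern is a set of "field = value" conditions that must all hold, and that each line has already been parsed into a field-to-value mapping. The names `Node`, `build_tree` and `match_row` are made up for illustration.

```python
class Node:
    """One step in the pattern tree."""
    def __init__(self):
        self.children = {}      # field -> {value -> Node}
        self.pattern_ids = []   # patterns fully matched on reaching this node


def build_tree(patterns):
    """patterns: {pattern_id: {field: value, ...}} (assumed layout)."""
    root = Node()
    for pid, conds in patterns.items():
        node = root
        # Sort the conditions so patterns sharing conditions share a branch.
        for field, value in sorted(conds.items()):
            node = node.children.setdefault(field, {}).setdefault(value, Node())
        node.pattern_ids.append(pid)
    return root


def match_row(root, row):
    """Return the ids of all patterns that one parsed line satisfies."""
    matched = []
    stack = [root]
    while stack:
        node = stack.pop()
        matched.extend(node.pattern_ids)
        for field, by_value in node.children.items():
            # A branch whose condition fails is dropped here, together
            # with every pattern that hangs below it.
            child = by_value.get(row.get(field))
            if child is not None:
                stack.append(child)
    return matched


patterns = {
    "p1": {"datum_1": "x", "datum_2": "y", "datum_3": "z"},
    "p2": {"datum_1": "x", "datum_2": "a", "datum_3": "d"},
}
tree = build_tree(patterns)
print(match_row(tree, {"datum_1": "x", "datum_2": "y", "datum_3": "z"}))
```

Each line is tested against a shared condition once, no matter how many patterns include it. How much that saves depends on how strongly the patterns really clump.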

Just a thought,
-v
"Perl. There is no substitute."

Re^2: Best way to find patterns in csv file?
by punch_card_don (Curate) on Nov 30, 2004 at 20:56 UTC
    Ya, I've been thinking about groupings. Given the patterns
    datum_1 = x and datum_2 = y and datum_3 = z
    datum_1 = x and datum_2 = a and datum_3 = d
    datum_1 = x and datum_20 = g and datum_13 = j
    first find all lines for which datum_1 = x, then just work on those for all other patterns that begin with datum_1 = x. Doing so, I figure I can reduce the number of line scans from 40-billion to ~400-million given the average number of repeat datums in patterns. A nice reduction, but I'm still interested in better. I'll probably end up just putting it all in a database and letting the machine do the heavy lifting...
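
The grouping described above might look something like the sketch below (Python, for brevity). It assumes every pattern is a non-empty ordered list of (field, value) pairs and every line is already parsed into a field-to-value mapping. The function name `match_by_first_condition` is hypothetical.

```python
from collections import defaultdict


def match_by_first_condition(patterns, rows):
    """patterns: {pattern_id: [(field, value), ...]}; rows: list of dicts.

    Returns {pattern_id: [row indexes that satisfy every condition]}.
    """
    # Group the patterns that begin with the same condition.
    groups = defaultdict(list)
    for pid, conds in patterns.items():
        first, *rest = conds
        groups[first].append((pid, rest))

    hits = defaultdict(list)
    for (field, value), members in groups.items():
        # One full scan per distinct first condition ...
        subset = [i for i, row in enumerate(rows) if row.get(field) == value]
        # ... then every pattern in the group only looks at that subset.
        for pid, rest in members:
            for i in subset:
                if all(rows[i].get(f) == v for f, v in rest):
                    hits[pid].append(i)
    return dict(hits)


rows = [
    {"datum_1": "x", "datum_2": "y", "datum_3": "z"},
    {"datum_1": "x", "datum_2": "a", "datum_3": "d"},
    {"datum_1": "q", "datum_2": "y", "datum_3": "z"},
]
patterns = {
    "p1": [("datum_1", "x"), ("datum_2", "y"), ("datum_3", "z")],
    "p2": [("datum_1", "x"), ("datum_2", "a"), ("datum_3", "d")],
}
print(match_by_first_condition(patterns, rows))
```

The number of full scans drops from one per pattern to one per distinct first condition, so the saving depends on how many patterns share their first datum.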