punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:
I have a text file that looks like this:
record_id, datum_1, datum_2, ... , datum_30;
where record_id is an integer, and 99% of the datums are integers, although about 1% may be text or decimals. Datums can be null. There are 1.5 million records.
Then I have a collection of about 35,000 patterns I have to search for. That is, find all records that have, for example, datum_1 = x and datum_8 = y and datum_20 = z, regardless of what might be in the other columns. A single record may contain several patterns, so each line has to be searched for each pattern.
I realize this is just mimicking the functionality of a database (select record_id from theTable where datum_1 = x and datum_8 = y and datum_20 = z), but I was wondering if there's a very efficient way of doing this directly on the file, without setting up a database and without scanning 1.5 million lines 35,000 times (I wonder how long 50 billion line scans would take?). I've thought about this most of the day and come up with nothing promising...
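One possible way to avoid comparing every line against every pattern is to read the file once and index the patterns by one of their own column/value constraints, so a record only tests the patterns whose indexed constraint it already satisfies. The following is a minimal sketch of that idea, not a tested solution for the real data. The `@patterns` layout, the `%by_anchor` index, and the `matches_for_line` helper are all hypothetical names, and it assumes comma-separated fields with a trailing `;`, values compared as strings, and null datums stored as empty fields.

```perl
use strict;
use warnings;

# Hypothetical patterns: each maps a 1-based datum column to its required value.
my @patterns = (
    { 1 => '5', 8 => '12', 20 => '7' },
    { 1 => '5', 3 => 'abc' },
    { 2 => '9' },
);

# Index each pattern under exactly one "anchor" constraint (its lowest column).
my %by_anchor;    # "column=value" => [ indexes into @patterns ]
for my $i (0 .. $#patterns) {
    my ($col) = sort { $a <=> $b } keys %{ $patterns[$i] };
    push @{ $by_anchor{"$col=$patterns[$i]{$col}"} }, $i;
}

# Return the indexes of all patterns that one record line satisfies.
sub matches_for_line {
    my ($line) = @_;
    $line =~ s/^\s+//;                     # drop leading whitespace
    $line =~ s/\s*;?\s*$//;                # drop trailing ';' and newline
    my @f = split /\s*,\s*/, $line, -1;    # $f[0] is record_id; -1 keeps trailing nulls
    my @hits;
    for my $col (1 .. $#f) {
        # Only patterns anchored on this exact column/value are candidates.
        my $cands = $by_anchor{"$col=$f[$col]"} or next;
        PATTERN: for my $i (@$cands) {
            my $p = $patterns[$i];
            for my $c (keys %$p) {
                next PATTERN unless defined $f[$c] && $f[$c] eq $p->{$c};
            }
            push @hits, $i;
        }
    }
    return @hits;
}

# Usage sketch (the file name is hypothetical):
# open my $fh, '<', 'records.txt' or die $!;
# while (my $line = <$fh>) {
#     my ($id) = $line =~ /^\s*(\d+)/;
#     print "$id matches pattern $_\n" for matches_for_line($line);
# }
```

With this layout each line costs roughly 30 hash lookups plus a full check of the few candidate patterns. How much that saves depends on how evenly the anchors are spread: if many patterns share the same anchor value, anchoring each pattern on its rarest constraint instead of its lowest column would probably work better.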
Replies are listed 'Best First'.

Re: Best way to find patterns in csv file?
by tmoertel (Chaplain) on Nov 30, 2004 at 23:54 UTC

Re: Best way to find patterns in csv file?
by stajich (Chaplain) on Nov 30, 2004 at 21:06 UTC
by Thilosophy (Curate) on Dec 01, 2004 at 10:55 UTC

Re: Best way to find patterns in csv file?
by Velaki (Chaplain) on Nov 30, 2004 at 20:42 UTC
by punch_card_don (Curate) on Nov 30, 2004 at 20:56 UTC

Re: Best way to find patterns in csv file?
by Jenda (Abbot) on Nov 30, 2004 at 21:14 UTC

Re: Best way to find patterns in csv file?
by Yendor (Pilgrim) on Nov 30, 2004 at 20:39 UTC

Re: Best way to find patterns in csv file?
by jZed (Prior) on Nov 30, 2004 at 23:00 UTC

Re: Best way to find patterns in csv file?
by tall_man (Parson) on Dec 01, 2004 at 05:50 UTC

Re: Best way to find patterns in csv file?
by punch_card_don (Curate) on Dec 01, 2004 at 15:39 UTC