muyprofesional has asked for the wisdom of the Perl Monks concerning the following question:
Hi monks!
I need to write a script for launch SQL's against one flat file with no-only-one-field-separator structure, and not only fixed-width separators, like this:
Jul 12 02:09:22 - TEST: user1, 15 - load | fail | 131
The file could reach 10Gb or moar of data. It must be able to define the regexp to estract the fields (reading from a config file, for example), something like this:
(.{15}) - (.*?): (.*?), (.*?) - (.*?) \| (.*?) \| (.*)
I've been testing somes approachs, but I'm not pretty sure about how to do it the best way (performance, first)
- Parsing and writing to a DBI Driven database (like this: http://perlmonks.com/?node_id=340569). It consumes too much disk space (re-creates a database with the data parsed, and then launch the query)
- Use DBD::RAM, load the database in memory while parsing the file, and then query it. I think this fails cause of in-memory data size. ¿Could be possible stablish some disk-buffer resource while it grows?..
- Use some TEXT::CSV_XS with DBD::CSV approach, or DBD::Anydata too. I think this is the way: not replicating the data, and let the driver launch the query an lead with the memory recourecs (i need to group and order resulsets). But, i haven't found any way to provide DBD::Anydata the regex or fields delimiters options to handle a different file format than CSV, INI, etc. I've done test with CSV files and the performance is tooo low.
I really need a ray on this, any help would be appreciated. Thanks a lot in advance,
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: SQL Query a Non-CSV Flat File - Any Suggestions? :S
by Corion (Patriarch) on Jul 15, 2010 at 15:27 UTC | |
by muyprofesional (Initiate) on Jul 15, 2010 at 15:48 UTC | |
|
Re: SQL Query a Non-CSV Flat File - Any Suggestions? :S
by TedPride (Priest) on Jul 15, 2010 at 19:18 UTC | |
by Anonymous Monk on Jul 16, 2010 at 00:40 UTC |