Hi monks!

I need to write a script that runs SQL queries against a flat file whose fields are separated by several different delimiters (it is neither a single-separator format nor purely fixed-width), like this:

Jul 12 02:09:22 - TEST: user1, 15 - load | fail | 131

The file could reach 10 GB or more of data. The regexp that extracts the fields must be configurable (read from a config file, for example), something like this:

(.{15}) - (.*?): (.*?), (.*?) - (.*?) \| (.*?) \| (.*)
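To make the idea concrete, here is a minimal sketch of how such a pattern could be compiled from a plain string (as if it had just been read from a config file) and applied to the sample line above. The names compile_pattern and parse_line are only illustrative, and the \A ... \z anchoring is an assumption added for stricter matching, not part of the original pattern:

```perl
use strict;
use warnings;

# The pattern as a plain string, standing in for a value read from a config file.
my $pattern = '(.{15}) - (.*?): (.*?), (.*?) - (.*?) \| (.*?) \| (.*)';

# Compile the pattern once, outside any read loop.
# The \A(?: ... )\z wrapper is an added assumption: it forces a whole-line match.
sub compile_pattern {
    my ($text) = @_;
    return qr/\A(?:$text)\z/;
}

# Return the captured fields, or an empty list if the line does not match.
sub parse_line {
    my ($re, $line) = @_;
    chomp $line;
    my @fields = $line =~ $re;
    return @fields;
}

my $re     = compile_pattern($pattern);
my @fields = parse_line($re, 'Jul 12 02:09:22 - TEST: user1, 15 - load | fail | 131');
print join(' | ', map { "[$_]" } @fields), "\n";
```

On the sample line this yields seven fields, from the 15-character timestamp through to the trailing number. Compiling with qr// once and reusing the compiled regex matters for a file this size, since the pattern would otherwise be re-interpolated for every line.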

I've been testing some approaches, but I'm not sure which is the best way to do it (performance comes first):

- Parsing the file and writing it to a DBI-driven database (like this: http://perlmonks.com/?node_id=340569). This consumes too much disk space (it re-creates a database from the parsed data and then runs the query).

- Using DBD::RAM: load the database into memory while parsing the file, and then query it. I think this fails because of the in-memory data size. Would it be possible to set up some kind of disk buffer as it grows?

- Using Text::CSV_XS with DBD::CSV, or DBD::AnyData. I think this is the way: it doesn't replicate the data, and it lets the driver run the query and deal with the memory resources (I need to group and order result sets). But I haven't found any way to give DBD::AnyData a regex or field-delimiter options so that it can handle a file format other than CSV, INI, etc. I've run tests with CSV files and the performance is far too low.

I really need some light on this; any help would be appreciated. Thanks a lot in advance.

In reply to SQL Query a Non-CSV Flat File - Any Suggestions? :S by muyprofesional
