My first thought is to check whether DBD::CSV can handle this; I suspect it can. Then you could simply issue an SQL statement to get the data you need. This is likely the biggest bang for the buck: it probably won't be much bang, but it also won't be much buck (it's cheap to implement).
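Here's a minimal sketch of what I mean, assuming the data is a comma-separated file whose first line holds the column names. The file name records.csv, the columns id, name and score, and the query itself are all made up for illustration; your real layout will differ, and I haven't measured how this behaves on a truly huge file.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempdir);

# Hypothetical sample data so the sketch runs on its own; in practice
# f_dir would point at the directory holding the real file.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/records.csv" or die "Cannot write sample file: $!";
print {$fh} "id,name,score\n1,alpha,10\n2,beta,25\n3,gamma,40\n";
close $fh or die "Cannot close sample file: $!";

# With f_ext set to '.csv/r', records.csv is exposed as table "records".
my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    f_dir      => $dir,
    f_ext      => '.csv/r',
    RaiseError => 1,
    PrintError => 0,
});

# An ordinary SQL query with a placeholder (made-up filter).
my $sth = $dbh->prepare('SELECT id, name FROM records WHERE score > ?');
$sth->execute(20);

my @matches;
while (my ($id, $name) = $sth->fetchrow_array) {
    push @matches, $name;
}
$dbh->disconnect;

print "$_\n" for @matches;
```

The nice part is that the query is plain SQL through DBI, so the same statement can be reused more or less unchanged against a different backend later.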

Once you have that working, the next step may be to load the data into a real database (anything from SQLite at one end to DB2 or Oracle at the other), which should be fast, and then ask the database for the result using what should be basically the same query. This takes more effort to set up (installing and configuring a database, even if it's just SQLite, then copying the data in), so the time it saves has to be significant enough to outweigh that cost. The return on investment may therefore not be as high; it depends on how important it is to you that your transform runs quickly. Even here there's room for tweaking, based on how you set up your indexes and so on. If you have in-house database experts, you may want their input on this.
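A rough sketch of the load-then-query idea, assuming DBD::SQLite is installed. The table name, columns, index, and sample rows are hypothetical stand-ins for whatever your parsed file actually contains.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical rows standing in for the parsed file contents.
my @rows = ([1, 'alpha', 10], [2, 'beta', 25], [3, 'gamma', 40]);

# An in-memory database keeps the sketch self-contained; a file name in
# place of :memory: would persist the data between runs.
my $db = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', {
    RaiseError => 1,
    PrintError => 0,
    AutoCommit => 1,
});

$db->do('CREATE TABLE records (id INTEGER, name TEXT, score INTEGER)');

# One transaction around the whole load avoids a commit per row.
$db->begin_work;
my $ins = $db->prepare('INSERT INTO records (id, name, score) VALUES (?, ?, ?)');
$ins->execute(@$_) for @rows;
$db->commit;

# Index the column the query filters on (made-up choice of column).
$db->do('CREATE INDEX idx_records_score ON records (score)');

my $names = $db->selectcol_arrayref(
    'SELECT name FROM records WHERE score > ? ORDER BY id', undef, 20);
$db->disconnect;

print "$_\n" for @$names;
```

Whether the index pays off depends entirely on your real queries, which is where a database expert's input would help.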

Just my two cents :-)


In reply to Re: Searching Huge files by Tanktalus
in thread Searching Huge files by biomonk
