Somewhat OT answer

You need to parse "a bunch" of data, and there seems to be real-time value in it, so perhaps you need a solution with a quick execution time. That thought led me to this off-topic answer, which is really about how to make a fast program of the type you describe.

One of my favorite things about Perl is the speed of its regular expression engine. It can parse lines very quickly when the match is anchored at the beginning of the line. The rest of the line can then be parsed using a fast regular expression.

if (/^G017RATEBRKRL,([^,]+),([^,]+),([^,]+),(.*)/) {
    $col[3] = $1;
    $col[5] = $2;
    $col[1] = $3;
    $col[4] = $4;
}
elsif (/^G017CP111 D,([^,]+),([^,]+),([^,]+),(.*)/) {
    # etc...
}

The negated character classes run quickly because they only need to look for commas.
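
Here is a minimal runnable sketch of the same idea, in case it helps to try it out. It is a variation on the snippet above: the match is done in list context so the captures land in an array slice in one step. The sample line and the parse_line helper are made up for illustration; only the record tag and the column order come from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical sample records; the real field contents are not shown in the post.
my @lines = (
    'G017RATEBRKRL,alpha,beta,gamma,rest,of,line',
    'UNKNOWN,1,2,3,4',
);

# Parse one line. Returns an array ref of columns, or undef if no pattern matched.
sub parse_line {
    my ($line) = @_;
    my @col;

    # In list context a match returns its captures, or an empty list on failure.
    # [^,]+ stops at the next comma; (.*) takes whatever is left of the line.
    if (my @f = $line =~ /^G017RATEBRKRL,([^,]+),([^,]+),([^,]+),(.*)/) {
        @col[3, 5, 1, 4] = @f;    # same column order as the $1..$4 version
        return \@col;
    }
    # elsif branches for the other record types would go here.
    return;
}

for my $line (@lines) {
    my $col = parse_line($line);
    if (defined $col) {
        print "col1=$col->[1] col3=$col->[3] col4=$col->[4] col5=$col->[5]\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

The temporary @f costs one extra copy compared with using $1 and friends directly, so for the tightest loop the original form may still be the better choice.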

Another Perl speedup has to do with minimizing the number of copy operations needed to load a database with DBI. It should be possible to go from $1, $2, etc. into a data structure that can be loaded directly into the database, without being copied again.
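
One way this might look, as a rough sketch rather than a tested recipe: hand the captures straight to the statement handle's execute method as bind values. To keep the example runnable without a database, it uses a small stand-in object (My::RecordingSth, a made-up helper, not part of DBI) in place of a real prepared statement. The table name, column names and load_line helper are assumptions too.

```perl
use strict;
use warnings;

# Stand-in for a DBI statement handle so the sketch runs without a database.
# Hypothetical helper: it only records the bind values it is given.
package My::RecordingSth;

sub new {
    my ($class) = @_;
    return bless { rows => [] }, $class;
}

sub execute {
    my $self = shift;
    push @{ $self->{rows} }, [@_];
    return 1;
}

package main;

# With DBI the handle would come from something like
#   my $sth = $dbh->prepare(
#       'INSERT INTO rates (c1, c3, c4, c5) VALUES (?, ?, ?, ?)');
# where the table and column names are invented for this example.
my $sth = My::RecordingSth->new;

sub load_line {
    my ($sth, $line) = @_;
    if ($line =~ /^G017RATEBRKRL,([^,]+),([^,]+),([^,]+),(.*)/) {
        # Captures go in directly as bind values, in column order 1, 3, 4, 5,
        # with no intermediate @col array.
        return $sth->execute($3, $1, $4, $2);
    }
    return 0;
}

load_line($sth, 'G017RATEBRKRL,alpha,beta,gamma,rest');
print join('|', @{ $sth->{rows}[0] }), "\n";
```

Preparing the statement once and calling execute per line also avoids re-parsing the SQL for every record. Whether it actually saves copies inside the driver is something I would measure before relying on it.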

It is great to write programs that have wonderful abstractions in them. Sometimes it is even better to write programs that are wickedly fast.

It should work perfectly the first time! - toma


In reply to Re: A Slough of ParseRecDescent Woes by toma
in thread A Slough of ParseRecDescent Woes by Ovid
