I have to search many files, some of them over 1M lines long. In each file I need to find about 10 fields and store their values in a hash.
The method I am using works, but I think it is very fragile: if the fields appear in any order other than the one I expect, it fails.
What I have done is this:
I store my fields in an array, in the order I expect them to appear:
    my %SEARCH_FIELDS = (
        '1.X' => [qw!RESULT_T LOTID DEVICEID SETUPID STEPID OMARK
                     DIEPITCH WAFERID SLOTID CENTER DEFECTS!],
        '1.8' => [qw!LOTID WAFERID RESULT_T CENTER SLOTID DEFECTS
                     DEVICEID DIEPITCH OMARK SETUPID STEPID!],
    );
I store my regexes in a hash, keyed by KLARF version and field name:
    my %KLARF_REGEXP = (
        '1.X' => {
            'LOTID'    => qr/LotID "(.+)";/i,
            'DEVICEID' => qr/DeviceID "(\w+)";/i,
            'STEPID'   => qr/StepID "(.+)";/i,
            'SLOTID'   => qr/Slot (\d+);/i,
            'DEFECTS'  => qr/DefectList/i,
            'RESULT_T' => qr/ResultTimestamp (.+);/i,
            'WAFERID'  => qr/WaferID "(.+)";/i,
            'SETUPID'  => qr/SetupID (.+);/i,
            'OMARK'    => qr/OrientationMarkLocation (.+);/i,
            'DIEPITCH' => qr/DiePitch (.+);/i,
            'CENTER'   => qr/SampleCenterLocation (.+);/i,
        },
        '1.8' => {
            'LOTID'    => qr/LotRecord "(.+)"/i,
            'DEVICEID' => qr/DeviceID 1 \{"(\w+)"\}/i,
            'STEPID'   => qr/StepID 1 \{"(.+)"\}/i,
            'SLOTID'   => qr/Field SlotNumber 1 \{(\d+)\}/i,
            'DEFECTS'  => qr/DefectList/i,
            'WAFERID'  => qr/WaferRecord "(.+)"/i,
            'RESULT_T' => qr/Field ResultTimestamp \d \{(.+)\}/i,
            'SETUPID'  => qr/Field RecipeID 3 \{(.+)\}/i,
            'OMARK'    => qr/Field OrientationMarkLocation 1 \{(.+)\}/i,
            'DIEPITCH' => qr/Field DiePitch \d \{(.+)\}/i,
            'CENTER'   => qr/Field SampleCenterLocation \d \{(.+)\}/i,
        },
    );
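As a quick illustration of how the two structures fit together, the sketch below looks up one field's regex by version and applies it to a single line. It copies only two of the 1.X patterns; the helper name `extract_field` and the sample lines are hypothetical, not part of the original script. A pattern without a capture group, such as DEFECTS, would also return undef on a match, so the helper is only meaningful for capturing fields.

```perl
use strict;
use warnings;

# Trimmed copy of %KLARF_REGEXP: two 1.X fields only, for illustration.
my %KLARF_REGEXP = (
    '1.X' => {
        'SLOTID' => qr/Slot (\d+);/i,
        'LOTID'  => qr/LotID "(.+)";/i,
    },
);

# Hypothetical helper: return the captured value if $line matches the
# regex for ($version, $field), otherwise undef.
sub extract_field {
    my ( $version, $field, $line ) = @_;
    my $re = $KLARF_REGEXP{$version}{$field}
        or die "No regex for $field in version $version\n";
    return $line =~ $re ? $1 : undef;
}

my $slot = extract_field( '1.X', 'SLOTID', 'Slot 7;' );            # "7"
my $lot  = extract_field( '1.X', 'LOTID',  'LotID "ABC123";' );    # "ABC123"
```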
I then shift the first field off the list into $current_state and read lines until its regex matches. On a successful match, I store the captured value and shift off the next field:
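    # Copy the order list so shifting does not empty the shared config
    # (otherwise the next file starts with an empty list).
    my @fields        = @{ $SEARCH_FIELDS{$KLARF_VERSION} };
    my $current_state = shift @fields;

    while ( <FILE> ) {
        if ( $_ =~ $KLARF_REGEXP{$KLARF_VERSION}{$current_state} ) {
            # DEFECTS has no capture group, so $1 may be undef;
            # "defined" also keeps a captured value of "0"
            if ( defined $1 ) {
                $summary->{$current_state} = $1;
                LogMsg( "Found $current_state $summary->{$current_state}" );
            }
            $current_state = shift @fields;
            last unless defined $current_state;    # all fields found
        }
    }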
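The order dependence is easy to reproduce. This minimal sketch runs essentially the same shift-on-match loop, wrapped in a hypothetical `scan_in_order` sub with two 1.X patterns and made-up sample data, over a file whose Slot line comes before its LotID line while the search order expects LotID first:

```perl
use strict;
use warnings;

# Two 1.X patterns only (hypothetical subset for the demo).
my %re = (
    LOTID  => qr/LotID "(.+)";/i,
    SLOTID => qr/Slot (\d+);/i,
);

# Hypothetical sub wrapping the shift-on-match loop: @order is the
# expected field order; only the current field's regex is ever tested.
sub scan_in_order {
    my ( $fh, @order ) = @_;
    my %summary;
    my $current = shift @order;
    while ( my $line = <$fh> ) {
        next unless $line =~ $re{$current};
        $summary{$current} = $1 if defined $1;
        $current = shift @order;
        last unless defined $current;    # every field found
    }
    return \%summary;
}

# Sample data with the Slot line BEFORE the LotID line.
my $data = qq{Slot 3;\nLotID "L42";\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $found = scan_in_order( $fh, qw(LOTID SLOTID) );
close $fh;
# $found contains LOTID => 'L42' but no SLOTID entry
```

Because the loop tests only one regex per line, the Slot line is read and discarded while the loop is still waiting for LotID, so SLOTID is never found.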
Is there a better, faster, more robust way to do this without hard-coding the search order? I thought about testing every line against every possible regex, but I don't know if that is the best approach. The machine that runs this process is already heavily loaded, so I am looking for a solution that is light on both memory and CPU.
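One way to drop the hard-coded order is a refinement of the idea mentioned above: test each line against the regexes, but only for fields that have not been found yet, and stop reading once every field is found. This is a minimal sketch, assuming the patterns are specific enough that none of them can match another field's line (with no fixed order, a loose pattern can grab the wrong record). `scan_any_order` is a hypothetical name; in the original script you would pass `\*FILE` and `$KLARF_REGEXP{$KLARF_VERSION}`, and `%SEARCH_FIELDS` would no longer be needed.

```perl
use strict;
use warnings;

# Hypothetical helper: scan $fh for every field in %$regexes, in any order.
# $regexes is a hashref of FIELD => qr/.../, e.g. $KLARF_REGEXP{$KLARF_VERSION}.
# Returns a hashref of FIELD => captured value.
sub scan_any_order {
    my ( $fh, $regexes ) = @_;
    my %pending = %$regexes;    # copy, so the shared config is untouched
    my %summary;
    LINE: while ( my $line = <$fh> ) {
        for my $field ( keys %pending ) {
            next unless $line =~ $pending{$field};
            # Patterns without a capture group (e.g. DEFECTS) leave $1 undef
            $summary{$field} = $1 if defined $1;
            delete $pending{$field};      # never test this field again
            last LINE unless %pending;    # all fields found: stop reading
        }
    }
    return \%summary;
}

# Usage on a small in-memory sample (hypothetical data, fields out of order).
my %regexes = (
    LOTID  => qr/LotID "(.+)";/i,
    SLOTID => qr/Slot (\d+);/i,
);
my $data = qq{Slot 3;\nUnrelated line;\nLotID "L42";\nDefect data ...\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $summary = scan_any_order( $fh, \%regexes );
close $fh;
print "$_ = $summary->{$_}\n" for sort keys %$summary;
# prints "LOTID = L42" then "SLOTID = 3"
```

Memory use stays constant because only one line is held at a time, and the per-line cost shrinks as fields are found, since each line is tested only against the fields still pending. If all the header fields come before the defect data, the early `last` also avoids reading the rest of a 1M-line file.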

Searching Array To Hold RegEx Stack Is Order Dependant, by David