As for test input, you could just feed the script itself back in as input. That is what I did. Of course, then you are really testing the time it takes to compile the code...

In any case, there is overhead to entering a block and to doing a hash lookup, so I expected tye's approach to be faster than your loop. But when I went and tested it, I found that the difference between the three approaches was all in compilation. So I tried a longer file (a bit over 4 MB): mine took a bit under 27 seconds, tye's just over 35, and yours 2 minutes 22 seconds. So tye's is indeed faster than yours, and mine was not so bad after all. :-)
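The timings above come from whole runs, which include compilation. If you want to isolate the per-line matching cost, a minimal sketch with the core Benchmark module could look like the following. The keyword subset, the made-up log lines, and the two subs are hypothetical stand-ins, not the actual scripts from this thread.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-ins: a few keywords from the scanner plus made-up
# log lines. These are NOT the data or code used for the timings above.
my @keywords = ('ERROR', 'WARNING', 'INVALID', 'NOT FOUND', 'MISSING');
my @lines;
for my $n (1 .. 500) {
    push @lines, "NOTE: step $n ran fine\n", "WARNING: variable x$n NOT FOUND\n";
}

# Approach 1: for every line, loop over one precompiled RE per keyword.
my @res = map { qr/\Q$_\E/ } @keywords;
sub count_loop {
    my $hits = 0;
    LINE: for my $line (@lines) {
        for my $re (@res) {
            if ($line =~ $re) { $hits++; next LINE; }
        }
    }
    return $hits;
}

# Approach 2: one alternation of all keywords, compiled once.
my $any      = join '|', map { quotemeta } @keywords;
my $combined = qr/$any/;
sub count_combined {
    my $hits = grep { $_ =~ $combined } @lines;    # grep in scalar context counts
    return $hits;
}

# Timing only means something if both approaches agree.
die "approaches disagree\n" unless count_loop() == count_combined();

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    loop     => \&count_loop,
    combined => \&count_combined,
});
```

Checking that both subs report the same count before timing is deliberate: a fast scanner that misses lines is not a win.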

That said, perhaps I need not be ashamed to show the more sophisticated method of building the RE, which coerces the RE engine into an optimization it should know about but does not (yet):

```perl
use strict;
use warnings;

my $match = ret_match_any(
    "0 OBS", "AT LEAST", "EXTRANEOUS", "CARTESIAN", "CLOSING", "CONVERT",
    "DIVISION BY ZERO", "DOES NOT EXIST", "DUE TO LOOPING", "END OF MACRO",
    "ENDING EXECUTION", "ERROR", "ERRORABEND", "ERRORCHECK=STRICT", "EXCEED",
    "HANGING", "HAS 0 OBSERVATIONS", "ILLEGAL", "INCOMPLETE", "INVALID",
    "LOST CARD", "MATHEMAT", "MERGE STATEMENT", "MISSING", "MULTIPLE",
    "NOT FOUND", "NOT RESOLVED", "OBS=0", "REFERENCE", "REPEAT",
    "SAS CAMPUS DRIVE", "SAS SET OPTION OBS=0", "SAS WENT", "SHIFTED", "STOP",
    "TOO SMALL", "UNBALANCED", "UNCLOSED", "UNREF", "UNRESOLVED", "WARNING"
);

while (<>) {
    if ($_ =~ $match) {
        print "line $., file $ARGV, problem $1\n$_\n";
    }
}
continue {
    # Reset $. at the end of each input file so line numbers are per file.
    close ARGV if eof;
}

# Takes a list of strings and returns an RE that matches any of them.
sub ret_match_any {
    my $match_str = trie_strs(map quotemeta, @_);
    return qr/($match_str)/;
}

# Takes a list of escaped strings and returns a single string
# suitable for building a match in an efficient way.
# Works recursively by grouping strings that share a first character.
sub trie_strs {
    unless (@_) { return (); }
    my %rest;
    foreach my $str (@_) {
        if (length($str)) {
            my $chr = substr($str, 0, 1);
            if ("\\" eq $chr) {
                # quotemeta escape: keep the backslash with the next char
                $chr = substr($str, 0, 2);
                push @{$rest{$chr}}, substr($str, 2);
            }
            else {
                push @{$rest{$chr}}, substr($str, 1);
            }
        }
        else {
            # This string ends here, so allow an empty branch.
            $rest{''} = [''];
        }
    }
    my @to_join;
    # Sort so the output is deterministic, and put the empty branch last
    # so that e.g. ERRORABEND is reported in preference to ERROR.
    foreach my $chr (sort { (length($a) ? 0 : 1) <=> (length($b) ? 0 : 1)
                            or $a cmp $b } keys %rest) {
        my $list_ref = $rest{$chr};
        if (1 < @$list_ref) {
            push @to_join, $chr . trie_strs(@$list_ref);
        }
        else {
            push @to_join, $chr . $list_ref->[0];
        }
    }
    if (1 < @to_join) {
        return '(?:' . (join '|', @to_join) . ')';
    }
    else {
        return $to_join[0];
    }
}
```
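To make the grouping concrete, here is a small self-contained sketch that restates the logic of `trie_strs` and runs it on three of the keywords. It sorts the keys and tries the empty branch last, so the printed pattern is deterministic and the longest keyword at a given position is the one captured. The sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Restatement of the trie_strs grouping so this sketch runs on its own.
sub trie_strs {
    my %rest;
    foreach my $str (@_) {
        if (length $str) {
            # A quotemeta escape such as "\ " counts as one unit.
            my $len = substr($str, 0, 1) eq "\\" ? 2 : 1;
            push @{ $rest{ substr($str, 0, $len) } }, substr($str, $len);
        }
        else {
            $rest{''} = [''];    # a string ends here: allow an empty branch
        }
    }
    my @to_join;
    # Non-empty branches first (alphabetically), the empty branch last.
    for my $chr (sort { (length($a) ? 0 : 1) <=> (length($b) ? 0 : 1) or $a cmp $b } keys %rest) {
        my $list = $rest{$chr};
        push @to_join, $chr . (@$list > 1 ? trie_strs(@$list) : $list->[0]);
    }
    return @to_join > 1 ? '(?:' . join('|', @to_join) . ')' : $to_join[0];
}

my $pattern = trie_strs(map quotemeta, 'ERROR', 'ERRORABEND', 'EXCEED');
print "$pattern\n";    # E(?:RROR(?:ABEND|)|XCEED)

my $re = qr/($pattern)/;
my @found;
for my $line ('ERROR: x', 'ERRORABEND set', 'WILL EXCEED') {
    push @found, $1 if $line =~ $re;
}
print "@found\n";      # ERROR ERRORABEND EXCEED
```

The shared prefix `E` is tested once, and only one branch of each alternation can match the next character, so a failed branch is abandoned early instead of every keyword being retried from scratch.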

In reply to RE (tilly) 4: SAS log scanner by tilly
in thread SAS log scanner by nop
