
For the problem of finding input, you could just feed the script its own source. That is what I did. Of course, then you are really testing the time to compile the code...

In any case, there is overhead to entering a block and to a hash lookup, so I believed that tye's approach really is faster than your loop. But when I tested it on that input, the difference between the three approaches was all in the compilation. So I went for a longer file (a bit over 4 MB): mine took a bit under 27 seconds, tye's just over 35 seconds, and yours 2 minutes 22 seconds. So tye's is faster than yours, and mine was not so bad after all. :-)
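For anyone who wants to repeat this kind of comparison without a 4 MB log, here is a rough sketch using the core Benchmark module. The keyword list, the sample lines, and both subs (`count_regex`, `count_loop`) are hypothetical stand-ins, not the code from this thread. They only contrast one precompiled alternation with a per-keyword loop.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical stand-ins, not the code from the thread.
my @keys  = ('ERROR', 'WARNING', 'NOT FOUND', 'UNRESOLVED');
my @lines = ("NOTE: all fine\n", "ERROR: variable X NOT FOUND\n") x 1000;

# One precompiled alternation: a single regex match per line.
my $alt = join '|', map { quotemeta } @keys;
my $re  = qr/($alt)/;

sub count_regex {
    my $n = 0;
    for (@lines) { $n++ if /$re/ }
    return $n;
}

# A loop over the keywords: one block entry and one index() call
# per keyword until the first hit on each line.
sub count_loop {
    my $n = 0;
    LINE: for my $line (@lines) {
        for my $key (@keys) {
            if (index($line, $key) >= 0) { $n++; next LINE; }
        }
    }
    return $n;
}

# Run each sub for at least one CPU second and report the rates.
timethese(-1, { regex => \&count_regex, loop => \&count_loop });
```

The numbers will depend on the Perl version and on how many keywords there are, so treat the result as a way to measure, not as a prediction.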

That said, perhaps I am not so ashamed to show the sophisticated method of building the RE, which coerces the RE engine into an optimization it should know about but does not (yet):

use strict;

my $match = &ret_match_any(
  "0  OBS",
  "AT LEAST",
  "EXTRANEOUS",
  "CARTESIAN",
  "CLOSING",
  "CONVERT",
  "DIVISION BY ZERO",
  "DOES NOT EXIST",
  "DUE TO LOOPING",
  "END OF MACRO",
  "ENDING EXECUTION",
  "ERROR",
  "ERRORABEND",
  "ERRORCHECK=STRICT",
  "EXCEED",
  "HANGING",
  "HAS 0 OBSERVATIONS",
  "ILLEGAL",
  "INCOMPLETE",
  "INVALID",
  "LOST CARD",
  "MATHEMAT",
  "MERGE STATEMENT",
  "MISSING",
  "MULTIPLE",
  "NOT FOUND",
  "NOT RESOLVED",
  "OBS=0",
  "REFERENCE",
  "REPEAT",
  "SAS CAMPUS DRIVE",
  "SAS SET OPTION OBS=0",
  "SAS WENT",
  "SHIFTED",
  "STOP",
  "TOO SMALL",
  "UNBALANCED",
  "UNCLOSED",
  "UNREF",
  "UNRESOLVED",
  "WARNING"
);

while (<>) {
  if ($_ =~ $match) {
    print "line $., file $ARGV, problem $1\n$_\n";
  }
}


# Takes a list of strings and returns an RE that matches any of them,
# capturing the matched string in $1.
sub ret_match_any {
  my $match_str = &trie_strs(map quotemeta, @_);
  return qr/($match_str)/;
}

# Takes a list of escaped strings and returns a single string
# suitable for building a match in an efficient way.
# Works recursively by grouping strings that share a leading character
# (a backslash escape counts as one character).
sub trie_strs {
  unless (@_) {
    return ();
  }
  my %rest;
  foreach my $str (@_) {
    if (length($str)) {
      my $chr = substr($str, 0, 1);
      if ("\\" eq $chr) {
        $chr = substr($str, 0, 2);
        push @{$rest{$chr}}, substr($str, 2);
      }
      else {
        push @{$rest{$chr}}, substr($str, 1);
      }
    }
    else {
      $rest{''} = [''];
    }
  }
  my @to_join;
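  # Reverse sort puts the empty alternative last, so the pattern is the
  # same on every run and the longer keyword wins (ERRORABEND over ERROR).
  foreach my $chr (sort { $b cmp $a } keys %rest) {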
    my $list_ref = $rest{$chr};
    if (1 < @$list_ref) {
      push @to_join, $chr . &trie_strs(@$list_ref);
    }
    else {
      push @to_join, $chr . $list_ref->[0];
    }
  }
  if (1 < @to_join) {
    return '(?:' . (join '|', @to_join) . ')';
  }
  else {
    return $to_join[0];
  }
}
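To see what the generated pattern looks like, here is a small self-contained sketch. `trie_pattern` is a hypothetical, condensed variant of `trie_strs` above, not the original sub: it sorts the keys and puts the empty alternative last so the output is repeatable. The four keywords are a subset of the list above.

```perl
use strict;
use warnings;

# Hypothetical condensed variant of trie_strs (an assumption, not the
# original): groups already-escaped strings by their leading character.
sub trie_pattern {
    my @strs = @_;
    return '' unless @strs;
    my (%rest, $has_empty);
    for my $str (@strs) {
        if ($str eq '') { $has_empty = 1; next; }
        # Keep a backslash escape together with the character it escapes.
        my $len = substr($str, 0, 1) eq "\\" ? 2 : 1;
        push @{ $rest{ substr($str, 0, $len) } }, substr($str, $len);
    }
    my @alts = map { $_ . trie_pattern(@{ $rest{$_} }) } sort keys %rest;
    push @alts, '' if $has_empty;   # empty branch last: prefer longer matches
    return @alts > 1 ? '(?:' . join('|', @alts) . ')' : $alts[0];
}

my $pattern = trie_pattern(map { quotemeta } qw(ERROR ERRORABEND EXCEED OBS=0));
print "$pattern\n";
# prints: (?:E(?:RROR(?:ABEND|)|XCEED)|OBS\=0)

my $re = qr/($pattern)/;
print "matched: $1\n" if "NOTE: ERRORABEND option set" =~ $re;
# prints: matched: ERRORABEND
```

The point of the shape is that the engine only has to look at one character to rule out most alternatives at a given position, instead of trying every keyword in turn.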

In reply to RE (tilly) 4: SAS log scanner by tilly
in thread SAS log scanner by nop
