jdhawke has asked for the wisdom of the Perl Monks concerning the following question:

I am currently working on a simple log parser for myself (yes, I know there are many perfectly good ones out there, but I am trying to learn some perl by doing this).
My question is this: is it possible to write a regexp that gets its pattern from an external variable, so that I could use a configuration file to define the match based on the particular log structure on a given machine?
Right now I have this match hard coded into the script:

m/Shorewall:.*:(?:DROP|REJECT|ACCEPT).*SRC=(\d+\.\d+\.\d+\.\d+).*PROTO=(TCP|UDP|ICMP)/

But I would like to be able to have it read the match structure from the configuration file. Any leads would be helpful.

--thanks

Replies are listed 'Best First'.
Re: Configurable Matches
by Mr. Muskrat (Canon) on Jun 03, 2004 at 04:17 UTC

    Yes, it is possible. You are looking for qr in perlop (under Regexp Quote-Like Operators).

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $text  = 'Mary had a little lamb.';   # get your text from wherever
    my $regex = 'it';                        # get your regex info from wherever
    my $reg   = qr/$regex/;                  # compile it
    print $regex if $text =~ $reg;           # check for a match
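
Applied to the original question, the pattern string would come from the configuration file instead of a literal. The following is only a minimal sketch under stated assumptions: the "config file" is simulated with an in-memory filehandle holding one line, compile_pattern is a hypothetical helper name, and the log line is abridged from the sample posted in this thread. Wrapping qr// in eval is one way to catch a malformed pattern coming from the config.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical config contents: a single line holding the pattern.
# A real script would open a file on disk instead of this string.
my $config_text =
    'Shorewall:.*:(?:DROP|REJECT|ACCEPT).*SRC=(\d+\.\d+\.\d+\.\d+).*PROTO=(TCP|UDP|ICMP)'
  . "\n";

# Compile a pattern string; returns undef if it is not a valid regex.
sub compile_pattern {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return $re;
}

# Read the pattern from the (simulated) config file.
open my $cfg, '<', \$config_text or die "cannot read config: $!";
chomp(my $pattern = <$cfg>);
close $cfg;

my $re = compile_pattern($pattern)
    or die "invalid pattern in config: $pattern\n";

# Abridged sample line (illustrative only).
my $line = 'Jun 2 03:09:40 localhost kernel: Shorewall:net2all:DROP:IN=eth0 OUT= '
         . 'SRC=211.199.195.208 DST=24.xx.xx.xxx PROTO=TCP SPT=3568 DPT=1025';

# In list context the match returns the captured groups.
my ($src, $proto) = $line =~ $re;
print "$src $proto\n" if defined $src;
```

The capture groups stay in the config pattern, so the script only has to agree with the config on how many captures to expect and in which order.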

      Thanks a lot, this seems to be exactly what I am looking for.
Re: Configurable Matches
by Stevie-O (Friar) on Jun 03, 2004 at 04:17 UTC
    It's a lot easier than you might think... m// interpolates.
    #!/usr/bin/perl -l
    chomp($tmp = <DATA>);              # read the pattern from the DATA section
    $foo = 'the rain in spain';
    print "match" if $foo =~ /$tmp/;   # $tmp is interpolated into the regex
    __DATA__
    ain$
    Change the line after __DATA__ to change the regex matched against.
    --Stevie-O
    $"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc
      Hmmm, this makes it even easier. Are there any pros/cons of doing it this way vs. the qr// method that Mr. Muskrat demonstrated above?
        Well, you hit the pro on the head... it's much easier to interpolate variables into m//.

        There is a con, though. If you plan on using this regular expression in a loop, Perl has to interpolate the variable and check the pattern every time it is used, and it recompiles the pattern whenever the variable's contents have changed. With qr//, you compile only once, up front. You may want to benchmark them to find out the difference:

        #!/usr/bin/perl -w
        use Benchmark;

        my $txt = 'find this regex';
        my $str = 'regex?';
        my $rgx = qr/$str/i;

        timethese(
            1000000,
            {
                match => sub { $txt =~ m/$str/i },
                qreg  => sub { $txt =~ $rgx },
            },
        );

        Gives me the following:

        % test.pl
        Benchmark: timing 1000000 iterations of match, qreg...
             match:  1 wallclock secs ( 0.82 usr +  0.00 sys =  0.82 CPU) @ 1219512.20/s (n=1000000)
              qreg:  0 wallclock secs ( 0.73 usr +  0.00 sys =  0.73 CPU) @ 1369863.01/s (n=1000000)

        As you can see, for a regex this simple, it doesn't matter too much... but more complicated regexes will show a bigger difference.
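
In a log parser, the compile-once approach might look roughly like this. It is a sketch only: the sample lines and the %count tally are invented for illustration, and the pattern string stands in for whatever the configuration file would supply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = 'SRC=(\d+\.\d+\.\d+\.\d+)';   # would come from the config file
my $re      = qr/$pattern/;                 # compiled once, before the loop

# Made-up sample lines standing in for the real log.
my @lines = (
    'kernel: Shorewall:net2all:DROP:IN=eth0 SRC=10.0.0.1 PROTO=TCP',
    'kernel: Shorewall:net2all:DROP:IN=eth0 SRC=10.0.0.2 PROTO=UDP',
    'kernel: Shorewall:net2all:DROP:IN=eth0 SRC=10.0.0.1 PROTO=TCP',
    'sshd[123]: unrelated line',
);

my %count;
for my $line (@lines) {
    next unless $line =~ $re;   # reuses the already compiled regex
    $count{$1}++;               # tally hits per source address
}

print "$_ $count{$_}\n" for sort keys %count;
```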

Re: Configurable Matches
by hsinclai (Deacon) on Jun 03, 2004 at 04:25 UTC
    It seems as if your regex matches entire lines really - is that to say Shorewall loglines are mixed into log files, along with various other output?

    You can certainly set up the content to be matched using a variable from an external file, but first it might be really useful to get the logs in order carefully.

    If they are mixed logs, it's much more (needless) work. Better to separate them out, e.g. each virtual site has its own logs, iptables logs to one file, auth events to another, etc.

    Sorry if I misunderstood what you're trying to achieve.
      No problem, yes I am matching entire lines from my syslog, but am using the capturing parens to pull out a synopsis of the data.
      So instead of:
      Jun 2 03:09:40 localhost kernel: Shorewall:net2all:DROP:IN=eth0 OUT= MAC=00:50:04:70:8c:ba:00:01:96:0d:03:70:08:00 SRC=211.199.195.208 DST=24.xx.xx.xxx LEN=48 TOS=0x00 PREC=0x00 TTL=97 ID=35909 DF PROTO=TCP SPT=3568 DPT=1025 WINDOW=16384 RES=0x00 SYN URGP=0

      I will have this:
      IP               Proto  Sport  Dport  Count
      211.199.195.208  TCP    3568   1025   1
      And like I said, this is more for me to learn perl than any other reason. :)
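
One way such a synopsis could be produced is sketched below. The pattern is an assumed extension of the one in the question (it additionally captures SPT and DPT, and is limited to TCP/UDP because ICMP lines carry no ports), and the %count hash keyed on the four fields is an illustrative choice, not the poster's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Log line taken from the thread (DST was masked by the poster).
my @lines = (
    'Jun 2 03:09:40 localhost kernel: Shorewall:net2all:DROP:IN=eth0 OUT= '
  . 'MAC=00:50:04:70:8c:ba:00:01:96:0d:03:70:08:00 SRC=211.199.195.208 '
  . 'DST=24.xx.xx.xxx LEN=48 TOS=0x00 PREC=0x00 TTL=97 ID=35909 DF PROTO=TCP '
  . 'SPT=3568 DPT=1025 WINDOW=16384 RES=0x00 SYN URGP=0',
);

# Assumed variant of the original pattern that also captures the ports.
my $re = qr/Shorewall:.*:(?:DROP|REJECT|ACCEPT).*SRC=(\d+\.\d+\.\d+\.\d+).*PROTO=(TCP|UDP).*SPT=(\d+).*DPT=(\d+)/;

# Count identical (IP, protocol, source port, destination port) tuples.
my %count;
for my $line (@lines) {
    my ($ip, $proto, $sport, $dport) = $line =~ $re or next;
    $count{"$ip $proto $sport $dport"}++;
}

# Print a header row followed by one row per distinct tuple.
printf "%-16s %-5s %-5s %-5s %s\n", qw(IP Proto Sport Dport Count);
for my $key (sort keys %count) {
    printf "%-16s %-5s %-5s %-5s %d\n", split(' ', $key), $count{$key};
}
```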
Re: Configurable Matches
by spikey_wan (Scribe) on Jun 03, 2004 at 12:48 UTC
    Dead easy!

    I've got a log file parser, and I did it like this:

    my $TagStart = "tag start.*swfm l . info";
    my $TagStop  = "tag stop";
    ...
    insert() if ($curnt =~ /$TagStart/i);
    ...
    last if ($curnt =~ /$TagStop/i);
    You can use all the standard pattern matching thingies inside the quotes for the variables, and then slap the vars in between the slashes, and it all works fine.

    Spike.
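
For readers who want something runnable, here is a self-contained sketch of the same start/stop idea. The two pattern strings are the ones shown above; everything else (the sample @log lines, the @block array, the $inside flag) is invented for illustration and is only a guess at how such a parser might use the tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $TagStart = "tag start.*swfm l . info";
my $TagStop  = "tag stop";

# Invented sample input; a real parser would read lines from the log file.
my @log = (
    'noise before the block',
    'Tag Start: swfm l 1 info',
    'payload line',
    'TAG STOP',
    'noise after the block',
);

# Collect every line from the start tag through the stop tag.
my @block;
my $inside = 0;
for my $curnt (@log) {
    $inside = 1 if $curnt =~ /$TagStart/i;      # pattern interpolated from the variable
    push @block, $curnt if $inside;
    last if $inside && $curnt =~ /$TagStop/i;   # stop after the closing tag
}

print "$_\n" for @block;
```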

Re: Configurable Matches
by Anonymous Monk on Jun 03, 2004 at 04:43 UTC
    (yes I know there are many perfectly good ones out there but I am trying to learn some perl by doing this).
    You know, you can learn perl by reading the source of one of those perfectly good log parsers written in perl.
      True, but I am a person who learns much better by doing than reading. When I am looking at a new language and trying to figure out someone else's code I generally am not able to learn as much as when I start from "Hello World" and build up to a similar piece of software on my own.