mhearse has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to capture the mail server in a maillog line.
Oct 9 08:50:53 mail_server sendmail[30172]: j99FopoN030172: from=<support@symantec.com>, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[218.1.114.182]
First I tried a more restrictive pattern:
if ($line =~ /relay\=\[?(.*?)\]?/) { $server = $1; }
Then I tried a less restrictive one:
if ($line =~ /relay\=(.*?)/) { $server = $1; }
The regex matches FQDNs fine, but not IP addresses. I need to be able to match both FQDNs and IPs. The square brackets are present only around IP addresses, not around server names. Is the problem obvious to anyone? Thanks.

Re: Help with pattern matching
by Aristotle (Chancellor) on Oct 09, 2005 at 23:57 UTC

    Your non-greedy quantifiers won’t work like that. In both cases the lazy quantifier is followed either by nothing or only by an optional atom. A non-greedy quantifier stops matching as soon as it can: if *? is the last thing in the pattern, it will always successfully match the empty string and, because nothing after it forces it to consume more of the input, will be content to stop there. The same goes when it is followed by a character made optional by the ? quantifier: the *? matches the empty string, the \]? then matches zero characters (which it is allowed to do), and so the match succeeds, leaving you with an empty string as the capture.
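    A quick way to see this for yourself is a minimal sketch like the following, which runs both patterns from the question against the relay= field of the IP-address sample line (trimmed for brevity). Each match succeeds, yet $1 comes back empty. The names @patterns and @captures are only illustrative.

```perl
use strict;
use warnings;

# The relay= field from the question's IP-address sample line.
my $line = 'relay=[218.1.114.182]';

# The two patterns from the question, unchanged.
my @patterns = ( qr/relay\=\[?(.*?)\]?/, qr/relay\=(.*?)/ );

my @captures;
for my $i ( 0 .. $#patterns ) {
    if ( $line =~ $patterns[$i] ) {
        # The match succeeds, but the lazy (.*?) stopped after consuming nothing.
        push @captures, $1;
        printf "pattern %d matched, captured '%s' (length %d)\n",
            $i + 1, $1, length $1;
    }
}
```

    Both iterations report a capture of length 0, which is the empty-string result explained above.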

    (I can only recommend Mastering Regular Expressions (the owl book) – you don’t understand regular expressions until you've read that book.)

    I don’t know whether you have chopped down your code any, so it might be something else that’s the culprit. You provide almost no example data to look at, so it’s hard to know what’s really going on.

    Makeshifts last the longest.

Re: Help with pattern matching
by GrandFather (Saint) on Oct 10, 2005 at 00:02 UTC
    use warnings;
    use strict;

    my $str1 = 'Oct 9 08:50:53 mail_server sendmail[30172]: j99FopoN030172: from=<support@symantec.com>, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[218.1.114.182]';
    my $str2 = 'Oct 9 04:20:35 mail_server sendmail[20773]: j975sOqf032312: to=<522-2204194-1-13-51954mpxas@dimexpress.com>, delay=2+05:26:11, xdelay=00:00:00, mailer=esmtp, pri=19293815, relay=mail.xpress.com.';

    my $match = qr/relay=\[?([^\]]*)/;

    print "$1\n" if $str1 =~ $match;
    print "$1\n" if $str2 =~ $match;

    Prints:

    218.1.114.182
    mail.xpress.com.
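    The reason this works without a lazy quantifier is the negated character class: [^\]]* is greedy, but it can never consume a ], so it stops just before the closing bracket. A minimal sketch contrasting it with a plain greedy (.*), which runs to the end of the string and takes the bracket with it, because the trailing \]? is optional and happily matches nothing (the variable names are illustrative):

```perl
use strict;
use warnings;

my $relay = 'relay=[218.1.114.182]';

# Greedy .* runs to the end of the string; the optional \]? then matches nothing.
my ($greedy) = $relay =~ /relay=\[?(.*)\]?/;

# Greedy [^\]]* cannot cross a ']', so the bracket stays outside the capture.
my ($negated) = $relay =~ /relay=\[?([^\]]*)/;

print "greedy dot:    $greedy\n";     # 218.1.114.182]
print "negated class: $negated\n";    # 218.1.114.182
```

    One caveat: [^\]]* also matches commas and spaces, so for an unbracketed host name this relies on relay= being the last field, as it is in both sample lines.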

    Perl is Huffman encoded by design.
Re: Help with pattern matching
by pg (Canon) on Oct 09, 2005 at 23:47 UTC
    my $line = 'Oct 9 08:50:53 mail_server sendmail[30172]: j99FopoN030172: from=<support@symantec.com>, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[218.1.114.182]';

    if ($line =~ /relay=\[(.*?)\]/) {
        print $1;
    }
      That does work, but I need the bracket match to be optional, so that I can match something like this:
      Oct 9 04:20:35 mail_server sendmail[20773]: j975sOqf032312: to=<522-2204194-1-13-51954mpxas@dimexpress.com>, delay=2+05:26:11, xdelay=00:00:00, mailer=esmtp, pri=19293815, relay=mail.xpress.com.
        my $line1 = 'Oct 9 08:50:53 mail_server sendmail[30172]: j99FopoN030172: from=<support@symantec.com>, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[218.1.114.182]';
        my $line2 = 'Oct 9 04:20:35 mail_server sendmail[20773]: j975sOqf032312: to=<522-2204194-1-13-51954mpxas@dimexpress.com>, delay=2+05:26:11, xdelay=00:00:00, mailer=esmtp, pri=19293815, relay=mail.xpress.com';

        if ($line1 =~ /relay=\[?(.*?)\]?$/) {
            print $1, "\n";
        }
        if ($line2 =~ /relay=\[?(.*?)\]?$/) {
            print $1, "\n";
        }

        This prints:

        218.1.114.182
        mail.xpress.com
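        In a real script the anchored pattern would usually sit inside a loop over the log file. A minimal sketch under that assumption: the two sample lines from this thread stand in for the maillog and are read through an in-memory filehandle (open on a scalar reference), and @servers is only an illustrative name. Because the $ anchor is not optional, the lazy (.*?) has to keep growing until the rest of the pattern can match at the end of the line.

```perl
use strict;
use warnings;

# Stand-in for a real maillog: the two sample lines from this thread.
my $log = <<'END_LOG';
Oct 9 08:50:53 mail_server sendmail[30172]: j99FopoN030172: from=<support@symantec.com>, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[218.1.114.182]
Oct 9 04:20:35 mail_server sendmail[20773]: j975sOqf032312: to=<522-2204194-1-13-51954mpxas@dimexpress.com>, delay=2+05:26:11, xdelay=00:00:00, mailer=esmtp, pri=19293815, relay=mail.xpress.com
END_LOG

# Read the string as if it were a file.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @servers;
while ( my $line = <$fh> ) {
    # Good habit; $ would also match just before a trailing newline.
    chomp $line;

    # (.*?) is lazy, but the mandatory $ forces it to reach the end of the line.
    if ( $line =~ /relay=\[?(.*?)\]?$/ ) {
        push @servers, $1;
    }
}
close $fh;

print "$_\n" for @servers;
```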

        Ah. And it didn’t occur to you that people would need example data that shows all the possible scenarios to be able to solve your problem? :-)

        Is the relay= bit always the last part of the line? In that case you could just use

        m{ relay= \[? ([^\]]*) }msx

        If not, then it is probably followed by a comma or end of line (again, can’t know without more sample data), so what you need is

        m{ relay= \[? (.*?) \]? (?: \Z | , ) }msx

        Note that while I’m using a lazy quantifier in the second case (which is what is causing the problems in your patterns), it is followed by a non-optional part of the regex, so it will always be forced to consume as much as necessary to make the overall pattern match.
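        A minimal sketch for checking the comma-or-end-of-string pattern, including a made-up line in which relay= is followed by another field (the third entry is hypothetical, not taken from a real maillog). The group is written as (?: \Z | , ), Perl's non-capturing form, so the server name stays in $1; $relay_re and @servers are illustrative names.

```perl
use strict;
use warnings;

# Comma-or-end-of-string pattern; (?: ... ) groups without capturing.
my $relay_re = qr{ relay= \[? (.*?) \]? (?: \Z | , ) }msx;

my @lines = (
    'daemon=MTA, relay=[218.1.114.182]',       # relay= last, IP in brackets
    'pri=19293815, relay=mail.xpress.com.',    # relay= last, host name
    'relay=[10.0.0.1], foo=bar',               # hypothetical: another field follows
);

my @servers;
for my $line (@lines) {
    # The lazy (.*?) must grow until \]? plus a comma or end of string can match.
    push @servers, $1 if $line =~ $relay_re;
}

print "$_\n" for @servers;
```

        All three entries should yield the bare address or host name, without brackets or the trailing comma.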
