Linicks has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

Been using Perl for a long time, but tonight something has me beat.

I am trying to capture certain IPs from mail log lines, and this all worked well until I changed the code a bit to capture a new match. For some reason, this code here:

#!/usr/bin/perl -w

my $line = "May 25 10:44:06 postfix/smtpd[6992]: NOQUEUE: reject: RCPT from 13-34-221-131.virtuals.cl[131.221.34.13]: 454 4.7.1 <eax_64@yahoo.com>: Relay access denied; from=<xo@ore.net> to=<eax_64@yahoo.com> proto=ESMTP helo=<192.168.0.133>";

if ($line =~ /Relay access denied/) {
    $line =~ s/.*\]: //;
    #print $line;
    $line =~ s/.*\[//;
    $line =~ s/\].*//;
    chomp($line);
    print $line;
}

matches the second ']:' first, and fubars the whole logic.

I have tried researching this, but got nowhere due to the noise.

Any ideas what is going on?

Thanks,

Nick

P.S. I fixed it up another way, but I am intrigued.

Replies are listed 'Best First'.
Re: Regex substitute matches second match first?
by stevieb (Canon) on May 25, 2016 at 17:11 UTC

    You're being too greedy. Change:

    $line =~ s/.*\]: //;

    ...to:

    $line =~ s/.*?\]: //; # note the ? following .*

    That'll stop as soon as it sees the first "]: ", whereas without the non-greedy modifier ?, the .* slurps in the entire string and then backs off only as far as the last "]: ".
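
    To see the difference side by side, here is a minimal sketch. The log line is a shortened, made-up stand-in for the one in the question (the host name is hypothetical), and $greedy and $lazy are names chosen just for this illustration:

    ```perl
    use strict;
    use warnings;

    # Shortened, hypothetical log line with the same two "]: " markers as the original.
    my $log = 'postfix/smtpd[6992]: reject: RCPT from host.example[131.221.34.13]: 454 Relay access denied';

    # Greedy: .* runs to the end of the string, then backtracks to the LAST "]: ".
    (my $greedy = $log) =~ s/.*\]: //;

    # Non-greedy: .*? grows one character at a time and stops at the FIRST "]: ".
    (my $lazy = $log) =~ s/.*?\]: //;

    print "greedy: $greedy\n";   # greedy: 454 Relay access denied
    print "lazy:   $lazy\n";     # lazy:   reject: RCPT from host.example[131.221.34.13]: 454 Relay access denied
    ```

    The greedy version throws away the bracketed IP along with everything before it, which is why the later substitutions have nothing useful left to work on.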

    Here's a way that you can do all of your matching and capture the IP on one line:

    if ($line =~ /\[(\d+\.\d+\.\d+\.\d+)\].*Relay access denied/) {
        my $ip = $1;
        print "$ip\n";
    }

      As a minor nit, you can do away with this $1 stuff and assign directly to $ip like this:

      if (my ($ip) = $line =~ /\[(\d+\.\d+\.\d+\.\d+)\].*Relay access denied/) {
          print "$ip\n";
      }

      Actually, this has made me think now I understand what is going on.

      Postfix logs the mavericks between [] brackets. The first set of [] brackets are the session ID. There is only the session ID [] brackets and the maverick's [] brackets on the matches I need to check.

      So all I need to do for ALL the matches is:

      $line =~ s/.*\[//;   # gobble everything up to the last [
      $line =~ s/\].*//;   # get rid of the ] and everything after it
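
      The same idea can also be written as a single match that leaves $line untouched. This is only a sketch, assuming (as described above) that the wanted IP always sits in the last [] pair on the line; the log line is a shortened, hypothetical one and $log is a name picked for the example:

      ```perl
      use strict;
      use warnings;

      # Hypothetical, shortened log line; the wanted IP is in the last [] pair.
      my $log = 'postfix/smtpd[6992]: reject: RCPT from host.example[131.221.34.13]: 454 Relay access denied';

      # Greedy .* backtracks to the last "[" that has a closing "]" after it;
      # the capture group takes whatever sits between that bracket pair.
      my ($ip) = $log =~ /.*\[([^\]]*)\]/;

      print defined $ip ? "$ip\n" : "no bracketed value found\n";   # 131.221.34.13
      ```

      Here the greediness that caused the original surprise is doing the work on purpose. If a line ever carried a later bracketed value, that one would be captured instead.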

      Thanks!

      Nick

      Heh, blimey, thank you. So /g sometimes isn't needed - I have read about 'greedy' regex before but never encountered it.

      Many thanks for your wisdom!

      Nick

        The /g modifier is for global matches. For instance, if you had numerous IPs on a single line, and you wanted to grab them all. eg: (untested)

        my $line = "1.1.1.1 blah 2.2.2.2";
        my @ips = $line =~ /\d+\.\d+\.\d+\.\d+/g;

        Now $ips[0] would be '1.1.1.1' and $ips[1] would contain '2.2.2.2'.
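
        For comparison, a small sketch of /g in both contexts, using the same sample string ($text and @seen are names made up for the example). In list context /g hands back every match at once; in scalar context each evaluation picks up where the previous match ended, so it can drive a loop:

        ```perl
        use strict;
        use warnings;

        my $text = '1.1.1.1 blah 2.2.2.2';

        # List context: /g returns every match at once.
        my @ips = $text =~ /\d+\.\d+\.\d+\.\d+/g;

        # Scalar context: each evaluation resumes where the previous match ended.
        my @seen;
        while ($text =~ /(\d+\.\d+\.\d+\.\d+)/g) {
            push @seen, $1;
        }

        print "@ips\n";    # 1.1.1.1 2.2.2.2
        print "@seen\n";   # 1.1.1.1 2.2.2.2
        ```

        Either way, /g only controls how many matches are found; it has no effect on how much a single .* consumes.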

        A couple of docs you can review are perlretut and perlre.

Re: Regex substitute matches second match first?
by AnomalousMonk (Archbishop) on May 25, 2016 at 18:02 UTC

    A couple of other variations if you want to find the first square-bracketed IP before a specific literal trigger phrase:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Regexp::Common qw(net);
     ;;
     my $line = 'xxx[9.9.9.9] virtuals.cl[131.221.34.13]: 454 9.9.9.9: Relay access denied; helo=[9.9.9.9]';
     ;;
     my $sq_ip = qr{ (?<= \[) $RE{net}{IPv4} (?= \]) }xms;
     my $trigger = 'Relay access denied';
     ;;
     if (my ($denied_ip) = $line =~ m{ ($sq_ip) (?: (?! $sq_ip) .)*? \Q$trigger\E }xms) {
       print qq{matched: '$denied_ip'};
       }
     ;;
     my $found_denied = my ($ip_denied) = $line =~ m{ ($sq_ip) (?: (?! $sq_ip) .)*? \Q$trigger\E }xms;
     print qq{ found: '$ip_denied'} if $found_denied;
    "
    matched: '131.221.34.13'
     found: '131.221.34.13'


    Give a man a fish:  <%-{-{-{-<

Re: Regex substitute matches second match first?
by haukex (Archbishop) on May 25, 2016 at 17:18 UTC

    Hi Nick,

    I believe what's going on is this, from perlre:

    By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a ?.

    This works for me: $line =~ s/.*?\]: //;

    And although not necessary in this case, I personally prefer to be a little more explicit and I'd anchor the match: $line =~ s/^.*?\]: //;

    But I'm wondering, if you're just trying to match the IP address enclosed in those brackets, why not do something like what stevieb suggested - or, taking it even further (depending on personal taste this may be a bit overkill):

    use Regexp::Common qw/net/;
    if ($line =~ /Relay access denied/) {
        my @ips = $line =~ /$RE{net}{IPv4}/g;
        print "$_\n" for @ips;
    }

    Hope this helps,
    -- Hauke D

      Hi Hauke D

      Thanks for the reply

      OK, the code:

      use Regexp::Common qw/net/;
      if ($line =~ /Relay access denied/) {
          my @ips = $line =~ /$RE{net}{IPv4}/g;
          print "$_\n" for @ips;
      }

      looks good, but the example I used is not the only scenario - I capture several matches, all covered by one regex - and other lines contain IPs that I am not interested in - postfix always puts the maverick IP in [] brackets.

      Thanks for your help - I will look at this further anyway.

      Nick

Re: Regex substitute matches second match first?
by BillKSmith (Monsignor) on May 25, 2016 at 20:13 UTC
    It may be easier to search for a match with the module Regexp::Common::net rather than eliminating everything else. This would reduce your chance of false matches.
    Bill

      That is good, but as I said only one IP needs to be captured - the maverick in the second lot of []'s from postfix logs.

      nick