sotona has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow Monks,

I have multi-line text which I receive in response to an HTTP request.
This text contains chunks like
   text: XYZ
where XYZ is an integer with exactly three decimal digits.
How do I match only the last occurrence of such a chunk?
Of course, XYZ can be different in each occurrence.


Sapienti sat.

Replies are listed 'Best First'.
Re: Match only last occurrence
by davido (Cardinal) on May 31, 2016 at 15:09 UTC

    The page is probably not huge. Just reverse it and let the regex find the first occurrence.

    print reverse($1) . "\n" if reverse($string) =~ m/(\d{3} :txet)/;

    If you wish to prevent strings of digits exceeding a length of 3, you can add a lookbehind:

    m/(?<!\d)(\d{3} :txet)/

    But now perhaps it's become ugly enough that reverse hasn't bought clarity.
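For readers who want to try the reversal idea, here is a minimal sketch. The helper name `last_chunk_reversed` and the sample response text are made up for illustration; only the reversed pattern and the lookbehind come from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: find the last "text: NNN" chunk by matching against
# the reversed string, then reversing the match back.
sub last_chunk_reversed {
    my ($string) = @_;
    my $reversed = reverse $string;    # scalar assignment gives a string reverse
    # In the reversed string, (?<!\d) rejects numbers longer than three digits.
    return unless $reversed =~ m/(?<!\d)(\d{3} :txet)/;
    return scalar reverse $1;
}

# Made-up sample input standing in for the HTTP response body.
my $response = "text: 111\nfoo text: 222 bar\ntext: 333\ntrailer\n";
print last_chunk_reversed($response), "\n";    # prints "text: 333"
```

Copying the string once and reversing it is cheap for a page-sized response, which is the assumption the reply starts from.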

    Update: I can't recall where I first saw this technique. I thought maybe Mastering Regular Expressions, but a quick search of that text didn't turn up anything relevant. However, the notion of reversing a string as a quick and easy means of finding the last occurrence of some match in the string is discussed here: http://www.perl.com/pub/2001/05/01/expressions.html.


    Dave

      I can't recall where I first saw this technique.

      Perhaps at sexeger? That was posted on 2000-09-20, which predates your link by most of a year.

      - tye        

        That must be it. :)


        Dave

Re: Match only last occurrence
by stevieb (Canon) on May 31, 2016 at 14:45 UTC

    Use a negative zero-width lookahead assertion:

    perl -E '$x="test: 111\ntest: 222 test: 333"; $x=~/test:\s+(\d{3})(?!test:\s+\d{3})$/s; say $1'
    333

    What the

    /test:\s+(\d{3})(?!test:\s+\d{3})$/s

    regex does is look for test: and capture the following three digits, so long as there's not another test: NNN anywhere else after it (meaning it is the last one in the string). The /s modifier allows you to search across newlines. I've used \s+ instead of a literal space just to ensure that it'll match any type of whitespace (tab, multiple consecutive spaces etc).

      (Quoting part of the original code from before a silent update was done. Note that the correction that follows still applies to the 2nd version of the code but, of course, may not apply to some future version of the code if more updates are made.)

      /(\d{3})(?!\d{3})$/s

      Well, you picked an example where that works but not for the reasons that you think. Your example only works because you picked an example string that matches the much simpler:

      /(\d{3})$/s

      A regex that is actually functionally identical to your regex.

      What you wanted was more like:

      /(\d{3})(?!.*\d{3})/s

      (Update: Removed the '$' from before the last closing paren.)
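A small sketch of the difference, assuming a sample string (invented here) in which the last number is not at the very end. The `test:\s+` prefix is carried over from the original post. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Made-up input: the last number is followed by more text.
my $x = "test: 111\ntest: 222 test: 333 trailing words";

# Anchored version: the (?!...) adds nothing here, the '$' does all the work,
# so nothing matches once the digits are no longer at the end of the string.
my ($anchored) = $x =~ /test:\s+(\d{3})(?!test:\s+\d{3})$/s;

# Lookahead with .* : "no further three-digit run anywhere after this one".
my ($lookahead) = $x =~ /test:\s+(\d{3})(?!.*\d{3})/s;

my $anchored_shown = defined $anchored ? $anchored : 'no match';
print "$anchored_shown\n";    # prints "no match"
print "$lookahead\n";         # prints "333"
```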

      But I'd still prefer the ( ... )[-1] approach as complex regexes often lead to mistakes, as we have seen thrice already in this thread.

      - tye        

Re: Match only last occurrence
by Anonymous Monk on May 31, 2016 at 14:55 UTC

    Skip over everything with .*, then backtrack to the last occurrence.

    /\A.*^text: \d\d\d/ms
      Great response! I did not notice it when I wrote and tested much the same thing.
      use strict;
      use warnings;
      use Test::Simple tests => 1;

      my $multi_line_string = "yadda text: 123 yadda yadda\n"
                            . " text: 456 yadda\n";
      $multi_line_string =~ /\A .* (text\:\s\d{3}) /xms;
      my $expected = 'text: 456';
      my $match    = $1;
      ok( $match eq $expected, 'Match last occurrence');

      OUTPUT:
      1..1
      ok 1 - Match last occurrence
      Bill
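One detail worth checking when using the greedy `.*` approach is the /s modifier. The following sketch, with an invented three-line sample and illustrative variable names, shows what appears to happen with and without it.

```perl
use strict;
use warnings;

# Made-up sample with one chunk per line.
my $text = "text: 123 yadda\nyadda text: 456\nyadda text: 789 yadda\n";

# With /s, the greedy .* first swallows the whole string, then backtracks
# until "text: NNN" can match, so the capture is the last occurrence.
my ($with_s) = $text =~ /\A.*text: (\d{3})/s;

# Without /s, '.' does not match a newline, so only the first line is searched.
my ($without_s) = $text =~ /\A.*text: (\d{3})/;

print "$with_s\n";       # prints "789"
print "$without_s\n";    # prints "123"
```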
Re: Match only last occurrence
by hippo (Archbishop) on May 31, 2016 at 14:29 UTC
    /text: \d{3}.*?$/sa;

    Update: See tye's important observation and correction in the reply. You could instead anchor the regex at the start to have it match the last. eg:

    /^.*text: \d{3}/sa;

      Actually, that doesn't work. The regex engine finds the first match of "text: \d{3}" and then tries to match ".*?" anti-greedily but still matches .*? before it would backtrack and try matching the second match. Anti-greediness doesn't trump "left-most first". It only trumps "longest first".
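The "left-most first" point can be checked with a short sketch. The sample text and variable names are invented, and the /a flag is left out because it only restricts \d to ASCII digits and does not change where the match starts.

```perl
use strict;
use warnings;

# Made-up sample with three chunks.
my $text = "text: 111\ntext: 222\ntext: 333\n";

# Non-greedy tail: the match still STARTS at the leftmost "text: NNN";
# .*? then grows only as far as needed for '$' to succeed.
my ($digits) = $text =~ /text: (\d{3}).*?$/s;
my $start = $-[0];    # offset where the whole match began

print "$digits at offset $start\n";    # prints "111 at offset 0"
```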

      I think the simplest approach is:

      my $match = ( /text: ([0-9]{3})/g )[-1];

      But if you want to have the regex enforce the "last" part, then you have to get more complicated:

      my( $match ) = /text: ([0-9]{3})(?:(?!text: [0-9]{3}).)*$/s;
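Both approaches can be tried side by side. This is only a sketch with a made-up sample string, and it matches against a named variable rather than $_ as in the snippets above.

```perl
use strict;
use warnings;

# Made-up sample with trailing text after each chunk.
my $text = "text: 111 a\ntext: 222 b\ntext: 333 c\n";

# List slice: collect every capture with /g, keep the last one.
my $by_slice = ( $text =~ /text: ([0-9]{3})/g )[-1];

# Tempered pattern: after the capture, consume characters one at a time,
# each only if another "text: NNN" does not start there, up to the end.
my ($by_regex) = $text =~ /text: ([0-9]{3})(?:(?!text: [0-9]{3}).)*$/s;

print "$by_slice $by_regex\n";    # prints "333 333"
```

The list-slice form gives undef when nothing matches, so a defined check is advisable before using the result.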

      - tye        

        That's it! I didn't figure out I have to repeat the pattern to match exactly the last occurrence.
        Thanks a lot!
        Sapienti sat.

        Anti-greediness doesn't trump "left-most first". It only trumps "longest first".

        How right you are - thanks for the spot and the correction.

      matches the first occurrence
      Sapienti sat.