in reply to Re^2: why my reg ex matches greedy?
in thread why my reg ex matches greedy?

Well, clearly I'm an idiot. I thought I'd drop a mention of YAPE::Regex::Explain to show how to figure out what a regular expression is saying, so I could point out the "invented" bit. But I find that the bit I thought was "invented" seems fine. Sorry about that. When I run:

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(.*)(_\d{11,}?.*)(\.\w+)/)->explain();

I get:

$ perl xxxyyyzzz.pl
The regular expression:

(?-imsx:(.*)(_\d{11,}?.*)(\.\w+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    _                        '_'
----------------------------------------------------------------------
    \d{11,}?                 digits (0-9) (at least 11 times (matching
                             the least amount possible))
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

And I thought the \d{11,}? was the invented bit. I'll have to play with that sometime.
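Taking up the "play with it" idea, here is a minimal sketch that runs the pattern from above, plus a copy of it with the ? removed, against a made-up filename (report_123456789012345.txt is purely hypothetical). Because a greedy .* follows the digits inside group 2, both versions should end up with the same three captures; the ? only changes the order in which the engine tries lengths.

```perl
use strict;
use warnings;

# Hypothetical sample name: underscore, 15 digits, then an extension.
my $name = 'report_123456789012345.txt';

my %patterns = (
    lazy   => qr/(.*)(_\d{11,}?.*)(\.\w+)/,   # the pattern from above
    greedy => qr/(.*)(_\d{11,}.*)(\.\w+)/,    # same pattern, '?' removed
);

my %captures;    # label => [ \1, \2, \3 ]
for my $label (sort keys %patterns) {
    next unless $name =~ $patterns{$label};
    $captures{$label} = [ $1, $2, $3 ];
    print "$label: [$1] [$2] [$3]\n";
}
```

On that input both lines should show report, _123456789012345 and .txt. A lazy quantifier can only change the captures when something after it could also consume the same digits.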

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^4: why my reg ex matches greedy?
by Anonymous Monk on Jun 26, 2012 at 04:53 UTC

    And I thought the \d{11,}? was the invented bit. I'll have to play with that sometime.

    You should; the future of all new Perl regex features rests upon that syntax.

      I don't see the relation between your link and that bit of the regex. That said, your skepticism of its utility seems justified. I tried to find a use for it, but I haven't been able to make \d{4,}? act any differently than \d{4}. It's either a useless construct, or a failure of my imagination in coming up with an appropriate test case.

      Putting aside what YAPE::Regex::Explain says about it, when I looked at it originally, I thought "Yack! Perl is gonna bitch about that weird '?' character". I could think of a couple other interpretations, so I put together a bit of code to check 'em out:

      my @tests = (
          'First case',
          '123456789second & third case',
      );
      for my $t (@tests) {
          print "\nchecking '$t'\n";
          if ($t =~ /(\d{4,}?)(.*)/) {
              print "A: $1, $2\n";
          }
          if ($t =~ /(\d{4,}?)(.*?)$/) {
              print "B: $1, $2\n";
          }
      }

      The other interpretations I could think of were:

      • An optional set of 4 or more digits, kind of like (?:\d{4,})?. If true, the first case would give us:

        A: , First case
      • Exactly 4 digits, like \d{4}, giving us:

        checking '123456789second & third case'
        A: 1234, 56789second & third case
        B: 1234, 56789second & third case
      • 4 or more digits, with as few as possible, yielding:

        checking '123456789second & third case'
        A: 1234, 56789second & third case
        B: 123456789, second & third case
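Each of these readings can also be spelled out as its own explicit pattern and run against the same two strings. This is only a sketch: the patterns for the first two readings are my own spellings of them (not something Perl does implicitly with \d{4,}?), and only the A-style (.*) tail is used.

```perl
use strict;
use warnings;

my @tests = ('First case', '123456789second & third case');

# My own explicit spellings of the three readings, each with the A-style (.*) tail.
my @readings = (
    [ 'optional', qr/((?:\d{4,})?)(.*)/ ],   # optional run of 4+ digits
    [ 'exactly4', qr/(\d{4})(.*)/ ],         # exactly 4 digits
    [ 'lazy4up',  qr/(\d{4,}?)(.*)/ ],       # 4+ digits, as few as possible
);

my %result;    # "label|string" => "digits, rest", recorded only on a match
for my $reading (@readings) {
    my ($label, $re) = @$reading;
    for my $t (@tests) {
        next unless $t =~ $re;
        $result{"$label|$t"} = "$1, $2";
        print "$label on [$t]: $1, $2\n";
    }
}
```

Only the optional reading matches 'First case' (with an empty first capture); with this (.*) tail, the exact-4 and lazy readings give identical results on the digit string.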

      On reading the ...explain() output, I thought that I could perhaps make the third case come about. But what I actually got was:

      $ perl xxxyyyzzz.pl

      checking 'First case'

      checking '123456789second & third case'
      A: 1234, 56789second & third case
      B: 1234, 56789second & third case

      So I'm thinking that my initial surprise was justified, even though it's syntactically correct.

      ...roboticus


        It's different if there is some following match condition:

        use strict;
        use warnings;

        for my $reg ('(x\d{3,}?)', '(x\d{3,}?x)', '(x\d{3,})', '(x\d{3})') {
            for my $str ('xx', 'x12x', 'x123456x', 'x12x x123x') {
                print "Matched using $reg: $1\n" if $str =~ $reg;
            }
        }

        Prints:

        Matched using (x\d{3,}?): x123
        Matched using (x\d{3,}?): x123
        Matched using (x\d{3,}?x): x123456x
        Matched using (x\d{3,}?x): x123x
        Matched using (x\d{3,}): x123456
        Matched using (x\d{3,}): x123
        Matched using (x\d{3}): x123
        Matched using (x\d{3}): x123

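The same idea applies to the earlier test string, as a minimal sketch: in pattern B, (.*?)$ can soak up the leftover digits, so \d{4,}? never needs more than four. If the element after the digits is made unable to match a digit (the \D below is a hypothetical change, not part of either original test), the lazy quantifier has to grow until that condition holds, and B produces the result expected for the "as few as possible" reading.

```perl
use strict;
use warnings;

my $t = '123456789second & third case';

my %result;    # label => "digits, rest"

# Pattern B as originally tested: (.*?)$ can absorb the leftover digits,
# so the lazy \d{4,}? stops at the minimum of four.
$result{original} = "$1, $2" if $t =~ /(\d{4,}?)(.*?)$/;

# Hypothetical variant: the next character must be a non-digit, so the
# lazy quantifier keeps taking digits until that can match.
$result{forced} = "$1, $2" if $t =~ /(\d{4,}?)(\D.*?)$/;

print "$_: $result{$_}\n" for sort keys %result;
```

Here the forced variant captures 123456789 and the original still captures 1234: the lazy quantifier only expands when what follows it would otherwise fail.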
        True laziness is hard work