Basic Regular expression

skkeni04 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a question that might seem basic, but here it is:

my $a = "This is Perl";
$a =~/^(.+)(e|r)(.*)$/;

I get $1= This is Pe; $2= r; $3= l;

According to me, it should be "$1= This is Perl", and $2 and $3 should be null. So my question is: how are the capture brackets evaluated?


Re: Basic Regular expression by choroba (Cardinal) on Feb 09, 2017 at 16:30 UTC
$2 can't be null, because it must be either `e` or `r` for the match to succeed. Compare with `$a =~ /^(.+)([er]?)(.*)$/;`
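A minimal sketch that runs the OP's pattern and choroba's variant side by side, collecting the captures with a list-context match. The variable names `$str`, `@op` and `@alt` are illustrative and not from the thread:

```perl
use strict;
use warnings;

my $str = "This is Perl";

# The OP's pattern: (e|r) is mandatory, so the greedy (.+) has to give
# characters back until an "e" or "r" follows it.
my @op = $str =~ /^(.+)(e|r)(.*)$/;

# choroba's variant: [er]? may match the empty string, so (.+) can keep
# the whole string and the two later groups both capture "".
my @alt = $str =~ /^(.+)([er]?)(.*)$/;

printf "OP:      1={%s} 2={%s} 3={%s}\n", @op;
printf "choroba: 1={%s} 2={%s} 3={%s}\n", @alt;
```

The only change is making the middle group able to match nothing; once it can, the greedy `(.+)` no longer needs to back off.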
Re^2: Basic Regular expression by skkeni04 (Initiate) on Feb 13, 2017 at 03:32 UTC
Thanks, got it.
Re: Basic Regular expression by Marshall (Canon) on Feb 09, 2017 at 16:39 UTC
from http://perldoc.perl.org/perlre.html:

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don't change, just the "greediness".

In your regex, the greedy (.+) for $1 still leaves just enough room for the other terms to match.

Update, consider:

    use strict;
    use warnings;

    my $x = "This is Perl";
    $x =~ /^((.+)(e|r)(.*))$/;
    print "1={$1} 2={$2} 3={$3} 4={$4}\n";
    # 1={This is Perl} 2={This is Pe} 3={r} 4={l}

    $x = "This is Perl, nice Perl";
    $x =~ /^((.+)(e|r)(.*))$/;
    print "1={$1} 2={$2} 3={$3} 4={$4}\n";
    # 1={This is Perl, nice Perl} 2={This is Perl, nice Pe} 3={r} 4={l}

A small update: I changed $a to $x in the above code. In Perl, $a and $b are special variables used by sort, among other things. Normal user code should not use these variables outside those special cases, so something like $x and $y is better. In the above code using $a wouldn't matter, but I changed it anyway to point out that this is a bad habit that can lead to problems in longer programs. Just something to watch out for if you code in other languages that don't give a and b special meanings.
Re^2: Basic Regular expression by skkeni04 (Initiate) on Feb 13, 2017 at 03:33 UTC
Thanks, got it!
Re: Basic Regular expression by hippo (Bishop) on Feb 09, 2017 at 16:31 UTC
`$2` cannot be null; it must be either "e" or "r", or else the entire regex would fail to match. Hopefully the rest becomes obvious once you understand this part?
Re^2: Basic Regular expression by skkeni04 (Initiate) on Feb 13, 2017 at 03:32 UTC
Yes, it did. Thanks!
Re: Basic Regular expression by AnomalousMonk (Archbishop) on Feb 09, 2017 at 18:58 UTC
If you're dealing only with regex operators supported by Perl version 5.6 and earlier (as you are in the OPed example), the YAPE::Regex::Explain module can sometimes be helpful:

    c:\@Work\Perl\monks>perl -wMstrict -le "use YAPE::Regex::Explain; ;; print YAPE::Regex::Explain->new('^(.+)(e|r)(.*)$')->explain; "
    The regular expression:

    (?-imsx:^(.+)(e|r)(.*)$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .+                       any character except \n (1 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        e                        'e'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        r                        'r'
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
      (                        group and capture to \3:
    ----------------------------------------------------------------------
        .*                       any character except \n (0 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \3
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
Re: Basic Regular expression by Laurent_R (Canon) on Feb 09, 2017 at 18:10 UTC
You've been given good answers already, but let me just add some details on the way the regex engine processes this string.

> According to me, it should be "$1= This is Perl".

In fact, that's what initially happens: the regex engine sees `(.+)` and grabs the whole string, i.e. "This is Perl". But then it sees `(e|r)`, so for the whole regex to succeed it has to backtrack, giving back "l" and then "r", so that `(e|r)` can match. Note that this would happen even if you did not have capturing parentheses: the point is not so much that the engine is trying to populate $2, but that `(e|r)` has to match something for the whole regex to succeed. Once it has matched the "r" with the second capture group, the last part of the regex, `(.*)$`, can match the "l".
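One way to watch the backtracking described above, as a sketch: Perl's `(?{ ... })` construct runs a piece of code each time the engine reaches that point in the pattern, so placing one right after `(.+)` records every candidate value of `$1` just before `(e|r)` is tried. The names `$str`, `@tries` and `$matched` are hypothetical, and the exact number of recorded attempts is an assumption that internal engine optimizations could in principle change between Perl versions:

```perl
use strict;
use warnings;

my $str = "This is Perl";
my @tries;

# The (?{ ... }) block runs each time the engine gets past (.+),
# including after every backtracking step, and records what $1
# holds at that moment, just before (e|r) is attempted.
my $matched = $str =~ /^(.+)(?{ push @tries, $1 })(e|r)(.*)$/;

print "attempt: '$_'\n" for @tries;
print "final:   1={$1} 2={$2} 3={$3}\n" if $matched;
```

The code block is not a capturing group, so `(e|r)` and `(.*)` are still `$2` and `$3`. The recorded attempts should show `(.+)` starting with the whole string and then giving back "l" and "r".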
Re: Basic Regular expression by Corion (Patriarch) on Feb 09, 2017 at 16:30 UTC
In what situation can the second parenthesis be empty and still produce an overall match?
Re: Basic Regular expression by NetWallah (Canon) on Feb 09, 2017 at 17:29 UTC
You can achieve your desired output by using this regex: `$a =~ /^(.+)(e|r)?(.*)$/;`

Update: Just noticed: this is almost the same as choroba's suggestion.
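The two suggestions give the same `$1` here. One practical difference, shown in this sketch (array names are illustrative): with `(e|r)?` the whole group is optional, so when it matches nothing the group does not take part and `$2` is undefined; with `([er]?)` the group always takes part and `$2` is the empty string:

```perl
use strict;
use warnings;

my $str = "This is Perl";

# NetWallah's version: the whole group (e|r) is optional, so when it
# matches nothing the group does not participate and $2 is undef.
my @optional_group = $str =~ /^(.+)(e|r)?(.*)$/;

# choroba's version: the group always participates, but its contents
# [er]? may be empty, so $2 is the empty string "".
my @optional_contents = $str =~ /^(.+)([er]?)(.*)$/;

for my $caps (\@optional_group, \@optional_contents) {
    my $two = defined $caps->[1] ? "'$caps->[1]'" : 'undef';
    print "1={$caps->[0]} 2=$two 3={$caps->[2]}\n";
}
```

The distinction matters under `use warnings` if you later interpolate `$2` without checking `defined`.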
Re: Basic Regular expression by tweetiepooh (Hermit) on Feb 09, 2017 at 16:48 UTC
The answer you get is what is expected. What do you think the regex reads like? Start then capture 1 or more character upto "e" or "r" captured then capture anything left to end. Remember the match is greedy, so it matches the "r" in the alternation rather than the "e".
Re^2: Basic Regular expression by ikegami (Patriarch) on Feb 09, 2017 at 18:00 UTC
> Start then capture 1 or more character upto "e" or "r"

If that were true, the OP would have received `$1="This is P"; $2="e"; $3="rl";` instead of `$1="This is Pe"; $2="r"; $3="l";`. It actually matches until the end of the line, as the OP expects, but it's then forced to backtrack until it finds a position that's followed by `e` or `r`.
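The hypothetical result described in the reply above is what a non-greedy quantifier produces. This sketch (variable names are illustrative) compares greedy `(.+)` with `(.+?)`, the "?" suffix mentioned in the perlre passage Marshall quoted:

```perl
use strict;
use warnings;

my $str = "This is Perl";

# Greedy (.+): take as much as possible, then back off one character
# at a time until (e|r) can match.
my @greedy = $str =~ /^(.+)(e|r)(.*)$/;

# Non-greedy (.+?): take as little as possible, then extend one
# character at a time until (e|r) can match.
my @lazy = $str =~ /^(.+?)(e|r)(.*)$/;

print "greedy: 1={$greedy[0]} 2={$greedy[1]} 3={$greedy[2]}\n";
print "lazy:   1={$lazy[0]} 2={$lazy[1]} 3={$lazy[2]}\n";
```

Both patterns match; they differ only in which "e" or "r" the engine settles on first.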
