Re: Basic Regular expression
by choroba (Cardinal) on Feb 09, 2017 at 16:30 UTC
$2 can't be null, because it must be either e or r to make the match successful.
Compare with
$a =~ /^(.+)([er]?)(.*)$/;
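A quick way to see the difference is to run both patterns against the same string. This is only a sketch: it assumes the OP's string is "This is Perl" (the value used elsewhere in this thread), uses $str instead of $a, and the helper name captures() is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the three captures, or an empty list on failure.
sub captures {
    my ($str, $re) = @_;
    return $str =~ $re ? ($1, $2, $3) : ();
}

my $str = "This is Perl";    # assumed input

# Mandatory second group: .+ must give back characters until e or r fits.
my @required = captures($str, qr/^(.+)(e|r)(.*)$/);
print "required: 1={$required[0]} 2={$required[1]} 3={$required[2]}\n";
# required: 1={This is Pe} 2={r} 3={l}

# Optional character class: .+ keeps everything, $2 and $3 are empty strings.
my @optional = captures($str, qr/^(.+)([er]?)(.*)$/);
print "optional: 1={$optional[0]} 2={$optional[1]} 3={$optional[2]}\n";
# optional: 1={This is Perl} 2={} 3={}
```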
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Basic Regular expression
by Marshall (Canon) on Feb 09, 2017 at 16:39 UTC
from http://perldoc.perl.org/perlre.html:
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?" . Note that the meanings don't change, just the "greediness":
In other words, the greedy (.+) behind $1 gives back just enough characters for the other terms to match.
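To make the quoted rule concrete, here is a small sketch comparing the greedy pattern with a non-greedy variant. The only change is the added "?" after the first quantifier; the input string is assumed to be the OP's "This is Perl".

```perl
use strict;
use warnings;

my $x = "This is Perl";    # assumed input

# Greedy .+ takes as much as it can, so the LAST e or r ends up in $2.
my @greedy = $x =~ /^(.+)(e|r)(.*)$/;
print "greedy: 1={$greedy[0]} 2={$greedy[1]} 3={$greedy[2]}\n";
# greedy: 1={This is Pe} 2={r} 3={l}

# Non-greedy .+? takes as little as it can, so the FIRST e or r ends up in $2.
my @lazy = $x =~ /^(.+?)(e|r)(.*)$/;
print "lazy:   1={$lazy[0]} 2={$lazy[1]} 3={$lazy[2]}\n";
# lazy:   1={This is P} 2={e} 3={rl}
```

Both patterns match the same overall text; only the way it is divided between the groups changes.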
Update, consider:
my $x = "This is Perl";
$x =~/^((.+)(e|r)(.*))$/;
print "1={$1} 2={$2} 3={$3} 4={$4}\n";
# 1={This is Perl} 2={This is Pe} 3={r} 4={l}
$x = "This is Perl, nice Perl";   # reuse $x; a second "my $x" in the same scope would warn
$x =~/^((.+)(e|r)(.*))$/;
print "1={$1} 2={$2} 3={$3} 4={$4}\n";
# 1={This is Perl, nice Perl} 2={This is Perl, nice Pe} 3={r} 4={l}
A small update: I changed $a to $x in the above code. In Perl, $a and $b are special variables used, among other things, by sort. Normal user code should not use them outside of those special cases, so something like $x and $y is better. In the above code using $a wouldn't matter, but I changed it anyway to point out that this is a bad habit that can lead to problems in longer programs. Just something to watch out for if you also code in other languages, where a and b have no special meaning.
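For reference, a minimal sketch of the special case mentioned above. The list of numbers is made up for illustration.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);    # made-up sample data

# Inside the sort block, Perl sets the package variables $a and $b to the
# two elements being compared. No "my" declaration is needed for them, and
# a lexical "my $a" in scope would hide the variable that sort fills in.
my @sorted = sort { $a <=> $b } @nums;

print "@sorted\n";    # 1 9 10 100
```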
Re: Basic Regular expression
by hippo (Bishop) on Feb 09, 2017 at 16:31 UTC
$2 cannot be null, it must be either "e" or "r" or else the entire regex would fail to match. Hopefully the rest becomes obvious once you understand this part?
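A short sketch of that point, using a few made-up test strings: a string with no "e" or "r" after its first character cannot match at all, so $2 is never empty after a successful match.

```perl
use strict;
use warnings;

my $re = qr/^(.+)(e|r)(.*)$/;

# "This is PHP" contains no e or r; "e" has nothing in front of it for (.+).
for my $str ("This is Perl", "This is PHP", "e") {
    if ($str =~ $re) {
        print "'$str': matched, 2={$2}\n";
    }
    else {
        print "'$str': no match\n";
    }
}
# 'This is Perl': matched, 2={r}
# 'This is PHP': no match
# 'e': no match
```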
Re: Basic Regular expression
by AnomalousMonk (Archbishop) on Feb 09, 2017 at 18:58 UTC
If you're dealing only with regex operators supported by Perl version 5.6 and before (as you are in the OPed example), the YAPE::Regex::Explain module can sometimes be helpful:
c:\@Work\Perl\monks>perl -wMstrict -le
"use YAPE::Regex::Explain;
;;
print YAPE::Regex::Explain->new('^(.+)(e|r)(.*)$')->explain;
"
The regular expression:
(?-imsx:^(.+)(e|r)(.*)$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
^                        the beginning of the string
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
.+                       any character except \n (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
e                        'e'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
r                        'r'
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
(                        group and capture to \3:
----------------------------------------------------------------------
.*                       any character except \n (0 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
)                        end of \3
----------------------------------------------------------------------
$                        before an optional \n, and the end of the
                         string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Give a man a fish: <%-{-{-{-<
Re: Basic Regular expression
by Laurent_R (Canon) on Feb 09, 2017 at 18:10 UTC
You've been given good answers already, but let me just add some details on the way the regex engine processes this string.
According to me, it should be "$1= This is Perl".
In fact, that is what happens initially: the regex engine sees (.+) and grabs the whole string, i.e. "This is Perl". But then it sees (e|r), so for the whole regex to succeed it has to backtrack, giving back "l" and then "r", so that (e|r) can match. Note that this would happen even without capturing parentheses: the point is not so much that the engine is trying to populate $2, but that (e|r) has to match something for the whole regex to succeed.
Once it has matched the "r" with the second capture, the last part of the regex, (.*)$ can match the "l".
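The steps above can be imitated by hand. The following is only a rough sketch of the idea, not how the regex engine is actually implemented: the helper simulate_backtrack() is hypothetical, and it assumes a single-line string so that "." can match every character.

```perl
use strict;
use warnings;

# Hypothetical helper that mimics how /^(.+)(e|r)(.*)$/ backtracks on a
# single-line string: try a long $1 first, then shrink it one character at
# a time until the character right after it is 'e' or 'r'.
sub simulate_backtrack {
    my ($str) = @_;

    # With the whole string in $1 nothing is left for (e|r), so the first
    # candidate worth checking is one character shorter.
    for (my $len = length($str) - 1; $len >= 1; $len--) {
        my $next = substr($str, $len, 1);    # character right after $1
        print "try 1={", substr($str, 0, $len), "} next={$next}\n";
        if ($next eq 'e' or $next eq 'r') {
            # (.*)$ then takes whatever is left.
            return (substr($str, 0, $len), $next, substr($str, $len + 1));
        }
    }
    return;    # no split works, so the real regex would fail as well
}

my @parts = simulate_backtrack("This is Perl");
print "1={$parts[0]} 2={$parts[1]} 3={$parts[2]}\n";
# 1={This is Pe} 2={r} 3={l}
```

For a look at what the engine really does, the core pragma use re 'debug'; prints its own trace of the match.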
Re: Basic Regular expression
by Corion (Patriarch) on Feb 09, 2017 at 16:30 UTC
Re: Basic Regular expression
by NetWallah (Canon) on Feb 09, 2017 at 17:29 UTC
You can achieve your desired output by using this re:
$a =~/^(.+)(e|r)?(.*)$/;
Update: Just noticed - this is almost the same as choroba's suggestion.
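One difference from choroba's ([er]?) is worth knowing. When the "?" sits outside the parentheses and the group matches zero times, $2 is undef rather than an empty string, so interpolating it under warnings complains. A small sketch, assuming the OP's string is "This is Perl" and using $str instead of $a:

```perl
use strict;
use warnings;

my $str = "This is Perl";    # assumed input

# In list context the match returns the captures; a group that did not
# take part in the match comes back as undef.
my ($one, $two, $three) = $str =~ /^(.+)(e|r)?(.*)$/;

# Show undef explicitly instead of interpolating it (which would warn).
my $two_shown = defined $two ? "{$two}" : 'undef';
print "1={$one} 2=$two_shown 3={$three}\n";
# 1={This is Perl} 2=undef 3={}
```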
...it is unhealthy to remain near things that are in the process of blowing up. man page for WARP, by Larry Wall
Re: Basic Regular expression
by tweetiepooh (Hermit) on Feb 09, 2017 at 16:48 UTC
The answer you get is what is expected. How do you think the regex reads?
Start, then capture 1 or more characters up to an "e" or "r", which is captured, then capture anything left up to the end.
Remember that the match is greedy, so it matches the "r" in the alternation rather than the "e".
|
$1="This is P"; $2="e"; $3="rl";
instead of
$1="This is Pe"; $2="r"; $3="l";
It actually matches until the end of the line as the OP expects, but it's then forced to backtrack until it finds a position that's followed by e or r.