Re: Oddness with regex quantifiers
by moritz (Cardinal) on Nov 23, 2010 at 21:29 UTC
If you want at most MAX repetitions, use {0,MAX}
The reason for the asymmetry between {MIN,} and {,MAX} is probably that there's a zero in perl, but Inf isn't generally supported.
If you wonder what {,MAX} matches, here's the answer:
$ perl -Mre=debug -ce ' /a{,5}/'
Compiling REx "a{,5}"
Final program:
1: EXACT <a{,5}> (4)
4: END (0)
anchored "a{,5}" at 0 (checking anchored isall) minlen 5
-e syntax OK
Freeing REx: "a{,5}"
$ perl -wE 'say "yes" if "a{,4}" =~ /a{,4}/'
yes
# in contrast:
$ perl -Mre=debug -ce ' /a{0,5}/'
Compiling REx "a{0,5}"
Final program:
1: CURLY {0,5} (5)
3: EXACT <a> (0)
5: END (0)
minlen 0
-e syntax OK
Freeing REx: "a{0,5}"
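To make the {0,MAX} advice concrete, here is a minimal sketch. The sub name at_most and the sample strings are made up for illustration and are not from the thread. The pattern is anchored at both ends so that the upper bound is actually enforced. One caveat: the literal-match behaviour of {,MAX} shown above is what perl did when this was posted; as far as I know, perl 5.34 and later accept {,MAX} as a quantifier, so the sketch sticks to the explicit {0,MAX} form, which works the same way everywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str consists of at most $max copies of "a".
sub at_most {
    my ($str, $max) = @_;
    # \A and \z anchor both ends; without them a{0,MAX} would also
    # succeed inside a longer run of "a"s.
    return $str =~ /\Aa{0,$max}\z/ ? 1 : 0;
}

for my $s ('', 'aaa', 'aaaaa', 'aaaaaa') {
    printf "%-8s %s\n", "'$s'", at_most($s, 5) ? 'matches' : 'no match';
}
```

The empty string and runs of up to five "a"s should match; six should not.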
The reason for the asymmetry between {MIN,} and {,MAX} is probably that there's a zero in perl, but Inf isn't generally supported.
The issue was raised not so long ago on p5p; it even caught old farts off guard. I don't think anyone recalled the reason why it does what it does. And people expressed the wish that it had been done otherwise in the past. Some have suggested a warning, but IIRC, nothing happened. It's unlikely the actual meaning is going to change. The advantages of not having to type a 0 don't outweigh the negative impact of potentially breaking code. It's one of the many things that, with the benefit of hindsight, would have been done differently.
Hmm .. but isn't the zero implied by the null field?
Not really, I don't think.
If perl accepted {,X}, I'm pretty sure some people would assume that it stood for {1,X}, not {0,X}. After all, one to X matches makes more sense than zero to X matches in many situations.
It's probably best to force people to make an explicit choice.
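A short sketch of why the choice matters: the two candidate meanings for a missing minimum, {0,X} and {1,X}, disagree on exactly one input, the empty string. The helper name whole_match and the upper bound of 3 are assumptions picked for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: does the whole string match a{MIN,3}?
sub whole_match {
    my ($min, $str) = @_;
    return $str =~ /\Aa{$min,3}\z/ ? 1 : 0;
}

for my $s ('', 'a', 'aaa', 'aaaa') {
    printf "%-6s {0,3}: %d   {1,3}: %d\n",
        "'$s'", whole_match(0, $s), whole_match(1, $s);
}
```

Only the empty-string row should differ between the two columns, which is the case a silent default would have to guess at.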
$ perl -E'say "aa" =~ /a{1,}/'
1
$ perl -E'say "aa" =~ /a{1,0}/'
Can't do {n,m} with n > m in regex; marked by <-- HERE in m/a{1,0} <-- HERE / at -e line 1.
Re: Oddness with regex quantifiers
by fisher (Priest) on Nov 23, 2010 at 21:28 UTC
Just use the {0,5} quantifier =)
Re: Oddness with regex quantifiers
by BrowserUk (Patriarch) on Nov 23, 2010 at 22:06 UTC
I would think that one reason is that there is no reasonable default.
Just as we have * for zero or more and + for 1 or more, either 0 or 1 would be an equally valid default for {,n}. Since there is no hamming difference between them, how would you pick one over the other to be the default?
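For reference, the two shorthands already have explicit spellings: * behaves like {0,} and + like {1,}. The sketch below just checks that on a couple of inputs; run_length is a hypothetical helper, not anything from perl itself.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the leading run matched by $re,
# or -1 when $re does not match at the start of $str.
sub run_length {
    my ($re, $str) = @_;
    return $str =~ /\A($re)/ ? length($1) : -1;
}

# Each pair holds a shorthand and its explicit {MIN,} spelling.
my @pairs = ([ qr/a*/, qr/a{0,}/ ], [ qr/a+/, qr/a{1,}/ ]);
for my $pair (@pairs) {
    for my $s ('', 'aaab') {
        my ($short, $long) = map { run_length($_, $s) } @$pair;
        print "$pair->[0] vs $pair->[1] on '$s': $short, $long\n";
    }
}
```

Both members of each pair should report the same length, and the two pairs should differ only on the empty string.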
Good point. I even suggested 0 earlier in this thread, but now I think 1 would be a better choice. I guess I'm spoiled by how well Perl deals with default values; I thought it would DWIM in this case.
And I was wrong. I'm now chastened, older and wiser; and I'm using {1,5} in my regex.
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
Thank you for giving us the benefits of your experience.
Re: Oddness with regex quantifiers
by ikegami (Patriarch) on Nov 23, 2010 at 21:30 UTC
It's not missing: {0,MAX}
(It took forever to post!)
Re: Oddness with regex quantifiers
by cdarke (Prior) on Nov 25, 2010 at 13:34 UTC
This is not specific to Perl. In the POSIX standard for Extended Regular Expressions (The Open Group Base Specifications Issue 7):
...an interval expression of the format "{m}", "{m,}", or "{m,n}"
And yes, Perl supports many other extensions that are not in POSIX.
Too bad .. I just thought for the sake of orthogonality that {,m} should have been present, with an implied 1 for the missing value. But it sounds like there are equally good arguments for having the implied value be 0 .. so perhaps the decision was made not to implement that grammar at all.
Lesson learned .. thanks for the update.
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds