regex greedy range

johnnywang has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to extract the beginning part of a path-like string. The following script:

use strict;
my @input = ("A/B/C/D/E/F/", "A/B/C/D/", "A/B/C/D", "A/B/C/", "A/B/C", "A/B", "A");
foreach (@input){
  print "$1\n" if m|((?:\w+/){4})|;
}

only matches the first two elements, and gives:

A/B/C/D/
A/B/C/D/

I'd like to have a regex that matches all elements in the input, i.e., match up to level 4, and also make the "/" at the end optional. Desired output:

A/B/C/D/
A/B/C/D/
A/B/C/D
A/B/C/
A/B/C
A/B
A

What's the best way? Thanks.


Replies are listed 'Best First'.
Re: regex greedy range by ikegami (Patriarch) on Sep 16, 2004 at 23:34 UTC
Going from `print "$1\n" if m|((?:\w+/){4})|;` to `print "$1\n" if m|((?:\w+/){0,4})|;` gets you halfway there. Add a '?': `print "$1\n" if m|((?:\w+/?){0,4})|;` and there you are. '{4}' means match exactly 4 times, whereas '{0,4}' means match up to 4 times. The '?' makes the '/' optional.
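To see what each step of that progression captures, here is a minimal sketch that runs the three patterns from the reply over the original input. The helper name `first_capture` is made up for this illustration and is not part of the thread.

```perl
use strict;
use warnings;

my @input = ("A/B/C/D/E/F/", "A/B/C/D/", "A/B/C/D", "A/B/C/", "A/B/C", "A/B", "A");

# Hypothetical helper: return $1 for the first match of $re in $str,
# or undef when the pattern does not match at all.
sub first_capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

# The three patterns from the reply, in the order they are introduced.
my @patterns = (
    [ 'exactly 4 parts'              => qr|((?:\w+/){4})|    ],
    [ 'up to 4 parts'                => qr|((?:\w+/){0,4})|  ],
    [ 'up to 4 parts, optional "/"'  => qr|((?:\w+/?){0,4})| ],
);

for my $p (@patterns) {
    my ($label, $re) = @$p;
    print "$label:\n";
    for my $str (@input) {
        my $got = first_capture($re, $str);
        printf "  %-14s => %s\n", $str, defined $got ? "'$got'" : '(no match)';
    }
}
```

With `{0,4}` alone, a final component that lacks its trailing "/" is dropped (for example "A/B/C" yields 'A/B/'), which is why the optional "/" is the second half of the fix.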
Re^2: regex greedy range by Aristotle (Chancellor) on Sep 16, 2004 at 23:55 UTC
Except now all parts of your regex are optional, so it will match even the empty string. Also, quantified parens containing only quantified terms are a recipe for eventual disaster. I'd rather write this like so: `m|(\w+(?:/\w+){1,3})|`
Makeshifts last the longest.
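A small sketch of the empty-string point, comparing the all-optional pattern with the one posted in this reply. As written, `{1,3}` needs at least two components and never captures a trailing "/", so the third pattern below is a hypothetical variant (not from the thread) that assumes the goal is the exact output listed in the question. The helper `first_capture` is likewise invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return $1 for the first match, or undef on no match.
sub first_capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

my $all_optional = qr|((?:\w+/?){0,4})|;     # from the earlier reply
my $as_posted    = qr|(\w+(?:/\w+){1,3})|;   # from this reply
# Assumption: allow a single component and an optional trailing slash.
my $variant      = qr|(\w+(?:/\w+){0,3}/?)|;

for my $str ("", "!!!", "A", "A/B", "A/B/C/", "A/B/C/D/E/F/") {
    for my $p ([all_optional => $all_optional],
               [as_posted    => $as_posted],
               [variant      => $variant]) {
        my ($name, $re) = @$p;
        my $got = first_capture($re, $str);
        printf "%-14s %-13s %s\n", "'$str'", $name,
            defined $got ? "'$got'" : '(no match)';
    }
}
```

The all-optional pattern succeeds with an empty capture on "" and "!!!", while the other two require at least one word character before they match.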
Re: regex greedy range by johnnywang (Priest) on Sep 16, 2004 at 23:50 UTC
Thanks. The reason I titled it "greedy range" is that I thought {2,4} would stop as soon as it matched 2 instances. I guess everything is greedy unless explicitly stated otherwise. Well, I should have just tried it; PM is making me lazier. Thanks.
Re^2: regex greedy range by ikegami (Patriarch) on Sep 16, 2004 at 23:55 UTC
Correct, everything is greedy unless you add the '?'.
greedy: `a?, a*, a+, a{m,n}`
!greedy: `a??, a*?, a+?, a{m,n}?`
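A quick way to check the greedy/non-greedy pairs is to run each against the same string. This is only an illustrative sketch; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $str = "aaaaaa";

# Each pair: the greedy quantifier, then the same quantifier with a trailing '?'.
my ($greedy_range) = $str =~ /(a{2,4})/;    # takes as many as the range allows
my ($lazy_range)   = $str =~ /(a{2,4}?)/;   # stops at the minimum
my ($greedy_plus)  = $str =~ /(a+)/;
my ($lazy_plus)    = $str =~ /(a+?)/;
my ($greedy_star)  = $str =~ /(a*)/;
my ($lazy_star)    = $str =~ /(a*?)/;       # zero repetitions already satisfies it

print "a{2,4}  => '$greedy_range'\n";   # 'aaaa'
print "a{2,4}? => '$lazy_range'\n";     # 'aa'
print "a+      => '$greedy_plus'\n";    # 'aaaaaa'
print "a+?     => '$lazy_plus'\n";      # 'a'
print "a*      => '$greedy_star'\n";    # 'aaaaaa'
print "a*?     => '$lazy_star'\n";      # ''
```

So `{2,4}` does not stop after 2 instances; it only settles for fewer than 4 when the rest of the pattern cannot otherwise match.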
Re: regex greedy range by Aristotle (Chancellor) on Sep 16, 2004 at 23:19 UTC
Well, if you want to capture all the parts, then, well, capture all the parts.

my @input = qw( A/B/C/D/E/F/ A/B/C/D/ A/B/C/D A/B/C/ A/B/C A/B A );
foreach (@input){
  next if not m!((((\w+/)\w+/)\w+/)\w+/?)!;
  print "$_ has $4 $3 $2 $1\n";
}

Misread the question…
Makeshifts last the longest.
Re^2: regex greedy range by ikegami (Patriarch) on Sep 16, 2004 at 23:40 UTC
You forgot a bunch of question marks, and you're using capturing when you only need grouping: `m!((((\w+/)\w+/)\w+/)\w+/?)!;` should be: `m!((?:(?:(?:\w+/)?\w+/)?\w+/)?\w+/?)!;` but that requires lots of backtracking, so I think it's less efficient than: `m!(\w+(?:/\w+(?:/\w+(?:/\w+)?)?)?)!;` which only requires a single-character lookahead.
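The two nested forms in this reply are not quite interchangeable in what they capture, which a short comparison makes visible. This is a sketch only; `first_capture` and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my @input = qw( A/B/C/D/E/F/ A/B/C/D/ A/B/C/D A/B/C/ A/B/C A/B A );

# Both patterns are copied from the reply; only the names are invented.
my $optional_prefix = qr!((?:(?:(?:\w+/)?\w+/)?\w+/)?\w+/?)!;   # leading parts optional
my $optional_suffix = qr!(\w+(?:/\w+(?:/\w+(?:/\w+)?)?)?)!;     # trailing parts optional

# Hypothetical helper: return $1 for the first match, or undef on no match.
sub first_capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

for my $str (@input) {
    printf "%-14s prefix-style: %-10s suffix-style: %s\n",
        $str,
        first_capture($optional_prefix, $str),
        first_capture($optional_suffix, $str);
}
```

The suffix-style pattern has no `/?` at the end, so it leaves the trailing "/" out of the capture (for instance "A/B/C/" gives 'A/B/C'), whereas the prefix-style pattern keeps it.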
Re^3: regex greedy range by Aristotle (Chancellor) on Sep 16, 2004 at 23:47 UTC
No, I didn't forget the question marks, and I used capturing parens on purpose. But I was answering a different question than was actually asked. It could actually turn out more efficient with a slight variation: `m!((?>(?>(?>\w+/)?\w+/)?\w+/)?\w+/?)!;` I haven't done any benchmarks though.
Makeshifts last the longest.
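Before benchmarking, it may be worth checking that the atomic variant still captures the same text as the plain non-capturing version, since `(?>...)` discards the backtracking positions inside a group once it has matched. The sketch below compares the two; the helper and variable names are invented for this example.

```perl
use strict;
use warnings;

my @input = qw( A/B/C/D/E/F/ A/B/C/D/ A/B/C/D A/B/C/ A/B/C A/B A );

my $plain  = qr!((?:(?:(?:\w+/)?\w+/)?\w+/)?\w+/?)!;   # from ikegami's reply
my $atomic = qr!((?>(?>(?>\w+/)?\w+/)?\w+/)?\w+/?)!;   # from this reply

# Hypothetical helper: return $1 for the first match, or undef on no match.
sub first_capture {
    my ($re, $str) = @_;
    return $str =~ $re ? $1 : undef;
}

for my $str (@input) {
    my $got_plain  = first_capture($plain,  $str);
    my $got_atomic = first_capture($atomic, $str);
    printf "%-14s plain: %-10s atomic: %-10s %s\n",
        $str, $got_plain, $got_atomic,
        $got_plain eq $got_atomic ? '' : '<-- differs';
}
```

Tracing the match by hand, the two agree on most of the inputs but differ on "A/B/C/" and "A/B/C": once an atomic group has consumed "A/B/" or "A/B/C/" and the rest fails, the engine can only drop the whole group rather than retry it with fewer parts. Any speed comparison would need to account for that change in behavior.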