in reply to regex finding shortest string containing n of $c
The question could be interpreted as follows:
The second alternative is more difficult than it appears. First, n = 1 and n = 2 are special cases, since the pattern below is built from an opening x, a closing x, and a middle group repeated n-2 times. Second, the x that starts the shortest string can lie inside a match that has already been consumed.
To demonstrate the problem, I added an extra x to the beginning of the data and searched for four x's. The shortest string containing four x's is xxx.x, but the first match the regex finds ends in the middle of it, so the next match starts too late and misses the shortest string.
    my $foo = "x.....xxx.x.....x......xxx...xx...x...xxx";
    my @array = $foo =~ /(      # capture in $1
        [x]                     # match an opening x
        (?:[^x]*?[x][^x]*?)     # match an x surrounded by some non-x
        {2}                     # match n-2 times
        [x]                     # match a closing x
        )/gx;
        #(?{push @array, $1})   # keep the match
        #$                      # match the end of the string to force back-tracking
    print "$_\n" foreach (sort {length $a <=> length $b} @array);
    __END__
    x.....xxx
    x...xx...x
    x.....x......xx
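To see where the global match goes wrong, the offsets of each match can be printed next to the offset of xxx.x. This is only a sketch: it uses a simplified pattern, `x(?:[^x]*x){3}`, which I am assuming is equivalent to the one above for n = 4, and the variable names `@spans` and `$at` are mine.

```perl
use strict;
use warnings;

my $foo = "x.....xxx.x.....x......xxx...xx...x...xxx";

# Simplified pattern (an assumption, not the original regex):
# an x followed by three more x's, each after any run of non-x.
my @spans;
while ($foo =~ /x(?:[^x]*x){3}/g) {
    push @spans, [ $-[0], $+[0] ];   # start offset, end offset (exclusive)
}

# Print each match with its first and last offset.
printf "match at %2d..%2d: %s\n", $_->[0], $_->[1] - 1,
    substr($foo, $_->[0], $_->[1] - $_->[0]) for @spans;

# Where the shortest window actually starts.
my $at = index $foo, "xxx.x";
print "xxx.x starts at offset $at\n";
```

The first match covers offsets 0 to 8, while xxx.x begins at offset 6. The next search resumes at offset 9, after the point where xxx.x starts, so /g can never report it.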
To get around this, you can force the regex to backtrack. The following regex embeds code that stores each candidate match, then demands the end of the string ($), so that the engine keeps failing and trying the next possibility. In this way, all of the possible matches are found (some of them several times).
    my $foo = "x.....xxx.x.....x......xxx...xx...x...xxx";
    my @array;
    $foo =~ /(                  # capture in $1
        [x]                     # match an opening x
        (?:[^x]*?[x][^x]*?)     # match an x surrounded by some non-x
        {2}                     # match n-2 times
        [x]                     # match a closing x
        )
        (?{push @array, $1})    # keep the match
        $                       # match the end of the string to force back-tracking
        /x;
    print "$_\n" foreach (sort {length $a <=> length $b} @array);
    __END__
    xxx.x
    xxx...x
    xx...xx
    xx...xx
    xx...xx
    xx...xx
    x...xxx
    x.....xxx
    xx.x.....x
    xx.x.....x
    x......xxx
    x...xx...x
    xx...x...x
    xx...x...x
    xx...x...x
    xx...x...x
    x...x...xx
    x...x...xx
    x...x...xx
    x...x...xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.x.....x......x
    x.x.....x......x
    x.x.....x......x
    x.x.....x......x
    x.x.....x......x
    x.x.....x......x
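A different way to reach overlapping matches, offered only as a sketch and not as the approach used above, is to wrap the pattern in a zero-width lookahead. Nothing is consumed, so /g retries at every position, and each candidate is seen once. The helper name `shortest_with_n` and its arguments are hypothetical, and it assumes $c is a single character. Because the group is repeated n-1 times after a leading x, n = 1 and n = 2 need no special handling.

```perl
use strict;
use warnings;

# Hypothetical helper: return the shortest substring of $str containing
# $n occurrences of the single character $c, or nothing if none exists.
sub shortest_with_n {
    my ($str, $c, $n) = @_;
    return if $n < 1;

    my $q    = quotemeta $c;
    my $more = $n - 1;

    # The lookahead consumes nothing, so every starting position is tried.
    # From a given starting $c, the window to the ($n-1)th following $c
    # is the only string this pattern can capture.
    my $re = qr/(?=(${q}(?:[^${q}]*${q}){$more}))/;

    my $best;
    while ($str =~ /$re/g) {
        $best = $1 if !defined $best || length $1 < length $best;
    }
    return $best;
}

my $foo = "x.....xxx.x.....x......xxx...xx...x...xxx";
print shortest_with_n($foo, 'x', 4), "\n";   # prints xxx.x
```

Ties are resolved in favour of the leftmost window, since a later candidate only replaces the current best when it is strictly shorter.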
Re^2: regex finding shortest string containing n of $c
by xipho (Scribe) on Sep 01, 2005 at 14:43 UTC