As mentioned earlier, the question is ambiguous and so is subject to interpretation. (It also smells of homework, but it's interesting so I will let it pass.)

The question could be interpreted as follows:

  1. Anchor the pattern at the beginning of the string, in which case kvale has provided the answer earlier.
  2. Start the matching process at the beginning of the string and find any part of the string that begins and ends with x and contains the desired number (n) of x's.

The second alternative is more difficult than it appears. Firstly, n = 1 and n = 2 are special cases, because the pattern below matches an opening x, a closing x and n-2 x's in between. Secondly, the first x of the string you want can lie inside a match that has already taken place.
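One way to sidestep the special cases is to phrase the pattern differently from the one used in the rest of this post: match one x and then n-1 repetitions of "some non-x followed by an x". The following is only a sketch under that assumption; pattern_for is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): build a regex that
# matches a run beginning and ending on x and containing exactly $n x's.
sub pattern_for {
    my ($n) = @_;
    die "n must be a positive integer\n" unless $n =~ /^[1-9][0-9]*$/;
    my $rest = $n - 1;                # x's after the opening one
    return qr/x(?:[^x]*x){$rest}/;    # {0} is legal, so n = 1 needs no extra branch
}

my $foo = "x.....xxx.x.....x......xxx...xx...x...xxx";
for my $n (1, 2, 4) {
    my $re = pattern_for($n);
    my ($first) = $foo =~ /($re)/;    # leftmost match only
    print "n=$n: $first\n";
}
```

This only removes the n = 1 and n = 2 special cases. It still returns the leftmost match rather than the shortest one, so the overlap problem remains.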

To demonstrate this problem I have added an extra x to the beginning of the data and I am looking for four x's. The shortest string containing four x's is xxx.x, but the first match that the regex finds (x.....xxx) consumes the first three x's of that string, so the next match starts too late and misses it.

    my $foo = "x.....xxx.x.....x......xxx...xx...x...xxx";
    my @array = $foo =~ /(          # capture in $1
        [x]                         # match an opening x
        (?:[^x]*?[x][^x]*?)         # match an x surrounded by some non-x
        {2}                         # match n-2 times
        [x]                         # match a closing x
        )/gx;
    #(?{push @array, $1})           # keep the match
    #$                              # match the end of the string to force back-tracking
    print "$_\n" foreach (sort {length $a <=> length $b} @array);
    __END__
    x.....xxx
    x...xx...x
    x.....x......xx

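To resolve this, you can force the regex to backtrack. The following regex embeds code that stores each candidate match and then demands the end of the string, which fails for most candidates and sends the engine back to try the next alternative. In this way all of the possible matches are found, although some of them are found several times.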

    my $foo = "x.....xxx.x.....x......xxx...xx...x...xxx";
    my @array;
    $foo =~ /(                      # capture in $1
        [x]                         # match an opening x
        (?:[^x]*?[x][^x]*?)         # match an x surrounded by some non-x
        {2}                         # match n-2 times
        [x]                         # match a closing x
        )
        (?{push @array, $1})        # keep the match
        $                           # match the end of the string to force back-tracking
        /x;
    print "$_\n" foreach (sort {length $a <=> length $b} @array);
    __END__
    xxx.x
    xxx...x
    xx...xx
    xx...xx
    xx...xx
    xx...xx
    x...xxx
    x.....xxx
    xx.x.....x
    xx.x.....x
    x......xxx
    x...xx...x
    xx...x...x
    xx...x...x
    xx...x...x
    xx...x...x
    x...x...xx
    x...x...xx
    x...x...xx
    x...x...xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.....x......xx
    x.x.....x......x
    x.x.....x......x
    x.x.....x......x
    x.x.....x......x
    x.x.....x......x
    x.x.....x......x
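If the duplicates are a nuisance, a possible alternative (a different technique from the embedded-code approach above, offered only as a sketch) is to put the capture inside a zero-width lookahead and let /g walk the string. Each x that has enough x's after it then yields exactly one candidate. The variable names $n, $mid and @candidates are my own, and the sketch assumes n is at least 2.

```perl
use strict;
use warnings;

my $foo = "x.....xxx.x.....x......xxx...xx...x...xxx";
my $n   = 4;           # number of x's wanted; this sketch assumes n >= 2
my $mid = $n - 2;      # x's between the opening and the closing x

# The capture sits inside a zero-width lookahead, so a successful match
# consumes nothing and the next attempt starts one character further on.
# That lets candidates overlap one another.
my @candidates;
while ($foo =~ /(?=(x(?:[^x]*x){$mid}[^x]*x))/g) {
    push @candidates, $1;
}

# Sort by length and keep the first element, i.e. a shortest candidate.
my ($shortest) = sort { length $a <=> length $b } @candidates;
print scalar(@candidates), " candidates, shortest: $shortest\n";
```

On the test data this should report 12 candidates with xxx.x as the shortest, since there are 15 x's in the string and the last three cannot start a run of four.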

In reply to Re: regex finding shortest string containing n of $c by inman
in thread regex finding shortest string containing n of $c by xipho
