Re^3: problem with optional capture group

Win8 Strawberry 5.8.9.5 (32)  Tue 12/22/2020 16:43:09
C:\@Work\Perl\monks
>perl -Mstrict -Mwarnings
for my $line (
    '<div id="foo-bar-321" class="bin-boff"></div>',
    '<div id="foo-bar-321" class="bin-boff"> </div>',
    '<div id="foo-bar-321" class="bin-boff">foo</div>',
    '<div id="foo-bar-321" class="bin-boff"> foo </div>',
    '<div id="foo-bar-321" class="bin-boff">',
    '<div id="foo-bar-321" class="bin-boff">  ',
    '<div id="foo-bar-321" class="bin-boff">foo',
    '<div id="foo-bar-321" class="bin-boff"> foo',
    ) {

    if ($line =~ m{ <div (?: (?! </div) .)+ (</div)? }xms) {
        print "line matched \n  '$&' \n";
        if (defined $1) {
            print "  right after match, \$1 is defined '$1' \n";
            }
        }

    }
^Z
line matched
  '<div id="foo-bar-321" class="bin-boff"></div'
  right after match, $1 is defined '</div'
line matched
  '<div id="foo-bar-321" class="bin-boff"> </div'
  right after match, $1 is defined '</div'
line matched
  '<div id="foo-bar-321" class="bin-boff">foo</div'
  right after match, $1 is defined '</div'
line matched
  '<div id="foo-bar-321" class="bin-boff"> foo </div'
  right after match, $1 is defined '</div'
line matched
  '<div id="foo-bar-321" class="bin-boff">'
line matched
  '<div id="foo-bar-321" class="bin-boff">  '
line matched
  '<div id="foo-bar-321" class="bin-boff">foo'
line matched
  '<div id="foo-bar-321" class="bin-boff"> foo'

Give a man a fish: <%-{-{-{-<

Re^4: problem with optional capture group by Special_K (Pilgrim) on Dec 23, 2020 at 16:31 UTC
`m{ <div (?: (?! </div) .)+ (</div)? }xms`

Can you please give a brief explanation of how the above regex works? It uses a few constructs I've never seen before, and searching Google for regex symbols doesn't work very well. In particular, is enclosing a regex in 'm()', as you have done above, equivalent to enclosing it in '//'? What is the trailing xms doing?
Re^5: problem with optional capture group by AnomalousMonk (Archbishop) on Dec 23, 2020 at 20:54 UTC
... enclosing a regex in 'm()' ...

The `m` `open-delimiter` `pattern` `close-delimiter` form is what I think of as the "canonical" form of the `m//` operator, where the delimiters can be a wide variety of characters, including the paired brackets `{} () <> []`. The `//` match form is a special case in which the leading `m` may be omitted. The same goes for the `qr//` and `s///` operators. Picking a different delimiter avoids a lot of escape-ology connected with the `/` character in regexes. See perlop. (Note that `q// qq// qx// qw// tr/// y///` and maybe some others also use this delimiter convention.)

What is the trailing xms doing?

I use the `/ms` modifiers as part of a standard "tail" on all my `qr// m// s///` expressions to give the `. ^ $` operators a fixed behavior: with `/s`, `.` matches any character including a newline, and with `/m`, `^` and `$` also match just after and just before embedded newlines. This eliminates some degrees of freedom in regex behavior and makes regexes slightly easier to understand. The `/x` modifier in the standard tail allows whitespace inside the pattern to help clarify a regex. See Modifiers in perlre.

`(?: (?! </div) .)+`

This has already been covered by GrandFather here. This expression just steps forward, grabbing one character after another as long as that character is not the start of whatever matches the `(?!...)` negative lookahead expression, a closing `div` tag fragment in this case. A bit slow perhaps, but effective and flexible (update: flexible in that the lookahead expression can be of any complexity). See Lookaround Assertions in perlre; see also perlretut, perlreref and perlrequick.

`(</div)?`

Optionally capture a literal character sequence if it is present. The capture variable `$1` (in this case) will hold the captured sequence if it was present; otherwise `$1` will be undefined. See perlre, etc., as above.
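The points above can be checked with a small script. This is only a sketch for illustration and was not part of the original thread: the sample strings and the `match_div` helper are made-up names, and the regex inside it is the one from the parent post.

```perl
use strict;
use warnings;

# Delimiters: three spellings of the same match. Only the "/" form
# needs the slash inside the pattern to be escaped.
my $tag     = '<div id="x">foo</div>';
my $slashes = $tag =~ /<\/div/ ? 1 : 0;
my $braces  = $tag =~ m{</div} ? 1 : 0;
my $parens  = $tag =~ m(</div) ? 1 : 0;

# /s and /m: compare behavior with and without each modifier.
my $text        = "one\ntwo";
my $dot_plain   = $text =~ m{one.two}  ? 1 : 0;  # "." will not match the newline
my $dot_s       = $text =~ m{one.two}s ? 1 : 0;  # /s lets "." match the newline
my $caret_plain = $text =~ m{^two}     ? 1 : 0;  # "^" matches only at string start
my $caret_m     = $text =~ m{^two}m    ? 1 : 0;  # /m lets "^" match after a newline

# Hypothetical helper: returns (whole match, captured closing fragment),
# or an empty list if there is no match at all.
sub match_div {
    my ($line) = @_;
    return unless $line =~ m{ <div (?: (?! </div) .)+ (</div)? }xms;
    return ($&, $1);    # $1 is undef when the optional group did not match
}

for my $line ('<div id="a">foo</div>', '<div id="a">foo') {
    my ($whole, $close) = match_div($line);
    printf "matched '%s', closing fragment %s\n",
        $whole, defined $close ? "'$close'" : 'not captured';
}

print "delimiters agree: $slashes $braces $parens\n";
print "dot:   plain=$dot_plain /s=$dot_s\n";
print "caret: plain=$caret_plain /m=$caret_m\n";
```

Testing `defined $close` rather than its truth value mirrors the `defined $1` check in the parent post, since an optional group that did not take part in the match leaves its capture variable undefined rather than empty.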