I've come across something a bit strange when using substitution at a point determined by look-behind and look-ahead assertions. I used a regex code block to print some diagnostic text whenever a match occurred but found, to my surprise, that the code block seemed to be called twice for each match. Here's a small script that demonstrates the behaviour. It just inserts a plus sign into a string at any point between pseudo-tags with different sorts of bracket pairs.

use strict;
use warnings;
use re q{eval};

sub doSep
{
    print q{-} x 40, qq{\n};
}

my $count  = 0;
my $string = q{<x1>[x2]{x3}(x4)};

my $rxClose = qr@[]>})]@;
my $rxOpen  = qr@[[<{(]@;

my $rxBetween = qr
    {(?x)
        (?<=($rxClose))
        (?=($rxOpen))
        (?{print qq{Match @{ [++ $count] }: left $1, right $2\n}})
    };

print qq{ Before: $string\n};
doSep();
$string =~ s{$rxBetween}{+}g;
doSep();
print qq{ After: $string\n};

Even though the code block seems to execute twice per match, the substitution is only done once. Here's the output: six apparent matches, which is weird.

Before: <x1>[x2]{x3}(x4)
----------------------------------------
Match 1: left >, right [
Match 2: left >, right [
Match 3: left ], right {
Match 4: left ], right {
Match 5: left }, right (
Match 6: left }, right (
----------------------------------------
After: <x1>+[x2]+{x3}+(x4)

If I change the code so that look-arounds are not used, the strange behaviour does not happen and the code block is only executed once per match.

use strict;
use warnings;
use re q{eval};

sub doSep
{
    print q{-} x 40, qq{\n};
}

my $count  = 0;
my $string = q{<x1>[x2]{x3}(x4)};

my $rxClose = qr@[]>})]@;
my $rxOpen  = qr@[[<{(]@;

my $rxBetween = qr
    {(?x)
        ($rxClose)
        ($rxOpen)
        (?{print qq{Match @{ [++ $count] }: left $1, right $2\n}})
    };

print qq{ Before: $string\n};
doSep();
$string =~ s{$rxBetween}{$1+$2}g;
doSep();
print qq{ After: $string\n};

The output this time shows just three matches, as expected.

Before: <x1>[x2]{x3}(x4)
----------------------------------------
Match 1: left >, right [
Match 2: left ], right {
Match 3: left }, right (
----------------------------------------
After: <x1>+[x2]+{x3}+(x4)

I have done other testing to try to pin down what is happening. The code block also executes twice if I am just matching using look-arounds rather than substituting. I have also tried using just a single look-behind rather than both look-behind and look-ahead, but I still get the double execution.
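The match-only test mentioned above could look something like the sketch below. It reuses the patterns from the first script but replaces the diagnostic print with a counter, so the number of code block executions can be compared with the number of matches actually returned. The names $blockRuns and @positions are introduced here for illustration and are not from the original scripts. One hypothesis this might help check, offered as an assumption rather than a confirmed diagnosis, is that the extra executions come from the retry Perl makes at the same position after a zero-length match under the g modifier: the code block would run before that retry is rejected.

```perl
use strict;
use warnings;
use re q{eval};

my $string  = q{<x1>[x2]{x3}(x4)};
my $rxClose = qr@[]>})]@;
my $rxOpen  = qr@[[<{(]@;

# Illustrative counter (not in the original post): incremented every
# time the embedded code block is reached, whether or not the overall
# match attempt is finally accepted.
my $blockRuns = 0;

my $rxBetween = qr
    {(?x)
        (?<=($rxClose))
        (?=($rxOpen))
        (?{ $blockRuns ++ })
    };

# Match only, no substitution. Each successful match is zero-length,
# so pos() is the offset of the gap between the two brackets.
my @positions;
while ( $string =~ m{$rxBetween}g )
{
    push @positions, pos $string;
}

print qq{Matches returned: @{ [ scalar @positions ] } (at offsets @positions)\n};
print qq{Code block runs:  $blockRuns\n};
```

If the two counts differ, the code block is being reached on attempts that do not end up as reported matches, which would fit the doubled diagnostic lines shown earlier. The exact number of runs may depend on the Perl version, so no figure is given here.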

Can any Monk throw some light on what is happening here?

Cheers,

JohnGG


In reply to Regex code block executes twice per match using look-arounds by johngg
