
Perhaps a stepwise commented example would make it clear what my issue is.

Matching REx "(?:(.)b|.)+" against "abc"
   0 <> <abc>                |  1:CURLYX[0] {1,32767}(15)
   0 <> <abc>                | 14:  WHILEM[1/1](0)
                                    whilem: matched 0 out of 1..32767
   0 <> <abc>                |  3:    BRANCH(11)
   0 <> <abc>                |  4:      OPEN1(6)
   0 <> <abc>                |  6:      REG_ANY(7)
   1 <a> <bc>                |  7:      CLOSE1(9)

$1 should be 'a' at this point.

   1 <a> <bc>                |  9:      EXACTF <b>(14)
   2 <ab> <c>                | 14:      WHILEM[1/1](0)
                                        whilem: matched 1 out of 1..32767
   2 <ab> <c>                |  3:        BRANCH(11)
   2 <ab> <c>                |  4:          OPEN1(6)
   2 <ab> <c>                |  6:          REG_ANY(7)
   3 <abc> <>                |  7:          CLOSE1(9)

$1 might be 'c' at this point.

   3 <abc> <>                |  9:          EXACTF <b>(14)
                                            failed...
   2 <ab> <c>                | 11:        BRANCH(13)

$1 should have been restored to 'a' at this point due to the backtracking.

   2 <ab> <c>                | 12:          REG_ANY(14)
   3 <abc> <>                | 14:          WHILEM[1/1](0)
                                            whilem: matched 2 out of 1..32767
   3 <abc> <>                |  3:            BRANCH(11)
   3 <abc> <>                |  4:              OPEN1(6)
   3 <abc> <>                |  6:              REG_ANY(7)
                                                failed...
   3 <abc> <>                | 11:            BRANCH(13)
   3 <abc> <>                | 12:              REG_ANY(14)
                                                failed...
                                              BRANCH failed...
                                            whilem: failed, trying continuation...
   3 <abc> <>                | 15:            NOTHING(16)
   3 <abc> <>                | 16:            END(0)
Match successful!
Test:             $1='c', $2=''

$1 should DEFINITELY not be 'c'!
Where did the 'a' go?

Compare with:

Matching REx "(?:(.)b|.)+" against "ac"
   0 <> <ac>                 |  1:CURLYX[0] {1,32767}(15)
   0 <> <ac>                 | 14:  WHILEM[1/1](0)
                                    whilem: matched 0 out of 1..32767
   0 <> <ac>                 |  3:    BRANCH(11)
   0 <> <ac>                 |  4:      OPEN1(6)
   0 <> <ac>                 |  6:      REG_ANY(7)
   1 <a> <c>                 |  7:      CLOSE1(9)

$1 might be 'a' at this point.

   1 <a> <c>                 |  9:      EXACTF <b>(14)
                                        failed...
   0 <> <ac>                 | 11:    BRANCH(13)

$1 should have been restored to undef at this point, due to the backtracking.

   0 <> <ac>                 | 12:      REG_ANY(14)
   1 <a> <c>                 | 14:      WHILEM[1/1](0)
                                        whilem: matched 1 out of 1..32767
   1 <a> <c>                 |  3:        BRANCH(11)
   1 <a> <c>                 |  4:          OPEN1(6)
   1 <a> <c>                 |  6:          REG_ANY(7)
   2 <ac> <>                 |  7:          CLOSE1(9)
   2 <ac> <>                 |  9:          EXACTF <b>(14)
                                            failed...
   1 <a> <c>                 | 11:        BRANCH(13)
   1 <a> <c>                 | 12:          REG_ANY(14)
   2 <ac> <>                 | 14:          WHILEM[1/1](0)
                                            whilem: matched 2 out of 1..32767
   2 <ac> <>                 |  3:            BRANCH(11)
   2 <ac> <>                 |  4:              OPEN1(6)
   2 <ac> <>                 |  6:              REG_ANY(7)
                                                failed...
   2 <ac> <>                 | 11:            BRANCH(13)
   2 <ac> <>                 | 12:              REG_ANY(14)
                                                failed...
                                              BRANCH failed...
                                            whilem: failed, trying continuation...
   2 <ac> <>                 | 15:            NOTHING(16)
   2 <ac> <>                 | 16:            END(0)
Match successful!
Test:             $1='', $2=''

And this time, $1 was handled sensibly.
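
For anyone who wants to check this on their own perl, here is a minimal sketch of the two matches. The helper name capture_after_match is made up for illustration, and the /i flag is an assumption I am inferring from the EXACTF nodes in the traces. I have left out expected output for 'abc' on purpose, since what ends up in $1 there is exactly the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): run the pattern from the
# traces and return whether it matched, plus whatever was left in $1.
# The /i flag is an assumption based on the EXACTF nodes shown in the traces.
sub capture_after_match {
    my ($string) = @_;

    # Uncomment for a step-by-step trace on STDERR, like the ones above:
    # use re 'debug';
    my $matched = $string =~ /(?:(.)b|.)+/i ? 1 : 0;

    # $1 is only meaningful after a successful match.
    my $one = $matched ? $1 : undef;
    return ($matched, $one);
}

for my $string ('abc', 'ac') {
    my ($matched, $one) = capture_after_match($string);
    printf "%-3s matched=%d \$1=%s\n", $string, $matched,
        defined $one ? "'$one'" : 'undef';
}
```

If $1 comes back as 'c' for 'abc', the perl in question shows the same leak as the first trace. For 'ac' both branches fail to keep a capture, so $1 should stay undefined, as in the second trace.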

In reply to A tale of two regex by SuicideJunkie
in thread Leaking Regex Captures by SuicideJunkie
