
This is a great case for the \K assertion. (Update: I forgot to mention that \K is new in perl 5.10, but it is available to "everyone" via Regexp::Keep by Jeff Pinyan, who came up with the idea. I don't know whether that module gives you the same efficiency, though.) Not only is \K easier, it's also more efficient, thanks to the optimizations in the regexp engine. The pattern would look like this:

s/\.\K[^.]*$/txt/;
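Here is a small, self-contained sketch of that substitution, assuming perl 5.10 or later; the sample filenames are made up for illustration. The second part only matches (no substitution) and prints $& and $-[0], to show what \K does: the dot still has to match, but it is dropped from the reported match, so only the text after it gets replaced.

```perl
use strict;
use warnings;
use 5.010;    # \K is core from perl 5.10 on

# Replace whatever follows the last dot with "txt".
# The sample names below are hypothetical, chosen to cover a few edge cases.
my @names = ('xyz.foo', 'archive.tar.gz', 'README', 'trailing.');
my @renamed;
for my $name (@names) {
    (my $new = $name) =~ s/\.\K[^.]*$/txt/;
    push @renamed, $new;
    say "$name -> $new";
}

# \K "keeps" what was matched before it: the dot must still match,
# but it is excluded from $& and from the match start offset $-[0].
if ('xyz.foo' =~ /\.\K[^.]*$/) {
    say "matched '$&' starting at offset " . $-[0];   # matched 'foo' starting at offset 4
}
```

Names without a dot (README above) are left alone, because the pattern requires a literal dot. With several dots, only the part after the last one is replaced, since [^.]* cannot cross a dot on its way to $.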

The great thing about the \K version is that the engine can start by looking for a literal (the dot) and thereby avoid a lot of backtracking. The output of use re 'debug'; shows this.
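If you want to reproduce traces like the ones below, one way is to switch the pragma on for a single block. This is a sketch, assuming perl 5.10 or later, where use re 'debug' is lexically scoped. The trace goes to STDERR, and its exact wording varies between perl versions, so it won't match the output below character for character. Running perl -Mre=debug script.pl enables it for the whole program instead.

```perl
use strict;
use warnings;
use 5.010;    # for \K and say

my $lookbehind = 'xyz.foo';
my $keep       = 'xyz.foo';

{
    # Only regexes compiled and run inside this block are traced (to STDERR).
    use re 'debug';
    $lookbehind =~ s/(?<=[.])[^.]*$/txt/;
    $keep       =~ s/\.\K[^.]*$/txt/;
}

# Both substitutions give the same result; only the engine's work differs.
say "$lookbehind $keep";   # xyz.txt xyz.txt
```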

With the look-behind pattern, you can see that there's a lot of backtracking going on, and that the engine guesses a match at the beginning of the string (the string is "xyz.foo" in the examples below).

Compiling REx "(?<=[.])[^.]*$"
Final program:
   1: IFMATCH[-1] (7)
   3:   EXACT <.> (5)
   5:   SUCCEED (0)
   6: TAIL (7)
   7: STAR (19)
   8:   ANYOF[\0-\-/-\377{unicode_all}] (0)
  19: EOL (20)
  20: END (0)
floating ""$ at 0..2147483647 (checking floating) minlen 0
Guessing start of match in sv for REx "(?<=[.])[^.]*$" against "xyz.foo"
Found floating substr ""$ at offset 7...
Guessed: match at offset 0
Matching REx "(?<=[.])[^.]*$" against "xyz.foo"
   0 <> <xyz.foo>            |  1:IFMATCH[-1](7)
                                  failed...
   1 <x> <yz.foo>            |  1:IFMATCH[-1](7)
   0 <> <xyz.foo>            |  3:  EXACT <.>(5)
                                    failed...
                                  failed...
   2 <xy> <z.foo>            |  1:IFMATCH[-1](7)
   1 <x> <yz.foo>            |  3:  EXACT <.>(5)
                                    failed...
                                  failed...
   3 <xyz> <.foo>            |  1:IFMATCH[-1](7)
   2 <xy> <z.foo>            |  3:  EXACT <.>(5)
                                    failed...
                                  failed...
   4 <xyz.> <foo>            |  1:IFMATCH[-1](7)
   3 <xyz> <.foo>            |  3:  EXACT <.>(5)
   4 <xyz.> <foo>            |  5:  SUCCEED(0)
                                    subpattern success...
   4 <xyz.> <foo>            |  7:STAR(19)
                                   ANYOF[\0-\-/-\377{unicode_all}] can match 3 times out of 2147483647...
   7 <xyz.foo> <>            | 19:  EOL(20)
   7 <xyz.foo> <>            | 20:  END(0)
Match successful!

However, if we look at the \K pattern, we get this:

Compiling REx "\.\K[^.]*$"
Final program:
   1: EXACT <.> (3)
   3: KEEPS (4)
   4: STAR (16)
   5:   ANYOF[\0-\-/-\377{unicode_all}] (0)
  16: EOL (17)
  17: END (0)
anchored "." at 0 floating ""$ at 1..2147483647 (checking anchored) minlen 1
Guessing start of match in sv for REx "\.\K[^.]*$" against "xyz.foo"
Found anchored substr "." at offset 3...
Found floating substr ""$ at offset 7...
Starting position does not contradict /^/m...
Guessed: match at offset 3
Matching REx "\.\K[^.]*$" against ".foo"
   3 <xyz> <.foo>            |  1:EXACT <.>(3)
   4 <xyz.> <foo>            |  3:KEEPS(4)
   4 <xyz.> <foo>            |  4:  STAR(16)
                                     ANYOF[\0-\-/-\377{unicode_all}] can match 3 times out of 2147483647...
   7 <xyz.foo> <>            | 16:    EOL(17)
   7 <xyz.foo> <>            | 17:    END(0)
Match successful!

That's nice. No backtracking.
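If you want to check the efficiency claim on your own perl, a Benchmark sketch like this one compares the two substitutions. The 1,000-character test name is an arbitrary choice meant to give the look-behind version many start positions to try (as in the trace above). No timings are shown here, because they depend on the perl version and the machine.

```perl
use strict;
use warnings;
use 5.010;    # for \K
use Benchmark qw(cmpthese);

# Arbitrary long input: a long run before the dot should make the
# per-position work of the look-behind version more visible.
my $name     = ('a' x 1_000) . '.foo';
my $expected = ('a' x 1_000) . '.txt';

my %subs = (
    lookbehind => sub { (my $s = $name) =~ s/(?<=[.])[^.]*$/txt/; $s },
    keep       => sub { (my $s = $name) =~ s/\.\K[^.]*$/txt/;     $s },
);

# Sanity check before timing: both versions must produce the same string.
for my $label (sort keys %subs) {
    die "$label gave a different result\n" unless $subs{$label}->() eq $expected;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%subs);
```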

lodin

In reply to Re: positive look behind regexp mystery (\K assertion) by lodin
in thread positive look behind regexp mystery by rovf
