
Hello Monks,

I have been noodling on how to use \K. Variable-length positive look behind makes sense to me (see below, somewhat contrived), but the notion of forgetting what matched before \K has me puzzled. I'm hoping a Monk/Monger can explain what issue or use case motivated the dev team to add \K to the regular expression suite. My example, which helped me understand the look behind issue, was lifted from Stack Overflow and modified, to wit:

use v5.30.0;
use strict;
#use re 'debug';
#use Regexp::Debugger;

my $str = '\'foooooooooobar and fubar\'';
say $str;

#
# Since + is 1 to unlimited, the unlimited exceeds 255, so unless you limit the max
# to something less than 255, the look behind will not compile.
#
my $re = qr/(?<=fo{1,254}?) \K obar/x;
say $re;
say "After exiting the look behind, Perl forgets whatever matched before \\K";
$str =~ $re;
say $& if $&;

exit(0);
__END__

Yields:

    'foooooooooobar and fubar'
    (?^ux:(?<=fo{1,254}?) \K obar)
    After exiting the look behind, Perl forgets whatever matched before \K
    obar

So, my inference: whenever I encounter a long string of the same stuff, say \w, and I want only something on the far right, I can use variable-length look behind and \K to extract what I want and ignore the rest. This makes sense, though I had to think real hard both to express a question (on SO) and to concoct an example. If the string is mixed stuff, not just \w, then other regex recipes would seem to make \K unnecessary. I don't believe the developers had this use case at the top of their list of reasons to implement \K, so I'd like to hear from the best of the best Perl heads how to make good use of \K.
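For comparison, here is a minimal sketch of \K used without any look behind. The strings and variable names ($tail, $path, $line) are made up for illustration. It shows two things: the part of the pattern before \K can use an unbounded quantifier, which look behind does not allow, and in a substitution everything matched before \K is left in place.

```perl
use strict;
use warnings;

# 1. Keep only what follows \K. The prefix fo+ is unbounded, so no
#    {1,254} cap is needed as it would be inside (?<=...).
my $str  = 'foooooooooobar and fubar';
my $tail = $str =~ /fo+\Kobar/ ? $& : undef;

# 2. Replace the last path component. The '/' matched before \K is
#    not part of the text that gets replaced.
my $path = '/usr/local/lib/perl5/Foo.pm';
$path =~ s{/\K[^/]+\z}{Bar.pm};

# 3. Change a value while leaving its key and spacing untouched,
#    with no capture group and no $1 in the replacement.
my $line = 'timeout = 30';
$line =~ s/^timeout\s*=\s*\K\d+/60/;

print "$tail\n$path\n$line\n";
```

The third case is the one that usually gets written as s/^(timeout\s*=\s*)\d+/${1}60/, so \K mostly saves the capture and the re-insertion of the prefix.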

Thanks in advance for what I hope will be another stimulating discussion of the wonders of Perl, the eighth great wonder of the known world.

UPDATE, 10/6/2023

Thanks to all for the illuminating examples. Prior to 5.20 I followed best Perl practice and avoided $`, $& and $', which meant I used group capture to pick sub-expressions from strings in production Perl. I missed the upgrade from 5.18 to 5.20 (I retired in 2014), so I was not aware that 5.20 fixed the performance issues with those three variables and that they are no longer just for debugging. I returned to teaching Applied Math last year, so I did upgrade from 5.18, first to 5.34 and now to 5.36, but I still gravitated to group capture. Undiscovered until recently were the numerous ways to express a regex without paying the grouping overhead.

\K appeared on the scene, I believe, at 5.10, when using the three variables still incurred a heavy performance hit. My take on all this is that \K finally came into its own at 5.20, and that group capture might only be optimal if two or more sub-expressions must be picked from a string. To convince myself that there are real performance gains from using \K in the ways suggested in this discussion, I wrote four regexes to perform the same substitution, and the differences are very real. Here are some interesting results, to wit:

  Running cmpthese(-5, {...});
  $_ = 'Buster and Mimi'
      Objective: substitute 'Ginger' for 'Mimi'
  lookBehind:       qr/(?<=\A\S{1,250} and )\S+\z/
  literalSlashK:    qr/\ABuster and \K\S+\z/
  metaSlashK:       qr/\b\K\S+\z/
  groupCapture:     qr/(\S+)\z/
                       Rate lookBehind groupCapture metaSlashK literalSlashK
  lookBehind       185048/s         --         -68%       -69%          -76%
  groupCapture     580337/s       214%           --        -4%          -25%
  metaSlashK       606128/s       228%           4%         --          -21%
  literalSlashK    769629/s       316%          33%        27%            --
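The script behind these numbers is not shown above, so what follows is only a guess at its shape, not the original code. It takes the four patterns from the listing, checks that each one really turns 'Buster and Mimi' into 'Buster and Ginger', and then hands them to cmpthese from the core Benchmark module. The substitute helper and the %re, %result and %code hashes are names invented for this sketch, and it uses -1 (about one CPU second per case) rather than the -5 used for the table, so it finishes quickly.

```perl
use v5.30;
use warnings;
no warnings 'experimental::vlb';    # variable-length look behind is experimental
use Benchmark qw(cmpthese);

my $input = 'Buster and Mimi';

# The four patterns, copied from the listing above.
my %re = (
    lookBehind    => qr/(?<=\A\S{1,250} and )\S+\z/,
    literalSlashK => qr/\ABuster and \K\S+\z/,
    metaSlashK    => qr/\b\K\S+\z/,
    groupCapture  => qr/(\S+)\z/,
);

# Hypothetical helper: apply one pattern to a fresh copy of the input.
sub substitute {
    my ($re) = @_;
    my $s = $input;
    $s =~ s/$re/Ginger/;
    return $s;
}

my (%result, %code);
for my $name (sort keys %re) {
    my $re = $re{$name};
    $result{$name} = substitute($re);          # one run, to check correctness
    $code{$name}   = sub { substitute($re) };  # the code that gets timed
    say "$name: $result{$name}";
}

cmpthese(-1, \%code);
```

Each timed case pays the same function-call and string-copy overhead, so absolute rates from this sketch will differ from the table and depend on the machine and Perl version.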

Thanks to all who contributed to this discussion.

In reply to Practical use cases for \K in regex by perlboy_emeritus
