Practical use cases for \K in regex

perlboy_emeritus has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have been noodling over how to use \K. Variable-length positive lookbehind does make sense to me (see below; the example is somewhat contrived), but the notion of forgetting what matched before \K has me puzzled. I'm hoping a Monk/Monger can explain what issue or use case motivated the dev team to add \K to the regular expression suite. My example, which helped me understand the lookbehind issue, was lifted from Stack Overflow and modified, to wit:

use v5.30.0;
use strict;
#use re 'debug';
#use Regexp::Debugger;

my $str = '\'foooooooooobar and fubar\'';
say $str;

#
# Since + means 1 to unlimited, and unlimited exceeds the 255-character
# maximum for a lookbehind, the lookbehind will not compile unless you cap
# the quantifier so the whole lookbehind is at most 255 characters long.
#
my $re = qr/(?<=fo{1,254}?) \K obar/x;
say $re;
say "After exiting the look behind, Perl forgets whatever matched before \\K";
$str =~ $re;
say $& if $&;

exit(0);
__END__

Yields:

    'foooooooooobar and fubar'
    (?^ux:(?<=fo{1,254}?) \K obar)
    After exiting the look behind, Perl forgets whatever matched before \K
    obar

So my inference is this: whenever I encounter a long string of the same stuff, say \w, and I want only something on the far right, I can use a variable-length lookbehind and \K to extract what I want and ignore the rest. This makes sense, though I had to think really hard both to express a question (to SO) and to concoct an example. If the string is mixed stuff, not just \w, then other regex recipes would seem to make \K unnecessary. I don't believe the developers had this use case at the top of their list of reasons to implement \K, so I'd like to hear from the best of the best Perl heads how to make good use of \K.
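
For comparison, the lookbehind isn't actually needed to get the same result. A minimal sketch, assuming Perl 5.10 or later (when \K arrived): the run of o's is matched as ordinary pattern text and \K is placed after it, so only what follows ends up in $&, and there is no 255-character cap on the discarded part. The sub name `tail_after_run` is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the "obar" that follows "f" plus a run of o's.
# Everything matched before \K is consumed but excluded from $&, so the
# run of o's may be any length (no lookbehind limit applies).
sub tail_after_run {
    my ($str) = @_;
    return $str =~ /f o+ \K obar/x ? $& : undef;
}

my $long = 'f' . ('o' x 1000) . 'bar and fubar';    # far more than 255 o's
say tail_after_run("'foooooooooobar and fubar'");    # obar
say tail_after_run($long);                           # obar
```

The greedy `o+` first takes every o, then gives one back so that `obar` can match; that backtracking step is what a lookbehind version expresses with `fo{1,254}?`.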

Thanks in advance for what I hope will be another stimulating discussion of the wonders of Perl, the eighth great wonder of the known world.

U P D A T E, 10/6/2023

Thanks to all for the illuminating examples. Prior to 5.20 I followed best Perl practice and avoided $`, $& and $', which meant I used group capture to pick sub-expressions out of strings in production Perl. I retired in 2014 without upgrading from 5.18 to 5.20, so I was not aware that 5.20 fixed the performance issues with using those three variables, which I had limited to debugging. I returned to teaching Applied Math last year and upgraded 5.18 first to 5.34 and now to 5.36, but I still gravitated to group capture. Until recently I had not discovered the numerous ways to express a regex without paying the grouping overhead.

\K appeared on the scene, I believe, in 5.10, when using the three variables still incurred a heavy performance hit. My take on all this is that \K finally came into its own at 5.20. Again, my take is that group capture might only be optimal if two or more sub-expressions must be picked from a string.

To convince myself that there are real performance gains from using \K in the ways suggested in this discussion, I wrote 4 regexes to perform the same substitution, and the differences are very real. Here are some interesting results, to wit:

  Running cmpthese(-5, {...});
  $_ = 'Buster and Mimi'
      Objective: substitute 'Ginger' for 'Mimi'
  lookBehind:       qr/(?<=\A\S{1,250} and )\S+\z/
  literalSlashK:    qr/\ABuster and \K\S+\z/
  metaSlashK:       qr/\b\K\S+\z/
  groupCapture:     qr/(\S+)\z/
                       Rate lookBehind groupCapture metaSlashK literalSlashK
  lookBehind       185048/s         --         -68%       -69%          -76%
  groupCapture     580337/s       214%           --        -4%          -25%
  metaSlashK       606128/s       228%           4%         --          -21%
  literalSlashK    769629/s       316%          33%        27%            --
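
The benchmark code itself wasn't posted. A sketch of how such a comparison might be set up with the core Benchmark module, assuming each pattern is applied as `s/$re/Ginger/` to a fresh copy of the string (the sub name `substitute` is hypothetical). It first checks that all four variants produce the same result, then times them. The lookbehind variant needs Perl 5.30 or later, and 5.30 to 5.34 may print an "experimental" warning for it; actual rates will differ by machine and perl build.

```perl
use v5.30;
use warnings;
use Benchmark qw(cmpthese);

my $input = 'Buster and Mimi';

# The four patterns from the update above.
my %patterns = (
    lookBehind    => qr/(?<=\A\S{1,250} and )\S+\z/,
    literalSlashK => qr/\ABuster and \K\S+\z/,
    metaSlashK    => qr/\b\K\S+\z/,
    groupCapture  => qr/(\S+)\z/,
);

# Hypothetical helper: substitute on a copy (s///r leaves $input untouched).
sub substitute {
    my ($re) = @_;
    return $input =~ s/$re/Ginger/r;
}

# Sanity check: every variant must produce the same string.
for my $name (sort keys %patterns) {
    my $got = substitute($patterns{$name});
    die "$name gave '$got'\n" unless $got eq 'Buster and Ginger';
}

# One closure per pattern for Benchmark to time.
my %subs;
for my $name (keys %patterns) {
    my $re = $patterns{$name};
    $subs{$name} = sub { substitute($re) };
}

# A negative count means "run each for at least this many CPU seconds";
# the results above used -5, -1 keeps this sketch quick.
cmpthese(-1, \%subs);
```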

Thanks to all who contributed to this discussion.


Re: Practical use cases for \K in regex by kcott (Archbishop) on Oct 05, 2023 at 16:27 UTC
G'day perlboy_emeritus,

Use case: You have error and warning exceptions containing variable-length codes. You want to embed a timestamp before sending to a logfile.

    $ perl -E '
    my @exceptions = (
        "ERROR (EABC): Error message 1",
        "WARNING (WPQRST): Error message 2",
        "ERROR (EWXYZ): Error message 3",
    );
    my $re = qr{^(?:ERR|WAR)[^:]+: \K.+$};
    for my $e (@exceptions) {
        my $timestamp = localtime;
        $e =~ s/$re/[$timestamp] $&/;
        say $e;
        sleep 2;
    }
    '
    ERROR (EABC): [Fri Oct  6 02:56:17 2023] Error message 1
    WARNING (WPQRST): [Fri Oct  6 02:56:19 2023] Error message 2
    ERROR (EWXYZ): [Fri Oct  6 02:56:21 2023] Error message 3

I'd suggest doing a Super Search for `\K` to get a wide range of examples. Hints when searching:

Ignore posts by me (kcott), as many with MSWin examples contain paths with "`...\Ken\...`".
Monks whom I've noticed over the years using `\K` often, in many scenarios, are tybalt89 and AnomalousMonk.

I'm not suggesting that you limit your search, but those three usernames featured many times in my search results for just `\K`.

— Ken
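
Because `localtime` and `sleep` make that output vary from run to run, here is the same substitution wrapped in a sub that takes the timestamp as an argument, a sketch for illustration only. The sub name `stamp` and the INFO line are hypothetical; the INFO line shows that a message not starting with ERR or WAR passes through unchanged.

```perl
use strict;
use warnings;
use feature 'say';

# Same pattern as above: everything up to ": " is kept out of $&,
# so the replacement only has to supply the timestamp plus the message.
my $re = qr{^(?:ERR|WAR)[^:]+: \K.+$};

# Hypothetical helper: insert "[$ts] " after the prefix, if the prefix matches.
sub stamp {
    my ($msg, $ts) = @_;
    (my $out = $msg) =~ s/$re/[$ts] $&/;
    return $out;
}

my $ts = 'Fri Oct  6 02:56:17 2023';    # fixed value instead of localtime
say stamp('ERROR (EABC): Error message 1', $ts);
say stamp('INFO (IXYZ): Not an exception', $ts);
```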
Re: Practical use cases for \K in regex by hippo (Archbishop) on Oct 05, 2023 at 15:33 UTC
Good question. I think it is mostly historical these days. If you peruse the docs for 5.10 (when \K was introduced) they say this:

There is a special form of [positive look-behind], called \K, which causes the regex engine to "keep" everything it had matched prior to the \K and not include it in $&. This effectively provides variable length look-behind. The use of \K inside of another look-around assertion is allowed, but the behaviour is currently not well defined. For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string. For instance

    s/(foo)bar/$1/g;

can be rewritten as the much more efficient

    s/foo\Kbar//g;

So back then you could get variable-length look-behind using `\K`. Now it's only needed in the situation you describe where there might be a longer-than-the-limit variable look-behind, AIUI.

🦛
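
A quick check of the documentation's example, a sketch using a made-up input string: both substitutions give the same result, but the \K form needs no capture group and no `$1` in the replacement.

```perl
use strict;
use warnings;
use feature 'say';

my $input = 'foobar foobaz foobar';    # hypothetical sample text

# Capture "foo" and put it back in the replacement.
(my $with_capture = $input) =~ s/(foo)bar/$1/g;

# Keep "foo" out of the match with \K, so the replacement is empty.
(my $with_keep = $input) =~ s/foo\Kbar//g;

say $with_capture;    # foo foobaz foo
say $with_keep;       # foo foobaz foo
```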
Re: Practical use cases for \K in regex by jo37 (Curate) on Oct 05, 2023 at 17:08 UTC
Variable look-behind assertions were not available before 5.30. IMHO it makes no sense to combine a look-behind assertion with \K. \K comes in handy in cases like

    echo fooooobar | perl -pe 's/fo{2,}ba\Kr/z/'

Greetings,
-jo

`$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$`
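
The same one-liner as a small sub, a sketch where the name `swap_final_r` is hypothetical. Note that `fo{2,}` has no upper bound, which a lookbehind could not express, and that an input with only one o is left alone.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper wrapping the substitution above: "f", two or more o's,
# and "ba" are matched but kept out of $&, so only the "r" is replaced.
sub swap_final_r {
    my ($s) = @_;
    return $s =~ s/fo{2,}ba\Kr/z/r;
}

say swap_final_r('fooooobar');    # fooooobaz
say swap_final_r('fobar');        # fobar (only one o, so no match)
```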
Re: Practical use cases for \K in regex by ikegami (Patriarch) on Oct 05, 2023 at 20:07 UTC
It simplifies.

    say "foo: bar" =~ s/^ ( \w+: \s* ) bar /${1}baz/xr;

vs

    say "foo: bar" =~ s/^ \w+: \s* \K bar /baz/xr;

It helps avoid repetition.

    say "a:b:c:d" =~ s/ :\K [^:]+ (?=:) / uc($&) /xegr;

vs

    say "a:b:c:d" =~ s/ : [^:]+ (?=:) / ":" . uc($&) /xegr;

It effectively permits variable-length lookbehinds or makes them faster.

    say "foo: bar" =~ s/ (?<= ^ \w+: \s* ) bar /baz/xr;

vs

    say "foo: bar" =~ s/^ \w+: \s* \K bar /baz/xr;

The former was an error when `\K` was introduced. Even now, it's experimental, and there's a maximum to how much it will look behind this way.
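
A runnable check of the first two comparisons, a sketch: each \K version produces the same string as its longer counterpart. The spaced-out \K pattern needs the /x flag like the others. The lookbehind pair is left out because, with unbounded `\w+` and `\s*`, that lookbehind does not compile.

```perl
use strict;
use warnings;
use feature 'say';

# Pair 1: capture-and-restore vs. \K.
my $captured = "foo: bar" =~ s/^ ( \w+: \s* ) bar /${1}baz/xr;
my $kept     = "foo: bar" =~ s/^ \w+: \s* \K bar /baz/xr;

# Pair 2: \K keeps the ":" out of $&, so the /e code doesn't have to re-add it.
my $upper_k      = "a:b:c:d" =~ s/ :\K [^:]+ (?=:) / uc($&) /xegr;
my $upper_concat = "a:b:c:d" =~ s/ : [^:]+ (?=:) / ":" . uc($&) /xegr;

say $captured;        # foo: baz
say $kept;            # foo: baz
say $upper_k;         # a:B:C:d
say $upper_concat;    # a:B:C:d
```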