perlboy_emeritus has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I have been noodling on how to use \K. Variable-length positive lookbehind does make sense to me (see below, somewhat contrived), but the notion of forgetting what matched before \K has me puzzled. I'm hoping a Monk/Monger can explain what issue or use case motivated the dev team to add \K to the regular expression suite. My example, which helped me understand the lookbehind issue, was lifted from Stack Overflow and modified, to wit:
use v5.30.0;
use strict;
#use re 'debug';
#use Regexp::Debugger;
my $str = '\'foooooooooobar and fubar\'';
say $str;
#
# Since + is 1 to unlimited, the unlimited exceeds 255, so unless you limit the max
# to something less than 255, the look behind will not compile.
#
my $re = qr/(?<=fo{1,254}?) \K obar/x;
say $re;
say "After exiting the look behind, Perl forgets whatever matched before \\K";
$str =~ $re;
say $& if $&;
exit(0);
__END__
Yields:
'foooooooooobar and fubar'
(?^ux:(?<=fo{1,254}?) \K obar)
After exiting the look behind, Perl forgets whatever matched before \K
obar
So, my inference: whenever I encounter a long run of the same stuff, say \w, and I want only something on the far right, I can use variable-length lookbehind and \K to extract what I want and ignore the rest. This makes sense, though I had to think real hard both to express a question (to SO) and to concoct an example. If the string is mixed stuff, not just \w, then other regex recipes would seem to make \K unnecessary. I don't believe the developers had this use case at the top of their list of reasons to implement \K, so I'd like to hear from the best of the best Perl heads how to make good use of \K.
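To make the "forgetting" concrete, here is a minimal sketch. It is not part of the Stack Overflow example: the variable names and the replacement text 'BAZ' are made up for illustration, and it assumes Perl 5.14 or later for the /r flag. No lookbehind is involved; \K simply drops whatever matched to its left from $&, so a substitution leaves that prefix untouched.

```perl
use strict;
use warnings;

# Hypothetical test data, modelled on the string used above.
my $word = 'foooooooooobar';
my $str  = "$word and fubar";

# Without \K, $& holds everything the pattern matched.
my $whole = $str =~ /fo+bar/ ? $& : undef;     # the whole "fo...obar" word

# With \K, the text matched to its left is dropped from $&.
my $kept = $str =~ /fo+\Kbar/ ? $& : undef;    # 'bar'

# In a substitution, only the part after \K is replaced ...
my $with_k = $str =~ s/fo+\Kbar/BAZ/r;

# ... which is what a capture group plus $1 in the replacement does by hand.
my $with_group = $str =~ s/(fo+)bar/${1}BAZ/r;

print "whole:      $whole\n";
print "after \\K:   $kept\n";
print "with \\K:    $with_k\n";
print "with group: $with_group\n";
```

Both substitutions should give the same string; the \K form just skips the capture and the $1 in the replacement.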
Thanks in advance for what I hope will be another stimulating discussion of the wonders of Perl, the eighth great wonder of the known world.
U P D A T E, 10/6/2023
Thanks to all for the illuminating examples. Prior to 5.20 I followed best Perl practice and avoided $`, $& and $', which meant I used group capture to pick sub-expressions from strings in production Perl. I retired in 2014 while still on 5.18 and missed the upgrade to 5.20, so I was not aware that 5.20 fixed the performance issues that had limited those three variables to debugging use. I returned to teaching Applied Math last year, so I did upgrade from 5.18, first to 5.34 and now to 5.36, but still gravitated to group capture. Only recently did I discover the numerous ways to write a regex without paying the grouping overhead.
\K appeared on the scene, I believe, at 5.10, when using the three variables still incurred a heavy performance hit. My take on all this is that \K finally came into its own at 5.20, and that group capture might only be optimal if two or more sub-expressions must be picked from a string.
To convince myself that there are real performance gains from using \K in the ways suggested in this discussion, I wrote four regular expressions to perform the same substitution, and the differences are very real. Here are some interesting results, to wit:
Running cmpthese(-5, {...});
$_ = 'Buster and Mimi'
Objective: substitute 'Ginger' for 'Mimi'
lookBehind: qr/(?<=\A\S{1,250} and )\S+\z/
literalSlashK: qr/\ABuster and \K\S+\z/
metaSlashK: qr/\b\K\S+\z/
groupCapture: qr/(\S+)\z/
                  Rate lookBehind groupCapture metaSlashK literalSlashK
lookBehind    185048/s         --         -68%       -69%          -76%
groupCapture  580337/s       214%           --        -4%          -25%
metaSlashK    606128/s       228%           4%         --          -21%
literalSlashK 769629/s       316%          33%        27%            --
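The benchmark script itself is not shown above. What follows is a minimal sketch of how such a comparison could be set up with the core Benchmark module. The four patterns and the input string are taken from the listing; the wrapper subs, the %candidates hash, the sanity check and the shorter run time (-1 rather than -5) are assumptions for illustration. It assumes Perl 5.30 or later, since variable-length lookbehind is not available before that.

```perl
use v5.30;
use warnings;
# Variable-length lookbehind may emit an experimental warning on some Perl versions.
no warnings 'experimental::vlb';
use Benchmark qw(cmpthese);

my $input = 'Buster and Mimi';

# Each candidate substitutes on a copy of the input (via /r) and returns the result.
my %candidates = (
    lookBehind    => sub { $input =~ s/(?<=\A\S{1,250} and )\S+\z/Ginger/r },
    literalSlashK => sub { $input =~ s/\ABuster and \K\S+\z/Ginger/r },
    metaSlashK    => sub { $input =~ s/\b\K\S+\z/Ginger/r },
    groupCapture  => sub { $input =~ s/(\S+)\z/Ginger/r },
);

# Sanity check: all four must produce the same string before timing them.
for my $name (sort keys %candidates) {
    my $got = $candidates{$name}->();
    die "$name produced '$got'\n" unless $got eq 'Buster and Ginger';
}

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```

Absolute rates will differ by machine and Perl version, so only the relative ordering is worth comparing against the table above.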
Thanks to all who contributed to this discussion.
Replies are listed 'Best First'.
Re: Practical use cases for \K in regex
by kcott (Archbishop) on Oct 05, 2023 at 16:27 UTC
Re: Practical use cases for \K in regex
by hippo (Archbishop) on Oct 05, 2023 at 15:33 UTC
Re: Practical use cases for \K in regex
by jo37 (Curate) on Oct 05, 2023 at 17:08 UTC
Re: Practical use cases for \K in regex
by ikegami (Patriarch) on Oct 05, 2023 at 20:07 UTC