PerlMonks  

Re^2: How to remove everything after last occurrence of a string?

by Marshall (Canon)
on Jun 06, 2022 at 22:39 UTC


in reply to Re: How to remove everything after last occurrence of a string?
in thread How to remove everything after last occurrence of a string?

Your index() solution does not do exactly what the regex solution does. That may or may not make a difference in this application, but I think it is worth pointing out. The regex is "greedy" and will wind up matching the last occurrence of $version. Your index() code will find only the first occurrence of $version.

Try tybalt's code with:

$path = "/some/path/3.5.2+tcl-tk-8.5.18+sqlite-3.10.0/a/b/3.5.2+tcl-tk-8.5.18+sqlite-3.10.0/c";
# and you will see that only the last "/c" is deleted.
/some/path/3.5.2+tcl-tk-8.5.18+sqlite-3.10.0/a/b/3.5.2+tcl-tk-8.5.18+sqlite-3.10.0

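To make the first-versus-last difference concrete, here is a minimal sketch. The index() lines are an assumption: they mirror the rindex() variant posted further down the thread and are not necessarily the exact code from the parent node. The variable names $by_index and $by_regex are made up for illustration.

```perl
use strict;
use warnings;

my $version = "3.5.2+tcl-tk-8.5.18+sqlite-3.10.0";
my $path    = "/some/path/$version/a/b/$version/c";

# index(): finds the FIRST occurrence, so everything after it is cut
# (assumed shape of the index() approach, not the original code)
my $by_index = $path;
my $pos = index($by_index, $version);
substr($by_index, $pos + length($version)) = '' if $pos >= 0;

# greedy regex: the leading .* grabs as much as it can and backtracks
# from the end, so the LAST occurrence of $version is the one kept
(my $by_regex = $path) =~ s/.*\Q$version\E\K.*//s;

print "index: $by_index\n";
print "regex: $by_regex\n";
```

With this input the index() result stops after the first version string, while the regex result keeps "/a/b/" and the second version string and only drops the trailing "/c".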
As far as maintainability and understandability go, I would not use the \K and would prefer the more common way. Instead of:
$path =~ s/.*\Q$version\E\K.*//s;
I probably would have coded:
$path =~ s/(.*\Q$version\E).*/$1/s;
I think \K is specific to Perl or at least I know that it does not exist in some other regex dialects that I use.
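
As a quick check that the two substitutions above are interchangeable for this job, the following sketch runs both on the same sample path (the path is borrowed from the rindex example later in the thread; $with_k and $with_capture are illustrative names only). Note that \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $version = "3.5.2+tcl";
my $path    = "/some/3.5.2+tcl/path/3.5.2+tcl/a/b/c";

# with \K: what was matched before \K is kept, only the tail is replaced
(my $with_k = $path) =~ s/.*\Q$version\E\K.*//s;

# with a capture group: the kept part is captured and put back via $1
(my $with_capture = $path) =~ s/(.*\Q$version\E).*/$1/s;

print "\\K:      '$with_k'\n";
print "capture: '$with_capture'\n";
print "same result\n" if $with_k eq $with_capture;
```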

From the use case presented, the speed of execution doesn't matter at all. I would opt for simplicity and avoid uncommon things like \K. I could code a faster, better version of your index() approach in ASM and it would run like a "super rocket", but to absolutely no effect whatsoever upon total program execution time. And I think such a version could miss use cases involving wide characters, which the regex will handle as part of Perl (the one-byte-per-character assumption, although extremely useful for many things, does have some limitations).

I don't know why the /s regex modifier is used, and the rationale behind it could be a bit obscure. Normally "." matches anything except \n; /s allows "." to match "\n" as well. I would not expect to see a \n in a path name. I'm not sure that this makes any difference at all, but again, some of these small things can matter depending upon the circumstances.
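
For what it is worth, here is a small sketch of where /s would matter. The input with an embedded newline is purely hypothetical (a path would not normally contain one), and $without_s and $with_s are names invented for the illustration.

```perl
use strict;
use warnings;

my $version = "3.5.2+tcl";
my $text    = "/some/3.5.2+tcl/a\n/b/c";   # hypothetical: newline after "/a"

# without /s the trailing .* stops at the newline, so "\n/b/c" survives
(my $without_s = $text) =~ s/(.*\Q$version\E).*/$1/;

# with /s the trailing .* runs across the newline to the end of the string
(my $with_s = $text) =~ s/(.*\Q$version\E).*/$1/s;

print "without /s: ", ($without_s =~ /\n/ ? "newline and tail remain" : "tail removed"), "\n";
print "with /s:    ", ($with_s    =~ /\n/ ? "newline and tail remain" : "tail removed"), "\n";
```

So for strings that never contain a newline the modifier changes nothing, which is consistent with the doubt expressed above.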

Re^3: How to remove everything after last occurrence of a string?
by AnomalousMonk (Archbishop) on Jun 07, 2022 at 00:58 UTC
    Your index() solution does not do exactly what the regex solution does.

    This shortcoming can be addressed by the use of rindex:

    Win8 Strawberry 5.8.9.5 (32) Mon 06/06/2022 20:19:26
    C:\@Work\Perl\monks
    >perl
    use strict;
    use warnings;

    my $path = "/some/3.5.2+tcl/path/3.5.2+tcl/a/b/c";
    print "\$path before: '$path' \n";

    my $version = "3.5.2+tcl";
    my $pos = rindex($path, $version);
    # cut everything beyond the match, if there was a match
    substr($path, $pos + length($version)) = '' if $pos >= 0;

    print "\$path after: '$path' \n";
    ^Z
    $path before: '/some/3.5.2+tcl/path/3.5.2+tcl/a/b/c'
    $path after: '/some/3.5.2+tcl/path/3.5.2+tcl'

    Personally, I have no objection to \K on the grounds of understandability, readability or maintainability; indeed, it seems desirable on these grounds. One must be aware, however, that it was only introduced with Perl version 5.10. That's over fourteen years old now (5.10.0 was released in December 2007), but one still occasionally sees situations in which \K is not available. AFAIK, your regex approach and the rindex approach will work with any version of Perl 5.


    Give a man a fish:  <%-{-{-{-<

      I guess we disagree about \K. I would not use it unless it was actually needed for the functionality desired. Perl will always have its place (at least for a very long time). However, Perl is not being routinely taught at the undergrad level anymore, and it could be that lots of Perl code will wind up being maintained by folks who are not all that proficient at the language or the more obscure corners of its regex syntax.
Re^3: How to remove everything after last occurrence of a string?
by hv (Prior) on Jun 07, 2022 at 00:43 UTC

    Your index() solution does not do exactly what the regex solution does.

    Thanks for bringing that up - I had intended to comment on it, but forgot. I think only tybalt89's regexp solution matches the last occurrence, due to its inclusion of a leading /.*/ in the pattern. The pattern in the original post and in my two regexp-based solutions will match the first occurrence, same as the index() solution.
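
    A minimal sketch of that point. The pattern without the leading .* is only an assumed stand-in for the other regexp solutions, which are not reproduced in this node; $first and $last are illustrative names.

```perl
use strict;
use warnings;

my $version = "3.5.2+tcl";
my $path    = "/some/3.5.2+tcl/path/3.5.2+tcl/a/b/c";

# no leading .*: the engine stops at the leftmost match,
# i.e. the FIRST occurrence of $version (assumed pattern shape)
(my $first = $path) =~ s/\Q$version\E\K.*//s;

# leading .*: greedy, backtracks from the end,
# so the LAST occurrence of $version is the one kept
(my $last = $path) =~ s/.*\Q$version\E\K.*//s;

print "without leading .*: '$first'\n";
print "with leading .*:    '$last'\n";
```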

      Glad we have a common understanding of what the code examples actually do.

      I was just looking at the node title: "How to remove everything after last occurrence of a string?". I suspect that is more restrictive than what the OP actually needs (i.e. I suspect that first or last doesn't matter).

      As far as \K goes, this to me, in this case, is some "Perl Candy". The same functionality is easily done without it. Many regex implementations do not have this, so in terms of "easy to understand", it flunks unless you are a Perl'er who uses this feature often.

        As far as \K goes, this to me, in this case, is some "Perl Candy". ... Many regex implementations do not have this, so in terms of "easy to understand", it flunks unless you are a Perl'er who uses this feature often.

        The only other regex library I regularly use, other than Perl's, is the Boost library used in Notepad++ for regex search/replace. The Boost library documentation has mentioned \K since v1.40.0, which was released in 2009, so nearly 13 years ago. The oldest Notepad++ you can easily download from their website is v6.2.3 from 2012, and it allows \K -- so every Notepad++ download for the last decade has included this feature that you seem to think is "rare" and "Perl Candy".

        If you type \K into regex101.com, it will tell you in the explanation for both "PCRE2 (PHP>=7.3)" and "PCRE (PHP <7.3)" that it means it resets the match, though the other flavors there don't seem to know it. The site rexegg.com shows \K on both Quick Start: Cheat Sheet and their Best/Greatest Regex Trick Ever page.

        The site regular-expressions.info has a whole page on "keep" and allows you to compare the functionality of \K between various regex engines on the Regular Expression Reference: Special Groups page. Based on selecting each of their options (other than Perl and Boost) from the dropdown, it looks like JGsoft has had it since V2, .NET doesn't, Java doesn't, PCRE since 7.2, PCRE2 with no limit, PHP since 5.2.4 (which PHP releases says was from 2007), Delphi has it, R has it, JavaScript does not, VBScript does not, XRegExp does not, Python does not, Ruby does since 2.0, std::regex does not, and the rest they list do not have that feature. So, while it's by no means all regex engines that have it, many do, including ones shipped with languages like PHP, Delphi, R, and Ruby. So I'd say \K is not just "Perl Candy" by any stretch of the imagination.

        Personally, \K is easier to understand to me than more complicated lookbehind syntax (which is available in many more of those regex flavors mentioned above), and has the power of being variable-width, which most of the normal lookbehinds (including Perl's) are not.
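
        A sketch of the variable-width point, under stated assumptions: the input path and the "/v" followed by digits-and-dots pattern are hypothetical, chosen only so that the kept part has no fixed length. The lookbehind is built from a string so that it is compiled at run time and a failure can be trapped with eval; as far as I know an unbounded lookbehind like this one is rejected by Perl, but treat the exact behavior as version-dependent.

```perl
use strict;
use warnings;

my $path = "/some/path/v3.10.0/a/b/c";   # hypothetical input

# \K: the part in front of it can be of any length
(my $kept = $path) =~ s{.*/v[\d.]+\K.*}{}s;

# the same idea written as a lookbehind; [\d.]+ has no upper bound,
# so compiling this pattern is expected to fail
my $lookbehind = '(?<=/v[\d.]+).*';
my $ok = eval { my $re = qr/$lookbehind/s; 1 };

print "kept with \\K: '$kept'\n";
print "lookbehind compiled: ", ($ok ? "yes" : "no"), "\n";
```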

        You are, of course, allowed to not use it if you don't want. And I often default to recommending Notepad++ regex-newbies use a capture group and include the $1 rather than using \K because of Notepad++ quirks (where \K only works as expected with Replace All, not with multiple single Replace). And to warn others that it requires Perl v5.10 is sensible. But to avoid it just because it's "Perl Candy" is a hard-to-defend position, IMO.
