in reply to How to remove everything after last occurrence of a string?

There are many possible approaches to this. One is to quotemeta() only inside the pattern:

$version = "3.5.2+tcl-tk-8.5.18+sqlite-3.10.0";
$path =~ s/\Q$version\E.*/$version/s;

Another is to replace with what the pattern actually matched:

$version = quotemeta("3.5.2+tcl-tk-8.5.18+sqlite-3.10.0");
$path =~ s/($version).*/$1/s;

Another is to use index() to search for a fixed string, rather than a regexp:

$version = "3.5.2+tcl-tk-8.5.18+sqlite-3.10.0";
my $pos = index($path, $version);
# cut everything beyond the match, if there was a match
substr($path, $pos + length($version)) = '' if $pos >= 0;

My guess is that tybalt89's solution with \K will be the fastest, but not necessarily the easiest to understand and maintain.
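
For what it's worth, here is a minimal sketch that wraps the three approaches above in subs and runs them on a made-up path. The sub names and the sample path are hypothetical, chosen only for illustration. A fourth, unquoted variant is included to show why the escaping matters: "+" and "." are regex metacharacters, so the unquoted pattern does not match the literal version string.

```perl
use strict;
use warnings;

my $version = "3.5.2+tcl-tk-8.5.18+sqlite-3.10.0";

# Approach 1: \Q...\E quotes metacharacters inside the pattern only.
sub cut_with_Q {
    my ($path) = @_;
    $path =~ s/\Q$version\E.*/$version/s;
    return $path;
}

# Approach 2: quotemeta() up front, then put back what was matched.
sub cut_with_capture {
    my ($path) = @_;
    my $quoted = quotemeta($version);
    $path =~ s/($quoted).*/$1/s;
    return $path;
}

# Approach 3: fixed-string search with index(), no regex at all.
sub cut_with_index {
    my ($path) = @_;
    my $pos = index($path, $version);
    # cut everything beyond the match, if there was a match
    substr($path, $pos + length($version)) = '' if $pos >= 0;
    return $path;
}

# For contrast: no quoting. "2+" now means "one or more 2s", so the
# literal "+" in the path is never matched and the path is left alone.
sub cut_unquoted {
    my ($path) = @_;
    $path =~ s/$version.*/$version/s;
    return $path;
}

my $path = "/opt/python/$version/lib/site-packages";   # hypothetical path
print "$_\n" for cut_with_Q($path), cut_with_capture($path),
                 cut_with_index($path), cut_unquoted($path);
```

The first three subs should all print the path cut right after the version string; the unquoted one should print the path unchanged.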

Re^2: How to remove everything after last occurrence of a string?
by Marshall (Canon) on Jun 06, 2022 at 22:39 UTC
    Your index() solution does not do exactly what the regex solution does. That may or may not make a difference in this application, but I think it is worth pointing out. The regex is "greedy" and will wind up matching the last occurrence of $version. Your index() code will find only the first occurrence of $version.

    Try tybalt's code with:

    $path = "/some/path/3.5.2+tcl-tk-8.5.18+sqlite-3.10.0/a/b/3.5.2+tcl-tk-8.5.18+sqlite-3.10.0/c";
    # and you will see that only the last "/c" is deleted.
    /some/path/3.5.2+tcl-tk-8.5.18+sqlite-3.10.0/a/b/3.5.2+tcl-tk-8.5.18+sqlite-3.10.0

    As far as maintainability and understandability go, I would not use \K and would prefer the more common way.
    Instead of:
        $path =~ s/.*\Q$version\E\K.*//s;
    I probably would have coded:
        $path =~ s/(.*\Q$version\E).*/$1/s;
    I think \K is specific to Perl or at least I know that it does not exist in some other regex dialects that I use.

    From the use case presented, the speed of execution doesn't matter at all. I would opt for simplicity and avoid uncommon things like \K. I could code a faster, better version of your index() approach in ASM and it would run like a "super rocket", but to absolutely no effect whatsoever upon total program execution time. I also think such a version could miss use cases involving wide characters, which the regex will handle as part of Perl (the one-byte-per-character assumption, although extremely useful for many things, does have some limitations).

    I don't know why the /s regex modifier is used; the rationale behind it is a bit obscure. Normally "." matches anything except \n, and /s allows "." to match "\n" as well. I would not expect to see a \n in a path name. I'm not sure that this makes any difference at all, but again, some of these small things can matter depending upon the circumstances.
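
To make the /s point concrete, here is a small sketch comparing the capture-style substitution with and without /s. The sub names, the shortened version string and the path with an embedded newline are all hypothetical, picked only to show the difference; for an ordinary path without a newline the two behave the same.

```perl
use strict;
use warnings;

my $version = "3.5.2+tcl";   # shortened, hypothetical version string

# Without /s, "." stops at the first newline, so text after it survives.
sub cut_without_s {
    my ($path) = @_;
    $path =~ s/(.*\Q$version\E).*/$1/;
    return $path;
}

# With /s, "." also matches "\n", so everything after the match is removed.
sub cut_with_s {
    my ($path) = @_;
    $path =~ s/(.*\Q$version\E).*/$1/s;
    return $path;
}

my $odd = "/some/$version/a\nb";   # hypothetical path containing a newline

for my $result (cut_without_s($odd), cut_with_s($odd)) {
    (my $shown = $result) =~ s/\n/\\n/g;   # make the newline visible
    print "'$shown'\n";
}
```

Without /s the "\nb" tail should be left in place; with /s it should be cut along with the rest.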

      Your index() solution does not do exactly what the regex solution does.

      This shortcoming can be addressed by the use of rindex:

      Win8 Strawberry 5.8.9.5 (32) Mon 06/06/2022 20:19:26
      C:\@Work\Perl\monks
      >perl
      use strict;
      use warnings;

      my $path = "/some/3.5.2+tcl/path/3.5.2+tcl/a/b/c";
      print "\$path before: '$path' \n";

      my $version = "3.5.2+tcl";
      my $pos = rindex($path, $version);
      # cut everything beyond the match, if there was a match
      substr($path, $pos + length($version)) = '' if $pos >= 0;
      print "\$path after: '$path' \n";
      ^Z
      $path before: '/some/3.5.2+tcl/path/3.5.2+tcl/a/b/c'
      $path after: '/some/3.5.2+tcl/path/3.5.2+tcl'

      Personally, I have no objection to \K on the grounds of understandability, readability or maintainability; indeed, it seems desirable on these grounds. One must be aware, however, that it was only introduced with Perl version 5.10. That's over fourteen years old now, but one still occasionally sees situations in which \K is not available. AFAIK, your regex approach and the rindex approach will work with any version of Perl 5.


      Give a man a fish:  <%-{-{-{-<

        I guess we disagree about \K. I would not use it unless it was actually needed for the functionality desired. Perl will always have its place (at least for a very long time). However, Perl is not being routinely taught at the undergrad level anymore, and it could be that lots of Perl code will wind up being maintained by folks who are not all that proficient at the language or the more obscure corners of its regex syntax.

      Your index() solution does not do exactly what the regex solution does.

      Thanks for bringing that up - I had intended to comment on it, but forgot. I think only tybalt89's regexp solution matches the last occurrence, due to its inclusion of a leading /.*/ in the pattern. The pattern in the original post and in my two regexp-based solutions will match the first occurrence, same as the index() solution.
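
A quick sketch of the first-versus-last difference, using the shortened version string and two-occurrence path from the rindex example. The sub names are hypothetical. The \K variant assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my $version = "3.5.2+tcl";

# No leading .* : the engine reports the leftmost match,
# so everything after the FIRST occurrence is removed.
sub cut_after_first {
    my ($path) = @_;
    $path =~ s/\Q$version\E.*/$version/s;
    return $path;
}

# Leading greedy .* : it grabs as much as it can before backtracking
# to the version string, so the LAST occurrence is the one kept.
sub cut_after_last {
    my ($path) = @_;
    $path =~ s/(.*\Q$version\E).*/$1/s;
    return $path;
}

# Same idea as cut_after_last, written with \K (Perl 5.10+).
sub cut_after_last_K {
    my ($path) = @_;
    $path =~ s/.*\Q$version\E\K.*//s;
    return $path;
}

my $path = "/some/$version/path/$version/a/b/c";
print cut_after_first($path),  "\n";
print cut_after_last($path),   "\n";
print cut_after_last_K($path), "\n";
```

The first sub should stop at the first occurrence, while the other two should keep everything up to and including the last one.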

        Glad we have a common understanding of what the code examples actually do.

        I was just looking at the node title: "How to remove everything after last occurrence of a string?". I suspect that is more restrictive than what the OP actually needs (i.e. I suspect that first or last doesn't matter).

        As far as \K goes, this to me, in this case, is some "Perl Candy". The same functionality is easily done without it. Many regex implementations do not have this, so in terms of "easy to understand", it flunks unless you are a Perl'er who uses this feature often.