bobf has asked for the wisdom of the Perl Monks concerning the following question:

I have a string that begins with a variable sequence of characters, which is followed by a constant string that can be used as a unique anchor. I want to strip off the variable stuff at the beginning so that it begins with the anchor. For example, given this:

my $s = 'variable chars anchor want this';
I want to end with this:
$s = 'anchor want this';

I can think of several ways to approach this, but I do not know which might be considered better than others from the standpoint of style or efficiency (not just benchmarking, but also including potential for backtracking). These approaches include:

All of these employ .* (except for the silly constructs, which are left as an exercise for the reader), but I am trying to avoid that idiom by thinking more carefully about my regexen (Death to Dot Star!). I suspect a better regex might be something like a negated character class, but for a group: "match everything from the start of the string that does not match the sequence of characters anchor".
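One way to spell "a negated character class, but for a group" is a negative look-ahead checked before every character consumed. The sketch below is only an illustration of that idea, not something taken from the replies; the helper name strip_to_anchor is made up for the example, and the anchor is treated as a literal string via \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical helper: remove everything before the first occurrence
# of a literal anchor, without using .*
sub strip_to_anchor {
    my ($s, $anchor) = @_;

    # (?:(?!ANCHOR).)* consumes one character at a time, but only at
    # positions where the anchor does not start. The trailing look-ahead
    # makes the substitution fail (leaving $s alone) if the anchor is absent.
    $s =~ s/\A(?:(?!\Q$anchor\E).)*(?=\Q$anchor\E)//s;
    return $s;
}

print strip_to_anchor('variable chars anchor want this', 'anchor'), "\n";
print strip_to_anchor('no marker here', 'anchor'), "\n";
```

This stops at the first occurrence of the anchor and leaves the string unchanged when the anchor is missing. Whether it is any faster than a plain .* is a separate question that would need measuring.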

How would you do this, and why?

Re: Regex style and efficiency
by BrowserUk (Patriarch) on Jan 10, 2010 at 22:13 UTC

    Because it is simple and efficient:

    $s = 'variable chars anchor want this';;
    $s = substr $s, index $s,'anchor';;
    print $s;;
    anchor want this

    If I was going to use a regex (say the anchor had a variable component), then:

    $s = 'variable chars anchor want this';;
    $s =~ s[.+(?=anchor)][];;
    print $s;;
    anchor want this
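To illustrate the "anchor with a variable component" case, here is a minimal sketch using the same look-ahead substitution. The pattern anchor\d+ and the helper name strip_to_pattern are assumptions invented for the example; the thread does not say what the variable part would look like.

```perl
use strict;
use warnings;

# Hypothetical anchor: the word 'anchor' followed by one or more digits.
my $anchor_re = qr/anchor\d+/;

sub strip_to_pattern {
    my ($s) = @_;

    # Same shape as the s[.+(?=anchor)][] above, with the literal
    # replaced by a pattern. The look-ahead is zero-width, so the
    # anchor itself is not part of what gets deleted.
    $s =~ s[.+(?=$anchor_re)][];
    return $s;
}

print strip_to_pattern('variable chars anchor42 want this'), "\n";
```

Note that .+ needs at least one character in front of the anchor, so a string that already starts with the anchor simply fails to match and is left as it was.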

    The difference:

    cmpthese -1,{
        a=> q[$s='variable chars anchor want this';$s=~s[.+(?=anchor)][];],
        b=> q[$s='variable chars anchor want this';$s=substr $s, index $s,'anchor';]
    };;
            Rate    a    b
    a 1592446/s   -- -48%
    b 3055291/s  92%   --
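For anyone who wants to repeat the comparison outside a REPL, the following is a rough standalone sketch using the core Benchmark module. The sub names via_regex and via_index are made up here, and wrapping each approach in a sub adds call overhead, so the absolute rates will not match the figures above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = 'variable chars anchor want this';

# Regex version: delete everything in front of the anchor.
sub via_regex {
    my ($s) = @_;
    $s =~ s[.+(?=anchor)][];
    return $s;
}

# substr/index version: keep everything from the anchor onwards.
sub via_index {
    my ($s) = @_;
    return substr($s, index($s, 'anchor'));
}

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds.
cmpthese(-1, {
    regex => sub { via_regex($input) },
    index => sub { via_index($input) },
});
```

Results depend on the perl version, the machine and the input length, so treat any single run as indicative only.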

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      As is, the substr/index version assumes the anchor is present. That may or may not be a problem.
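To make that concrete: index returns -1 when the anchor is not found, and substr with an offset of -1 returns the last character of the string. A guarded version might look like the sketch below; the name strip_before and the choice to return the string unchanged on a miss are assumptions for the example, not something the thread settles.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the string from the anchor onwards,
# or return it untouched if the anchor is absent.
sub strip_before {
    my ($s, $anchor) = @_;
    my $pos = index $s, $anchor;
    return $pos >= 0 ? substr($s, $pos) : $s;
}

my $missing = 'no marker here';

# Unguarded: index gives -1, so substr returns just the last character.
print substr($missing, index($missing, 'anchor')), "\n";   # e

# Guarded: the string comes back unchanged.
print strip_before($missing, 'anchor'), "\n";               # no marker here
print strip_before('variable chars anchor want this', 'anchor'), "\n";
```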

      Nice. It didn't even occur to me to use substr/index for this. I also like how you used the non-capturing look-ahead to prevent the desired text from being included in the s///. Thanks!

Re: Regex style and efficiency
by JavaFan (Canon) on Jan 11, 2010 at 10:02 UTC
    Here's one that doesn't use .*:
    $s =~ /anchor/p and $s = ${^MATCH} . ${^POSTMATCH};
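A small runnable sketch of the same idea, wrapped in a sub (keep_from_anchor is a name invented for the example) so the no-match case can be seen as well. The /p flag and ${^MATCH} / ${^POSTMATCH} need Perl 5.10 or later; as far as I know, from 5.20 on the flag is ignored and the variables are always available.

```perl
use strict;
use warnings;

sub keep_from_anchor {
    my ($s) = @_;

    # On a successful match, rebuild the string from the matched text
    # plus everything after it. If the match fails, the 'and' short-circuits
    # and $s is left alone.
    $s =~ /anchor/p and $s = ${^MATCH} . ${^POSTMATCH};
    return $s;
}

print keep_from_anchor('variable chars anchor want this'), "\n";
print keep_from_anchor('no marker here'), "\n";
```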