Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to make a regular expression that changes this:
1_abc/2_deg/bla_30_31_blah
into:
abc/deg/bla_30_31_blah
Is it possible to substitute only the first two matches of \d_ with '', or is there another way to accomplish this?

Replies are listed 'Best First'.
Re: changing only the first two matches of a regex
by merlyn (Sage) on Mar 20, 2001 at 20:02 UTC
    To answer your question literally, yes:
    { my $count = 0; s/(\d+_([a-z]+))/++$count > 2 ? $1 : $2/eg; }
    Here the $count variable counts the matches as they happen. For the first two matches, the whole match ($1) is replaced by just its letters ($2), which drops the digits and the underscore. Once $count gets past two, each match is replaced with itself, so it is left unchanged. The /e modifier says that the right side is Perl code to be executed rather than simply a double-quoted string.
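
    A minimal sketch of the same idea wrapped in a sub so it can be tried on more than one string. The name strip_first_two and the second test string are only illustrations, not part of the original reply:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: run the counting substitution on a copy of the input.
    sub strip_first_two {
        my ($str) = @_;
        my $count = 0;
        # Matches 1 and 2: keep only the letters ($2).
        # Later matches: put the whole match ($1) back unchanged.
        $str =~ s/(\d+_([a-z]+))/++$count > 2 ? $1 : $2/eg;
        return $str;
    }

    print strip_first_two('1_abc/2_deg/bla_30_31_blah'), "\n";  # abc/deg/bla_30_31_blah
    print strip_first_two('1_a/2_b/3_c/4_d'), "\n";             # a/b/3_c/4_d
    ```

    Note that the pattern only counts digit prefixes that are followed by lowercase letters, so "30_" (followed by "31") is never counted as a match.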

    -- Randal L. Schwartz, Perl hacker

Re: changing only the first two matches of a regex
by ColonelPanic (Friar) on Mar 20, 2001 at 19:11 UTC
    Use a regex that encompasses both of the first two prefixes.
    This regex is untested. It also assumes that the first two fields are always in exactly the format shown above. You may need to change the specifics, but this is the general idea:
    s/^\d_(\w+)\/\d_(\w+)/$1\/$2/;


    When's the last time you used duct tape on a duct? --Larry Wall
Re: changing only the first two matches of a regex
by arturo (Vicar) on Mar 20, 2001 at 19:14 UTC

    Hint: use anchors.

    With your example, it appears you want to get rid of the first two digits and their following underscores that appear in the string. A regex that does that is:

    $string =~ s/^\d_(\D+?)\d_(.*)/$1$2/;

    The caret "^" at the beginning of this regex anchors the match to the beginning of the string. You then make judicious use of captures to get the desired result.

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

      Actually this will not work if the string includes numbers in the part that you match with \D. I think you have to "unroll the loop" if you want to avoid (.*?):

      #!/bin/perl -w
      use strict;

      foreach (<DATA>) {
          s{^\d+_                        # start, then number(s), then _
            ((?:\D*(?:\d(?!\d*_))?)*)    # non-digits, then optionally
                                         # a digit _not_ followed by digits and _
                                         # repeat, rinse...
            \d+_                         # the second digit _ sequence
            (.*)}                        # the rest of the string
           {$1$2}x;
          print;
      }

      __DATA__
      1_abc/2_deg/bla_30_31_blah
      1_ab1c/2_deg/bla_30_31_blah
      1_ab1c/22_deg/bla_30_31_blah
      1_a2b33c/22_deg/bla_30_31_blah
      a_ab1c/b_deg/bla_30_31_blah

      Note that your regexp will not work properly for the second string. I also added a line where the regexp does not match, as this can expose bugs where a badly written regexp takes forever to fail on a string that it does not match.

      Frankly, in this case, and despite all the warnings around, I would use (.*?) and write the regexp as:

      s/^\d_(.*?)\d_(.*)/$1$2/;

      I would be happy to get enlightened as to why this is not safe by the way.
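
      For comparison, here is a small sketch that runs the \D+? version and the (.*?) version side by side. The sub names are made up for the example, and both regexes keep the single-digit \d_ assumption from the posts above:

      ```perl
      use strict;
      use warnings;

      # Hypothetical wrappers around the two regexes being compared.
      sub with_nondigit { my ($s) = @_; $s =~ s/^\d_(\D+?)\d_(.*)/$1$2/; return $s }
      sub with_lazy_dot { my ($s) = @_; $s =~ s/^\d_(.*?)\d_(.*)/$1$2/;  return $s }

      for my $line ('1_abc/2_deg/bla_30_31_blah', '1_ab1c/2_deg/bla_30_31_blah') {
          print "input: $line\n";
          print "  \\D+? : ", with_nondigit($line), "\n";
          print "  .*?  : ", with_lazy_dot($line), "\n";
      }
      ```

      On the second line the \D+? version cannot get past the "1" in "ab1c", so the substitution fails and the string comes back unchanged, while the (.*?) version still strips both prefixes.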

Re: changing only the first two matches of a regex
by knight (Friar) on Mar 20, 2001 at 19:48 UTC
    If the problem is really that you only want to do this to the first two fields, the previous replies cover that.

    But if you mentioned "first two matches" in reference to the specific sample data you show, and the underlying general problem is that you want to remove \d_ from the beginning of any and all /-separated fields in a string, then try:
    s#(^|/)\d_#$1#g;
    The (^|/) alternation matches at the beginning of the line or after your field separator, which avoids inadvertently matching and removing \d_ from the middle of a field.
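
    A quick sketch of how that behaves, assuming the fields are always separated by "/". The sub name and the second test string are invented for the example:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: strip a single-digit prefix from every /-separated field.
    sub strip_field_prefixes {
        my ($s) = @_;
        # $1 is either the empty string (start of string) or the "/" separator,
        # so putting it back keeps the separators in place.
        $s =~ s#(^|/)\d_#$1#g;
        return $s;
    }

    print strip_field_prefixes('1_abc/2_deg/bla_30_31_blah'), "\n";  # abc/deg/bla_30_31_blah
    print strip_field_prefixes('1_a/b_2_c/3_d'), "\n";               # a/b_2_c/d
    ```

    In the second string the "2_" in the middle of "b_2_c" is left alone because it does not follow the start of the string or a "/".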
Re: changing only the first two matches of a regex
by jeroenes (Priest) on Mar 20, 2001 at 19:39 UTC
    In the quest for regex-less solutions:
    $in='1_abc/2_deg/bla_30_31_blah';

    #one-liner with split, nicely complicated (including backdoor regex)
    $out=join '/', (split /_|\//, $in, 4)[1,3];

    #This works if the second number is a single digit
    $out=substr $in, index( $in, '_') + 1;
    substr( $out, index( $out, '_') - 1, 2)='';
    At the moment, I can't think of more.

    Cheers,

    Jeroen
    "We are not alone"(FZ)
    Update: Just thought of one substr use that doesn't break on multiple digits...:

    $out=substr $in, index( $in, '_') + 1;
    substr( $out, $us = index( $out, '/') + 1, index( $out, '_')- $us + 1)='';
Re: changing only the first two matches of a regex
by Jouke (Curate) on Mar 20, 2001 at 19:10 UTC
    I would do something like:
    s/(\d+_)(.*)?(\d+_)(.*)/$1$2/;
    But I'm certainly not a regex-wizard...

    Jouke Visser, Perl 'Adept'
      With what you have above, the replacement should be $2$4, since you have 4 capturing groups there. Or you could write the digit parts as a bare "\d+_", or as the non-capturing "(?:\d+_)", so that you can use $1$2 as others have done later in this thread.
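
      A sketch of both suggestions. It additionally assumes that "(.*)?" was meant to be the non-greedy "(.*?)", which is a guess on my part and not something stated above. The sub names are made up for the example:

      ```perl
      use strict;
      use warnings;

      # Variant 1: keep all four capturing groups and use $2$4.
      sub four_groups {
          my ($s) = @_;
          $s =~ s/(\d+_)(.*?)(\d+_)(.*)/$2$4/;
          return $s;
      }

      # Variant 2: make the digit groups non-capturing and use $1$2.
      sub non_capturing {
          my ($s) = @_;
          $s =~ s/(?:\d+_)(.*?)(?:\d+_)(.*)/$1$2/;
          return $s;
      }

      print four_groups('1_abc/2_deg/bla_30_31_blah'), "\n";    # abc/deg/bla_30_31_blah
      print non_capturing('1_abc/2_deg/bla_30_31_blah'), "\n";  # abc/deg/bla_30_31_blah
      ```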
      Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
(tye)Re: changing only the first two matches of a regex
by tye (Sage) on Mar 20, 2001 at 23:16 UTC

    Here is a fun (and powerful) way to only do some substitutions:

    my $str= "1_abc/2_def/bla_30_31_blah";
    my $count= 0;
    while(  $str =~ /\d+_/g  ) {
        last   if  2 < ++$count;
        $str =~ s///;
    }
    print "\$str=($str)\n";
    # $str=(abc/def/bla_30_31_blah)
    Though I'd probably use one of the other solutions mentioned in this particular case.

            - tye (but my friends call me "Tye")
      my $str= "1_abc/2_def/bla_30_31_blah";
      my $count= 0;
      while(  $str =~ /\d+_/g  ) {
          last   if  2 < ++$count;
          $str =~ s///;
      }
      print "\$str=($str)\n";
      I believe that middle s/// is not doing what you think it is. It's going back to the beginning of the string, looking for the first match that matches the most recent successful regex match, and then removing it.

      I think you lucked out on this one. Had your replacement been s//49_/, you would have seen only the first one changed, since the first replacement string would have been matched again on the second while-pass.

      Oh, and even more so... I don't think your //g in the while is actually inching along! I think it's scanning from the beginning again as well, since you modify the scalar within the loop. So you just did this the hard way:

      $str =~ s/\d+_// for 1..2;
      But that's not exactly what was asked for originally. It just happens to be good for this problem and this input data.

      Well, it's what was literally asked for, but it's not very generalizable. {grin}
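
      A small sketch of the empty-pattern behaviour described above. An empty pattern in s/// reuses the last successfully matched regex and starts searching from the beginning of the string. The variable names are just for illustration:

      ```perl
      use strict;
      use warnings;

      my $str = "1_abc/2_def/bla_30_31_blah";

      # One successful match, then an empty pattern: s/// reuses /\d+_/
      # and removes the first match it finds from the start of the string.
      if ($str =~ /\d+_/) {
          $str =~ s///;
      }
      print "$str\n";    # abc/2_def/bla_30_31_blah

      # The explicit equivalent of what the while loop ended up doing.
      my $twice = "1_abc/2_def/bla_30_31_blah";
      $twice =~ s/\d+_// for 1 .. 2;
      print "$twice\n";  # abc/def/bla_30_31_blah
      ```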

      -- Randal L. Schwartz, Perl hacker

        Thanks. I was surprised when that appeared to work.

        Well, it would have been powerful... (:

                - tye (but my friends call me "Tye")
Re: changing only the first two matches of a regex
by larryl (Monk) on Mar 22, 2001 at 02:37 UTC

    Instead of removing the first two occurrences, why not just remove the first occurrence, twice?

    s#(^|/)\d_#$1#; s#(^|/)\d_#$1#;