JayBee has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying several ways to manipulate a string so that everything between two marker strings gets replaced, and so far no luck. I've even tried the /s and /m modifiers, since the Deitel book talks about multiline and single-line matching and I figured that the "." character would match everything but newline, which I will/do have in my string.

I'm not sure whether the $last string matches right after the $first string, but let's just say that it does and that there's only one occurrence in the entire $string for now. Additional ways to do this would be a bonus for future use.

Thank you in advance for all your help.

my $string='foo bar foo bar <span id="box1" class="box1"> foo bar foo bar </span> foo bar';
$first='<span id="box1';
$last='/span>';
$reserve='<!-- BOX1 -->';
$string =~ s/$first(.*?)$last/$reserve/;

So just to be clear, how can I get the substitution operator to replace everything in my tag with another tag such as a comment?

Replies are listed 'Best First'.
Re: Replacing everything in between using s///;
by merlyn (Sage) on May 09, 2005 at 01:20 UTC
Re: Replacing everything in between using s///;
by tlm (Prior) on May 09, 2005 at 01:28 UTC

    Well, if you add /s to what you have, everything between and including $first and $last will be replaced. You say this didn't work for you?

    Here's a useful mnemonic to remember the difference between /s and /m:

    /s affects the meaning of a single regexp character (namely "."), while /m affects the meaning of multiple regexp characters (actually, just two: "^" and "$").
    It's just a mnemonic, not the full story, but it comes in handy. With /m in effect, "^" matches not only the beginning of the string, but also the beginning of any line of text contained in the string (i.e. right after any embedded newlines). Likewise, with /m, $ will match the end of every line contained in the string, i.e. just before every newline contained in the string (though if the last character of the string is not a newline, then "$" matches the very end of the string). On the other hand, /s only makes "." match newlines too.
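    To see the effect of /s on the substitution from the question, here is a minimal sketch. The multi-line $string is made up for illustration (the original example had no newlines), and the copy-then-substitute idiom is only there so both results can be compared side by side.

```perl
use strict;
use warnings;

# Hypothetical multi-line variant of the string from the question.
my $string  = "foo bar <span id=\"box1\" class=\"box1\">\nfoo bar\nfoo bar\n</span> foo bar";
my $first   = '<span id="box1';
my $last    = '/span>';
my $reserve = '<!-- BOX1 -->';

# Without /s, "." cannot cross the embedded newlines, so nothing matches.
(my $without_s = $string) =~ s/$first.*?$last/$reserve/;

# With /s, "." also matches newlines, so the whole span is replaced.
(my $with_s = $string) =~ s/$first.*?$last/$reserve/s;

print "without /s: ", ($without_s eq $string ? "unchanged" : "changed"), "\n";
print "with /s:    $with_s\n";   # foo bar <!-- BOX1 --> foo bar
```

    Adding /m here would change nothing, since the pattern contains neither "^" nor "$".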

    BTW, the parentheses in your regexp are superfluous, since you are not doing anything with the captured string, and in this case they are not needed for grouping either.

    But despite my giving you these pointers on your regexp, I agree with merlyn that regexps are not the tool for this kind of problem.

    the lowliest monk

      I just realized I had a typo, which is why it didn't work. Duh!

      Just in case you're curious, from s/$first(.*?)$last/$reserve/s;
      I took out "(.*?)$last" and tried it, worked in a funny way.
      so I tried it without the "$first(.*?)" instead, and it failed completely.

      So, looking at my actual code, I found a stray space in $last: it was "/ span>" rather than the "/span>" used in my example.

      So the little things will get ya if you are not getting enough sleep :)

      Thank you very much for building my confidence. With the previous help from all you Perl Monks, I was sure I had it right.

Re: Replacing everything in between using s///;
by ikegami (Patriarch) on May 09, 2005 at 03:08 UTC

    First, you'll need to escape all special characters in $first and $last, or use \Q...\E. (I can't believe no one else mentioned this.)
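    A small sketch of why the escaping matters. The delimiters '[begin]' and '[end]' are hypothetical, picked only because square brackets are regex metacharacters; the markers in the question happen to contain none.

```perl
use strict;
use warnings;

# Hypothetical delimiters that contain regex metacharacters.
my $first   = '[begin]';
my $last    = '[end]';
my $reserve = '<!-- BOX1 -->';
my $string  = 'foo [begin] old text [end] bar';

# Unescaped: "[begin]" and "[end]" are read as character classes,
# so the pattern matches something other than the literal markers.
(my $unescaped = $string) =~ s/$first.*?$last/$reserve/;

# Escaped with \Q...\E: the markers are matched literally.
(my $escaped = $string) =~ s/\Q$first\E.*?\Q$last\E/$reserve/;

print "unescaped: $unescaped\n";
print "escaped:   $escaped\n";   # foo <!-- BOX1 --> bar

# quotemeta() does the same escaping as \Q...\E, outside a pattern.
print "quotemeta: ", quotemeta($first), "\n";   # \[begin\]
```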

    Secondly, if you match $first and $last, you need to reinsert them:

    $string =~ s/(\Q$first\E).*?(\Q$last\E)/$1$reserve$2/;

    or use a zero-width assertion:

    $string =~ s/(?<=\Q$first\E).*?(?=\Q$last\E)/$reserve/;
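    For comparison, a sketch that runs both forms on the same input. It assumes $first and $last hold the complete opening and closing tags (unlike the partial strings in the question), and adds /s in case the content spans lines.

```perl
use strict;
use warnings;

my $string  = 'foo <span id="box1" class="box1"> foo bar </span> bar';
my $first   = '<span id="box1" class="box1">';   # assumption: the full opening tag
my $last    = '</span>';                         # assumption: the full closing tag
my $reserve = '<!-- BOX1 -->';

# Capture the delimiters and put them back around the replacement.
(my $captured = $string) =~ s/(\Q$first\E).*?(\Q$last\E)/$1$reserve$2/s;

# Zero-width assertions: the delimiters are checked but never consumed.
(my $lookaround = $string) =~ s/(?<=\Q$first\E).*?(?=\Q$last\E)/$reserve/s;

print "$captured\n";     # foo <span id="box1" class="box1"><!-- BOX1 --></span> bar
print "$lookaround\n";   # same result
```

    The lookbehind works here because the interpolated, quoted $first is a fixed-length literal.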

    Since you're using constant strings, you could also use index instead of a regexp:

    my $first_start = index($string, $first);
    if ($first_start >= 0) {
        my $first_end  = $first_start + length($first);
        my $last_start = index($string, $last, $first_end);
        if ($last_start >= 0) {
            # The third argument of substr is a length, not an end offset.
            substr($string, $first_end, $last_start - $first_end, $reserve);
        }
    }
Re: Replacing everything in between using s///;
by eibwen (Friar) on May 09, 2005 at 01:30 UTC

    If you want to replace a substring bounded by other substrings:

    my $alpha = join '', ('a'..'z');   # $alpha = 'abcdefghijklmnopqrstuvwxyz'
    $alpha =~ s/(?<=def).*(?=uvw)//;   # remove everything preceded by the substring 'def'
                                       # and succeeded by the substring 'uvw'
    print $alpha;                      # $alpha = 'abcdefuvwxyz'

    Consult perlre and perlretut for more on the formation of regular expressions; however, given your example using HTML, I presume you may really be looking for HTML::Template and similar modules.

Re: Replacing everything in between using s///;
by TedPride (Priest) on May 09, 2005 at 03:45 UTC
    An s flag tells the regex to ignore line boundaries, which it won't do normally. The following works fine:
    $string =~ s/$first(.*?)$last/$reserve/s;
    And since you aren't actually using the part in the middle, you don't need the parentheses, though the code works fine with or without them:
    $string =~ s/$first.*?$last/$reserve/s;

      An s flag tells the regex to ignore line boundaries, which it won't do normally.

      I wouldn't put it that way. /s makes newlines a valid match for ".". Moreover, perl normally does ignore internal line boundaries unless one tells it not to with /m. E.g.

      my $s = "a\nb\nc\n";
      my @g   = map "[$_]", $s =~ /(.)$/g;
      my @gs  = map "[$_]", $s =~ /(.)$/gs;
      my @gm  = map "[$_]", $s =~ /(.)$/gm;
      my @gsm = map "[$_]", $s =~ /(.)$/gsm;
      print "g: @g\n";
      print "gs: @gs\n";
      print "gm: @gm\n";
      print "gsm: @gsm\n";
      __END__
      g: [c]
      gs: [c] [
      ]
      gm: [a] [b] [c]
      gsm: [a] [b] [c] [
      ]

      Note in particular the case in which both /s and /m are used.

      the lowliest monk