Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I'm sure I'm missing something blindingly obvious but I'll ask anyway.

This is the Do What I Mean version of my code:

$s = ' <p>something something</p> <h2>blah blah blah</h2> <p>something something</p> ';
$before = '<h2>([^<]+)</h2>';
$after = '<h1>$1</h1>';
$s =~ s/$before/$after/;
print $s;
But it doesn't. DWIM that is.

I get

<p>something something</p> <h1>$1</h1> <p>something something</p>

Is there a way to do this?

$s =~ s/$before/eval{$after}/e;
doesn't work, and neither does
$regex = 's!<h2>([^<]+)</h2>!<h1>$1</h1>!g';
$s =~ $regex;


($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print

Re: Supplying the RHS to a regex as a variable
by BUU (Prior) on Oct 18, 2003 at 05:59 UTC
    $s = ' <p>something something</p> <h2>blah blah blah</h2> <p>something something</p> ';
    $before = '<h2>([^<]+)</h2>';
    $after = '"<h1>$1</h1>"';
    $s =~ s/$before/$after/ee;
    print $s;

    Output:
    <p>something something</p> <h1>blah blah blah</h1> <p>something something</p>
      Picture me slapping my forehead. Thank you so much.

      I'd forgotten all about "ee" but now I'm remembering Merlyn's classic /old macdonald/eieio regex.
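
For readers wondering why the extra e matters, here is a minimal sketch, assuming the same $before pattern as above. With no /e, the replacement is interpolated once, so the characters $1 inside $after are just text. With /ee, the first evaluation yields the string "<h1>$1</h1>" (quotes included) and the second evaluation runs that string as Perl, which is where $1 gets filled in. The code-ref variant is an alternative not mentioned in the thread; $make_after is a hypothetical name. It avoids string eval, which matters if the replacement text could ever come from untrusted input.

```perl
use strict;
use warnings;

my $s      = '<p>something something</p> <h2>blah blah blah</h2>';
my $before = '<h2>([^<]+)</h2>';

# Option 1: /ee. The replacement must itself be valid Perl source,
# here a double-quoted string, so the second eval interpolates $1.
my $after = '"<h1>$1</h1>"';
(my $with_ee = $s) =~ s/$before/$after/ee;

# Option 2 (hypothetical alternative): keep the replacement as a code ref
# and call it with a single /e, passing the capture in as an argument.
my $make_after = sub { my ($text) = @_; return "<h1>$text</h1>" };
(my $with_sub = $s) =~ s/$before/$make_after->($1)/e;

print "$with_ee\n";
print "$with_sub\n";
```

Both substitutions turn the h2 element into an h1 and leave the rest of the string alone.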



      I can easily break this code by feeding it a valid html string:
      $s = ' <p>something something</p> <h2>blah <<< blah blah</h2> <p>something something</p> ';

      Update:

      BUU, although parsing is not the purpose, you have to realize that it is implicitly done anyway. Matching with a regexp is obviously a kind of parsing, and many monks here have repeatedly pointed out that a regexp is not a good way to deal with html.

      It is not an easy task to come up with a perfect regexp for dealing with html.

      On the other hand, although the purpose is not parsing, parsing is still a valid tool, isn't it? In this case, it is actually the better tool.

        But he's not parsing html. The parent node never even mentioned parsing html. All he wants to do is match a certain substring and replace it with another one. The minor fact that this string happens to contain data that superficially resembles html has absolutely no bearing on this.
Re: Supplying the RHS to a regex as a variable
by pg (Canon) on Oct 18, 2003 at 07:03 UTC
    A better way is to use an existing module to parse the html for you, for example HTML::Parser. (The code only demos the idea. Obviously you need to add more handling, but the parsing is done for you correctly.)
    use HTML::Parser;
    use strict;

    my $str = '<p>something something</p> <h2>blah <<< blah blah</h2> <p>something something</p>';
    my $parser = HTML::Parser->new(default_h => [\&handler, "tagname, text"]);
    $parser->parse($str);

    sub handler {
        my ($tag, $text) = @_;
        if ($tag) {
            if ($tag eq "h2") {
                print "<h1>";
            } else {
                print "<$tag>";
            }
        } else {
            print $text;
        }
    }
      I can easily break this code by feeding it:
      my $str = '<p>something something</p> <h2>blah </a> blah blah</h2> <p>something something</p>';

      Output:
      <p>something something<p> <h1>blah <a> blah blah<h1> <p>something something<p>
      (Where'd the slash before the a go? Heck if I know.)
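
On the missing slash: the tagname argspec passes the bare element name for both start and end tags, so a handler that prints "<$tag>" cannot tell </a> from <a>. A minimal sketch of one way around that, assuming HTML::Parser is installed, is to also request the event argspec and pass the original text through for every tag except h2. The variable $out and the shortened input string are illustrative choices, not from the thread.

```perl
use strict;
use warnings;
use HTML::Parser;

my $out = '';

# Handler receives the event name ("start", "end", "text", ...),
# the lowercased element name, and the original source text.
sub handler {
    my ($event, $tag, $text) = @_;
    if ($event eq 'start') {
        $out .= $tag eq 'h2' ? '<h1>' : $text;
    }
    elsif ($event eq 'end') {
        $out .= $tag eq 'h2' ? '</h1>' : $text;
    }
    elsif (defined $text) {
        # Plain text, comments, declarations: copy through unchanged.
        $out .= $text;
    }
}

my $parser = HTML::Parser->new(
    api_version => 3,
    default_h   => [\&handler, 'event, tagname, text'],
);
$parser->parse('<p>something</p> <h2>blah </a> blah</h2>');
$parser->eof;    # flush anything the parser is still buffering

print "$out\n";
```

Because unrecognised tags are copied from the source text rather than rebuilt from the tag name, the stray </a> keeps its slash.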
      Thanks for the code. It really was just an example though. I was thinking about the Big Picture rather than the code at hand.



Re: Supplying the RHS to a regex as a variable
by BUU (Prior) on Oct 18, 2003 at 07:57 UTC
    Speaking to your purpose and not your mechanics, after further experimentation I came up with this:
    use strict;

    my $s = '<p>something something</p> <h2>blah </a> blah blah </h2> <p>something something</p>';
    my $start = '<h2>';
    my $end = '</h2>';
    my $first = index($s, $start);
    my $last = index($s, $end, $first);
    substr($s, $first, $last-$first+length $end) =
        '<h1>'.substr($s, $first+length $start, $last-$first-length $start).'</h1>';
    print $s;

    Which simply finds the first occurrence of $start, then finds the next occurrence of $end after that, and replaces that substring with a new substring. It's not quite as nice looking, but it doesn't break with random chars in between the start and end nodes. As it is now, it only replaces the first occurrence of the start/end tags, but if you wanted to do it generally, you would just need to keep track of wherever the last $start or $end tag (your preference) was found and use that as the starting point for the next index.
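
The "keep track of the last position" idea might look something like the sketch below. It is untested against real-world markup, and rewrap_all and its parameter names are hypothetical, not part of the original post. It resumes each search just past the text it has written, so replaced spans are never rescanned.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrap every $start ... $end span in $s with
# $new_start ... $new_end, using only index and substr.
sub rewrap_all {
    my ($s, $start, $end, $new_start, $new_end) = @_;
    my $pos = 0;    # where the next search begins

    while (1) {
        my $first = index($s, $start, $pos);
        last if $first == -1;

        my $last = index($s, $end, $first + length($start));
        last if $last == -1;    # unterminated span: leave the rest alone

        my $inner = substr($s, $first + length($start),
                           $last - $first - length($start));
        my $replacement = $new_start . $inner . $new_end;
        substr($s, $first, $last - $first + length($end)) = $replacement;

        # Continue after the replacement that was just written.
        $pos = $first + length($replacement);
    }
    return $s;
}

print rewrap_all('<h2>a </a> b</h2> <p>x</p> <h2>c</h2>',
                 '<h2>', '</h2>', '<h1>', '</h1>'), "\n";
```

Like the single-replacement version, this makes no attempt to handle nested start tags.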

    At the moment the only bug I see is nesting, which would require slightly more complicated code, probably something along the lines of finding the first occurrence of $start, then finding the next occurrence of $end and then checking if another $start appears between them; if so, skip that $end tag and move on. That would of course break the current ability to repeat $start as many times as you want and just end with a single $end tag, and so would require 'properly nested data', which can probably be considered a feature =]

    This was mostly 'off the cuff' code and by no means thoroughly tested, so if anyone else sees any major flaws I would be interested in hearing them.

    Hrm, perhaps something like:
    while (1) {
        if ( index( substr( $s, $start_pos, $end_pos-$start_pos ), $start_tag ) != -1 ) {
            $end_pos = index( $s, $end_tag, ++$end_pos );
        } else {
            last; # $end_pos is good
        }
    }

    For the nested loops, assuming the existence of certain vars to simplify things. This isn't tested at all however, and should be treated mostly as pseudo code.
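
One way to make the nesting idea concrete is to count depth instead of re-checking the substring. This is a sketch under the 'properly nested data' assumption mentioned above; find_matching_end is a hypothetical helper name, and $start_pos is assumed to be the offset of the opening $start_tag.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset of the $end_tag that closes the
# $start_tag found at $start_pos, or -1 if the tags never balance.
sub find_matching_end {
    my ($s, $start_tag, $end_tag, $start_pos) = @_;
    my $depth = 1;                                  # the tag at $start_pos is open
    my $pos   = $start_pos + length($start_tag);

    while (1) {
        my $next_start = index($s, $start_tag, $pos);
        my $next_end   = index($s, $end_tag,   $pos);
        return -1 if $next_end == -1;               # ran out of end tags

        if ($next_start != -1 && $next_start < $next_end) {
            # Another start tag opens before the next end tag: go deeper.
            $depth++;
            $pos = $next_start + length($start_tag);
        }
        else {
            # This end tag closes the innermost open start tag.
            $depth--;
            return $next_end if $depth == 0;
            $pos = $next_end + length($end_tag);
        }
    }
}

my $nested = '<b>a<b>b</b>c</b>d';
print find_matching_end($nested, '<b>', '</b>', 0), "\n";
```

The returned offset can then stand in for $last in the index/substr replacement above.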
Re: Supplying the RHS to a regex as a variable
by delirium (Chaplain) on Oct 18, 2003 at 14:44 UTC
    How about a simple:

    s/<h2>/<h1>/gi; s#</h2>#</h1>#gi;

    You could also use a stylesheet to change the properties of H2 tags and not bother replacing any existing markup.