argv has asked for the wisdom of the Perl Monks concerning the following question:

I want to iterate through HTML code and add a "target" attribute to every href that points OFFSITE, while leaving hrefs that stay within the site alone. So, consider these two HTML snippets:

<a href="http://www.foobar.com/xxx"> <a href="http://www.danheller.com/xxx">

I want to rewrite this text accordingly:

<a href="http://www.foobar.com/xxx" target="_blank"> <a href="http://www.danheller.com/xxx">

I tried using something like this:

while ($body_str =~ s|href="http://(.*)"| ($1 =~ /^www.danheller.com/) ? $1 : qq($1" target="_blank)|eg) { print "Found '$1'\n"; }

Needless to say, it doesn't work. I keep running into a circular dependency on matching the code and then losing my place in the text for a replacement, or replacing more occurrences than intended. Just how far off am I? :|

Replies are listed 'Best First'.
Re: replacing chunks of text only if another chunk matches a pattern
by ccn (Vicar) on Oct 09, 2004 at 19:42 UTC

    $_ = <<'HTML';
    <a href="http://www.foobar.com/xxx">
    <a href="http://www.danheller.com/xxx">
    <a href="http://www.foobar.com/xxx">
    <a href="http://www.danheller.com/xxx">
    HTML

    s{(href="http://(?!www\.danheller\.com)[^"]+")}{$1 target="_blank"}g;
    print;

    Results:

    <a href="http://www.foobar.com/xxx" target="_blank">
    <a href="http://www.danheller.com/xxx">
    <a href="http://www.foobar.com/xxx" target="_blank">
    <a href="http://www.danheller.com/xxx">
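The substitution in this reply relies on the negative lookahead `(?!www\.danheller\.com)`: the match only proceeds when the text right after `http://` is not the site's own host, so on-site links are never touched and there is no need for a loop. As a rough sketch only, the same idea could be wrapped in a sub that takes the host as a parameter. The name `add_target` and the extra `[/":]` check after the host (so that a host such as `www.danheller.com.example.org` still counts as offsite) are assumptions of this sketch, not part of the reply.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): append target="_blank" to every
# double-quoted http:// href whose host is not $own_host.
sub add_target {
    my ($html, $own_host) = @_;

    # \Q...\E escapes the dots in the host name. The [/":] after it requires
    # the host to end there, so a longer name that merely starts with
    # $own_host is still treated as offsite.
    $html =~ s{(href="http://(?!\Q$own_host\E[/":])[^"]+")}{$1 target="_blank"}g;

    return $html;
}

my $body_str = qq{<a href="http://www.foobar.com/xxx"> <a href="http://www.danheller.com/xxx">};
print add_target($body_str, 'www.danheller.com'), "\n";
```

Because the /g modifier already visits every link once, wrapping the substitution in a `while` loop (as in the original attempt) is not needed and would keep re-matching links that were already rewritten.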
Re: replacing chunks of text only if another chunk matches a pattern
by TedPride (Priest) on Oct 09, 2004 at 19:55 UTC
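    while ($body_str =~ /"http:\/\/([^ \/"]+)[^ "]*"/g) {
        if ($1 ne 'www.danheller.com') {
            my $pos = pos($body_str);
            # Swap the closing quote for the quote plus the target attribute.
            substr($body_str, $pos - 1, 1) = '" target="_blank"';
            # Modifying the string resets pos(), so restore it.
            pos($body_str) = $pos;
        }
    }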
    The regex might need a bit of upgrading, but it should work fine if your pages don't have sloppy HTML syntax (such as leaving out one or both of the double quotes around the URL).
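One possible direction for that "upgrading", offered only as a sketch: a single substitution (not the pos/substr loop above) that also accepts single-quoted attributes, `https://` URLs, uppercase `HREF`, and whitespace around the `=`. The sub name `add_target_tolerant` is hypothetical, and unquoted attribute values are still not handled. For really messy markup a proper HTML parser would be the safer tool.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): a more forgiving variant that
# accepts either quote style, http or https, and any letter case.
sub add_target_tolerant {
    my ($html, $own_host) = @_;

    $html =~ s{
        (                                   # group 1: the whole attribute
            href \s* = \s* (["'])           # group 2: the opening quote
            https?://
            (?! \Q$own_host\E (?: [/:] | \2 ) )   # not our own host
            (?: (?!\2) . )+                 # URL text up to the same quote
            \2                              # closing quote
        )
    }{$1 target="_blank"}gixs;

    return $html;
}

my $body_str = q{<A HREF='https://example.org/a'> <a href="http://www.danheller.com/xxx">};
print add_target_tolerant($body_str, 'www.danheller.com'), "\n";
```

Note that neither this sketch nor the code in the replies checks whether a link already carries a target attribute, so running it twice over the same text would add the attribute twice.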