I agree with santonegro that a parser makes this a lot simpler. Let someone else do the heavy lifting. :-)

#!/usr/bin/perl

use strict;
use warnings;
use HTML::TokeParser::Simple;

my $html;
{
  local $/;
  $html = <DATA>;
}

my $tp = HTML::TokeParser::Simple->new(\$html)
  or die "Couldn't parse string: $!";

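while (my $t = $tp->get_token) {

  # Only touch <a> start tags whose href starts with "http"
  # and that do not already carry a target attribute.
  if ($t->is_start_tag('a')) {
    # An anchor such as <a name="top"> has no href; check for that
    # first so the match does not warn about an undefined value.
    my $href = $t->get_attr('href');
    if (defined $href and $href =~ /^http/ and not $t->get_attr('target')) {
      $t->set_attr('target', '_blank');
    }
  }
  print $t->as_is;
}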

__DATA__
<a href="http://here.com" target="_blank">here</a>
<a href="http://there.com">there</a>
<a href="http://everywhere.com" target="foo">everywhere</a>
<a href="local.html">local</a>

output:

---------- Capture Output ----------
> "C:\Perl\bin\perl.exe" parse_.pl
<a href="http://here.com" target="_blank">here</a>
<a href="http://there.com" target="_blank">there</a>
<a href="http://everywhere.com" target="foo">everywhere</a>
<a href="local.html">local</a>

> Terminated with exit code 0.

Update: Covered the "unless they already have a target" condition, as pointed out by ikegami below. I would argue that fixing or changing this script is easier than fixing a regex (but I would say that, wouldn't I? :-)).
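
As a rough sketch of why the parser version is easy to change: the whole rule could be pulled out into one small sub, so that the next special case is a one-line edit. The name needs_blank_target is hypothetical (it is not part of the script above or of HTML::TokeParser::Simple), and treating an empty target as "no target" is an assumption on my part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: given the href
# and target attribute values of an <a> tag (either may be undef),
# decide whether target="_blank" should be added.
sub needs_blank_target {
    my ($href, $target) = @_;
    return 0 unless defined $href;        # e.g. <a name="top"> has no href
    return 0 unless $href =~ /^http/;     # same test as the script above
    return 0 if defined $target && length $target;   # keep an existing target
    return 1;
}

# Try the rule on the same cases as the __DATA__ section, plus an
# anchor without an href.
for my $case (
    [ 'http://here.com',       '_blank' ],
    [ 'http://there.com',      undef    ],
    [ 'http://everywhere.com', 'foo'    ],
    [ 'local.html',            undef    ],
    [ undef,                   undef    ],
) {
    my ($href, $target) = @$case;
    printf "%-24s %s\n",
        defined $href ? $href : '(no href)',
        needs_blank_target($href, $target) ? 'add target' : 'leave alone';
}
```

In the token loop, the test would then shrink to checking $t->is_start_tag('a') and passing $t->get_attr('href') and $t->get_attr('target') to the sub.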

In reply to Re: Regexp: Match anything except a certain word by wfsp
in thread Regexp: Match anything except a certain word by fraktalisman


There's more than one way to do things
	PerlMonks