If you know that the incoming text is HTML data, then there
is probably a good way to use HTML::TokeParser::Simple so that
you can locate just the pieces in the data that represent
usable URLs that happen to be part of the visible text of
the page. This node shows an example of how it's
used for a similar sort of editing task.
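For what it's worth, here is a rough sketch of how the token-based
approach could go. It is only an illustration, not the code from the
node mentioned above: the sample input, the variable names and the
$in_anchor flag are all made up for the example, and it assumes
HTML::TokeParser::Simple is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;   # CPAN module, not part of core Perl

# Hypothetical sample input: one URL inside an attribute, one in visible text.
my $html = '<p><img src="http://example.com/pic.png">'
         . 'See http://example.com/docs for details.</p>';

my $parser    = HTML::TokeParser::Simple->new(\$html);
my $out       = '';
my $in_anchor = 0;   # assumption: leave text that is already inside <a>...</a> alone

while ( my $token = $parser->get_token ) {
    my $text = $token->as_is;   # the token exactly as it appeared in the input

    $in_anchor = 1 if $token->is_start_tag('a');
    $in_anchor = 0 if $token->is_end_tag('a');

    # Only text tokens are edited, so URLs in attributes are never touched.
    if ( $token->is_text and not $in_anchor ) {
        $text =~ s{(http://([.\w/]+))}{<a href="$1">$2</a>}gi;
    }
    $out .= $text;
}

print "$out\n";
```

Since tags are copied through untouched via as_is, the src attribute
of the img tag survives as-is and only the URL in the visible text
gets wrapped in a link.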
Apart from that, the first parenthesized portion of your pattern
looks a bit odd, and the basic problem is that it doesn't really
guard against matching a URL that happens to be inside of (i.e.
an attribute of) some other tag. Something like the following
might be an improvement (but HTML::TokeParser, or
TokeParser::Simple, is still the preferred approach):
s{(>[^<]*?)(http://([.\w/]+))}{$1<a href=$2>$3</a>}gi;
Note the use of curly braces to bound the left and right
sides of the expression -- so we don't have to backslash-escape
all the slashes in the pattern content (you forgot to add
the backslash for the </a> part in your code, so it should
have caused a syntax error).
In this version, the first part assumes that once you see
a close angle bracket, you're not inside any sort of tag,
so it looks for zero or more characters that are not an open
bracket, followed by a URL.
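To see that behavior in action, here is a small test of the same
substitution. The sample string is invented for the demonstration;
it has one URL inside an attribute and one in the visible text.

```perl
use strict;
use warnings;

# Hypothetical sample input: the first URL is an attribute value,
# the second one is part of the visible text.
my $html = '<p><img src="http://example.com/pic.png">'
         . 'See http://example.com/docs for details.</p>';

# Work on a copy so the original string stays available for comparison.
( my $linked = $html ) =~ s{(>[^<]*?)(http://([.\w/]+))}{$1<a href=$2>$3</a>}gi;

print "$linked\n";
```

The URL in the src attribute is left alone because there is no close
bracket between it and the preceding open bracket, while the one
after "See" gets wrapped in an anchor tag.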
(update: fixed a couple typos in the explanation.)