Re: Help with regs

If you know that the incoming text is HTML data, then there is probably a good way to use HTML::TokeParser::Simple so that you can locate just the pieces in the data that represent usable URLs that happen to be part of the visible text of the page. This node shows an example of how it's used for a similar sort of editing task.
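A rough sketch of that idea, assuming HTML::TokeParser::Simple is installed from CPAN. The sub name link_text_urls and the sample input are made up for illustration, and the URL pattern is deliberately simplistic. Only text tokens outside an existing anchor get rewritten; every other token is passed through unchanged:

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: wrap bare http:// URLs found in visible text in <a> tags.
sub link_text_urls {
    my ($html) = @_;
    my $parser    = HTML::TokeParser::Simple->new(\$html);
    my $out       = '';
    my $in_anchor = 0;    # nesting depth of <a> elements we are inside

    while ( my $token = $parser->get_token ) {
        $in_anchor++ if $token->is_start_tag('a');
        $in_anchor-- if $token->is_end_tag('a') && $in_anchor;

        my $chunk = $token->as_is;    # the token exactly as it appeared
        if ( $token->is_text && !$in_anchor ) {
            # Simplistic URL pattern, for illustration only.
            $chunk =~ s{(http://([.\w/]+))}{<a href="$1">$2</a>}gi;
        }
        $out .= $chunk;
    }
    return $out;
}

# Made-up sample: one bare URL in the text, one URL already inside a tag.
my $sample = '<p>See http://example.com/docs or '
           . '<a href="http://example.com/x">this</a>.</p>';
print link_text_urls($sample), "\n";
```

Because tag tokens are never edited, a URL sitting in an attribute such as href or src is left alone, and text that is already inside a link is not wrapped a second time.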

Apart from that, the first parenthesized portion looks a bit odd, and the basic problem is that it doesn't really guard against hitting on a URL that happens to be inside of (i.e. an attribute of) some other tag. Something like the following might be an improvement (but HTML::TokeParser, or TokeParser::Simple, is still the preferred approach):

s{(>[^<]*?)(http://([.\w/]+))}{$1<a href=$2>$3</a>}gi;

Note the use of curly braces to bound the left and right sides of the expression -- so we don't have to backslash-escape all the slashes in the pattern content (you forgot to add the backslash for the </a> part in your code, so it should have caused a syntax error).

In this version, the first part assumes that once you see a close angle bracket, you're not inside any sort of tag, so look for zero or more characters that are not an open bracket, followed by a URL.
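Here is a small test of that substitution, using only core Perl. The sub name link_urls and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution shown above.
sub link_urls {
    my ($html) = @_;
    $html =~ s{(>[^<]*?)(http://([.\w/]+))}{$1<a href=$2>$3</a>}gi;
    return $html;
}

# One URL in visible text, one inside a tag attribute.
my $in = '<p>Docs at http://example.com/docs here</p>'
       . '<img src="http://example.com/a.png">';
print link_urls($in), "\n";
# prints: <p>Docs at <a href=http://example.com/docs>example.com/docs</a> here</p><img src="http://example.com/a.png">
```

The URL in the src attribute is left alone because no close bracket precedes it without an open bracket in between. The same rule means a second URL in the same run of text is skipped (the close bracket was already consumed by the first match), as is any URL that comes before the first tag, which is one more reason to prefer the parser approach.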

(update: fixed a couple typos in the explanation.)
