in reply to Help with regs
Apart from that, the first parenthesized portion looks a bit odd, and the basic problem is that it doesn't really guard against matching a URL that happens to be inside of (i.e. an attribute of) some other tag. Something like the following might be an improvement (but HTML::TokeParser, or HTML::TokeParser::Simple, is still the preferred approach):

    s{(>[^<]*?)(http://([.\w/]+))}{$1<a href=$2>$3</a>}gi;

Note the use of curly braces to delimit the pattern and the replacement, so we don't have to backslash-escape all the slashes in the pattern content (you forgot to add the backslash for the </a> part in your code, so it should have caused a syntax error).
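In this version, the first part assumes that once you see a close angle bracket, you're not inside any sort of tag, so it looks for zero or more characters that are not an open angle bracket, followed by a URL. $1 holds that leading text, $2 the full URL (used as the href), and $3 the URL minus the "http://" prefix (used as the link text).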
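
To see what that assumption buys you, here is a small sketch that wraps the substitution in a sub and runs it on some made-up input. The sub name linkify and the sample HTML are my own inventions for illustration, not anything from your script, so adjust to taste.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution to a copy of its argument.
sub linkify {
    my ($html) = @_;
    # ">" followed by non-"<" text means we are outside any tag;
    # $2 is the full URL, $3 is the URL without the "http://" prefix.
    $html =~ s{(>[^<]*?)(http://([.\w/]+))}{$1<a href=$2>$3</a>}gi;
    return $html;
}

# Made-up sample: one URL in plain text, one inside a tag attribute.
my $sample = '<p>See http://example.com/docs for details</p>'
           . '<img src="http://example.com/pic.png">';

# The URL in the text gets wrapped in an anchor; the one in src= is left alone.
print linkify($sample), "\n";
```

Two limits worth knowing about: a URL in text with no ">" anywhere before it (e.g. at the very start of the string) won't match at all, and since each match has to start at a ">", only the first URL in any one run of text between tags gets linked. That's part of why a real parser is still the better bet.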
(update: fixed a couple typos in the explanation.)