in reply to detecting URLs and turning them into links

frodo72's suggestion is short, simple and probably good enough. However, if you insist URLs should follow whitespace, this will work better:
$message =~ s#(?<=\s)(http://\S+)#<a href="$1">$1</a>#g;
(using lookbehind assertions, see perldoc perlre). This, however, will not replace a URL that starts at the very first character of $message (since it does not follow any whitespace). We can replace that separately:
$message =~ s#^(http://\S+)#<a href="$1">$1</a>#;
$message =~ s#(?<=\s)(http://\S+)#<a href="$1">$1</a>#g;
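
As a side note, the two substitutions can probably be folded into one with a negative lookbehind: (?<!\S) succeeds both at the start of the string and right after whitespace. A minimal sketch, assuming that behaviour is what you want; the sub name linkify_after_space is mine, not from any module:

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): link every http:// URL
# that sits at the start of the string or directly after whitespace.
sub linkify_after_space {
    my ($message) = @_;
    # (?<!\S) means "the previous character is not a non-space", which is
    # true at position 0 and after any whitespace character.
    $message =~ s{(?<!\S)(http://\S+)}{<a href="$1">$1</a>}g;
    return $message;
}

print linkify_after_space("http://a.example/ and http://b.example/x"), "\n";
```

A URL glued to preceding text (say "xhttp://c.example/") is left alone, same as with the two-step version.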
URLs are, however, often followed by a punctuation mark, which unfortunately will be included in our link (as in "http://google.com, for example…"). To avoid this, require URLs to end with a (lowercase) letter:
$message =~ s#^(http://\S+[a-z])#<a href="$1">$1</a>#;
$message =~ s#(?<=\s)(http://\S+[a-z])#<a href="$1">$1</a>#g;
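
Requiring a final letter also cuts off URLs that legitimately end in a slash or a digit. One possible alternative, sketched here under the assumption that only a small set of punctuation characters should be kept out of the link, is a non-greedy match with a lookahead. The sub name and the punctuation set are my own choices:

```perl
use strict;
use warnings;

# Hypothetical helper: link http:// URLs but leave trailing punctuation
# (an assumed set: . , ; : ! ? and closing paren) outside the link.
sub linkify_trim_punct {
    my ($message) = @_;
    # \S+? grows one character at a time; the lookahead lets it stop only
    # where the rest of the token is punctuation followed by whitespace
    # or the end of the string.
    $message =~ s{(?<!\S)(http://\S+?)(?=[.,;:!?)]*(?:\s|\z))}{<a href="$1">$1</a>}g;
    return $message;
}

print linkify_trim_punct("see http://google.com, for example"), "\n";
print linkify_trim_punct("http://example.com/page/2."), "\n";
```

Punctuation inside the URL (the "?" of a query string, dots in the host name) stays part of the link, because the lookahead only accepts punctuation that runs up to whitespace or the end of the string.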
I could go on talking about funny characters and special cases, but I'll just stop here and suggest you use Regexp::Common:
use Regexp::Common qw( URI );
$message =~ s#^($RE{URI}{HTTP})#<a href="$1">$1</a>#;
$message =~ s#(?<=\s)($RE{URI}{HTTP})#<a href="$1">$1</a>#g;

Re^2: detecting URLs and turning them into links
by polettix (Vicar) on Aug 24, 2007 at 00:39 UTC
    Regexp::Common won't help much for your purpose here, because the match still swallows the trailing comma:
    $ perl -le ' use Regexp::Common qw(URI); $_ = "http://www.example.com/ciao,"; s/$RE{URI}{HTTP}/doh!/; print '
    doh!
    I elaborated a bit about it here.

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
      In practice Regexp::Common works very well for me.

      I maintain the #perl6 irc logs and URLs are automatically linkified using this regex:

      qr/\b$RE{URI}{HTTP}(?:#[\w_%:-]+)?\b/

      and it works fairly well. I guess about 99% of URLs are handled correctly, and about half of the remaining 1% of failures are due to my rather naive handling of anchors.

      Though I have to admit that "my" chatters are mostly geeks who paste URLs with leading http:// (and I ignore other URLs).

        Regexp::Common is a wonderful module, but I had the impression that akho was suggesting it as the solution to leave out unwanted trailing characters, like in the example in akho's post:
        use Regexp::Common qw( URI );
        $message =~ s#^($RE{URI}{HTTP})#<a href="$1">$1</a>#;
        $message =~ s#(?<=\s)($RE{URI}{HTTP})#<a href="$1">$1</a>#g;
        My point is only that Regexp::Common does not help with this issue here, because it does not eliminate the trailing punctuation chars (like yours, see today's logs). And it couldn't be otherwise, because those punctuation chars are allowed in an HTTP URI, so $RE{URI}{HTTP} can't help matching them.
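
Since any pattern that accepts valid URIs will also accept the trailing punctuation, one workaround is to post-process each match and peel the punctuation off before building the link. A rough sketch, with loud assumptions: $url_re is a simple stand-in for $RE{URI}{HTTP} so that it runs without Regexp::Common, and link_without_tail and its punctuation set are invented for illustration.

```perl
use strict;
use warnings;

# Stand-in for $RE{URI}{HTTP}; an assumption so the sketch runs without
# Regexp::Common installed.
my $url_re = qr{http://\S+};

# Hypothetical helper: split a raw match into the URL proper and any
# trailing punctuation, then link only the URL part.
sub link_without_tail {
    my ($raw) = @_;
    my ($url, $tail) = $raw =~ /\A(.*?)([.,;:!?)]*)\z/s;
    return qq{<a href="$url">$url</a>$tail};
}

my $message = "Try http://www.example.com/ciao, it works.";
# /e evaluates the replacement as code, so each match goes through the helper.
$message =~ s{(?<!\S)($url_re)}{ link_without_tail($1) }ge;
print "$message\n";
```

The trade-off is that a URL which really does end in one of those characters loses it, so the punctuation set is a judgment call rather than something the URI grammar can decide.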

        Flavio
        perl -ple'$_=reverse' <<<ti.xittelop@oivalf

        Don't fool yourself.