in reply to Re: detecting URLs and turning them into links
in thread detecting URLs and turning them into links

Regexp::Common won't help much for your purpose here:
$ perl -le '
    use Regexp::Common qw(URI);
    $_ = "http://www.example.com/ciao,";
    s/$RE{URI}{HTTP}/doh!/;
    print
'
doh!
I elaborated a bit about it here.

Flavio
perl -ple'$_=reverse' <<<ti.xittelop@oivalf

Don't fool yourself.

Re^3: detecting URLs and turning them into links
by moritz (Cardinal) on Aug 24, 2007 at 07:56 UTC
    In practice Regexp::Common works very well for me.

    I maintain the #perl6 irc logs and URLs are automatically linkified using this regex:

    qr/\b$RE{URI}{HTTP}(?:#[\w_%:-]+)?\b/

    and it works fairly well. I guess about 99% of URLs are handled correctly, and half of the remaining 1% of failures are due to my rather naive handling of anchors.

    Though I have to admit that "my" chatters are mostly geeks who paste URLs with leading http:// (and I ignore other URLs).

      Regexp::Common is a wonderful module, but I had the impression that akho was suggesting it as the solution to leave out unwanted trailing characters, like in the example in akho's post:
      use Regexp::Common qw( URI );
      $message =~ s#^($RE{URI}{HTTP})#<a href="$1">$1</a>#;
      $message =~ s#(?<=\s)($RE{URI}{HTTP})#<a href="$1">$1</a>#g;
      My point is only that Regexp::Common does not help with this issue, because it does not eliminate trailing punctuation characters (your regex has the same problem, see today's logs). And it couldn't be otherwise: those punctuation characters are allowed in an HTTP URI, so $RE{URI}{HTTP} can't help matching them.
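One way around this, as a rough sketch rather than a definitive fix: let the pattern match greedily, then peel the trailing punctuation off inside the substitution and put it back outside the link. Everything named here is an assumption for illustration: `$url_re` is a deliberately crude stand-in for $RE{URI}{HTTP} so the snippet runs with core Perl only, `linkify` is a hypothetical helper, and the set of characters treated as trailing punctuation is a judgment call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $RE{URI}{HTTP}: crude on purpose, and like the
# real pattern it happily swallows trailing punctuation.
my $url_re = qr{https?://[^\s<>"]+};

# Turn every URL in $text into a link, keeping trailing punctuation
# outside the <a> element.
sub linkify {
    my ($text) = @_;
    # The replacement is code (/e): copy the match first, because the
    # inner s/// overwrites $1.
    $text =~ s{($url_re)}{
        my $url  = $1;
        my $tail = $url =~ s/([.,;:!?]+)\z// ? $1 : '';
        qq{<a href="$url">$url</a>$tail};
    }ge;
    return $text;
}

print linkify('See http://www.example.com/ciao, then http://perl-6.de/.'), "\n";
```

The obvious trade-off is that a URL which really does end in one of those characters gets truncated; for chat logs that is probably the rarer case.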

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
        You are absolutely right about the punctuation. In my use case it virtually never appears (unless you test it ;-)

        I might consider using a negative lookbehind for punctuation, like so:

        use strict;
        use warnings;
        use Regexp::Common qw/URI/;

        my $text = 'http://perl-6.de/,';
        if ($text =~ m/($RE{URI}{HTTP}(?<![.,]))/) {
            print "Match: $1\n";
        }
        __END__
        # Output:
        Match: http://perl-6.de/

        I think I'll add this to my CGI scripts as soon as I have write access to the repository again.
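For what it's worth, the lookbehind idea might carry over to a global substitution roughly like this. It is only a sketch: `linkify` is a hypothetical helper name, and the extra characters in the lookbehind set (`;:!?`) are an assumption beyond the `[.,]` tested above. The lookbehind works because the URL pattern first matches greedily and is then forced to backtrack until its last character is not punctuation.

```perl
use strict;
use warnings;
use Regexp::Common qw/URI/;

# URL pattern plus a negative lookbehind, so a match may not end in
# punctuation. The wider character set is an assumption.
my $url_re = qr/$RE{URI}{HTTP}(?<![.,;:!?])/;

# Hypothetical helper: wrap every matched URL in an <a> element.
sub linkify {
    my ($text) = @_;
    $text =~ s{($url_re)}{<a href="$1">$1</a>}g;
    return $text;
}

print linkify('See http://perl-6.de/, and http://www.example.com/ciao.'), "\n";
```

Compared with stripping punctuation after the match, this keeps everything in one regex, at the cost of some extra backtracking on URLs followed by several punctuation characters.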