Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm using the following regular expression (borrowed from the Perl Cookbook) to extract links from HTML:
@links = m/<A[^>]+?HREF\s*=\s*["']?([^'" >]+?)[ '"]?>/sig;
I am wondering if anyone has a suggestion on how to extract a list of links from a block of text without any HTML tags, e.g.
"this is a link, http://bob.com/page"
and
"so is this, but with a query string and a trailing period, http://bob.com/page?elem=val&elem2=val2.".

I'm reading through the perlre documentation, but I'm not very good yet, so any suggestions would be helpful.
Thanks!

Re: extracting links from text
by damian1301 (Curate) on Sep 04, 2001 at 21:59 UTC
    You might want to check out URI::Find and HTML::LinkExtor.
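    URI::Find is the one for text with no tags: it scans a string for things that look like URIs and calls you back for each one, and it already tries to be smart about trailing punctuation. A minimal sketch (the sample string and variable names are mine; the callback receives the URI object and the original matched text, and its return value replaces that text in the string):

        use URI::Find;

        my $text = "so is this, but with a query string and a trailing period, "
                 . "http://bob.com/page?elem=val&elem2=val2.";

        my @links;
        my $finder = URI::Find->new(sub {
            my ($uri, $orig_text) = @_;   # $uri is a URI object
            push @links, $uri;
            return $orig_text;            # put the matched text back unchanged
        });
        $finder->find(\$text);            # returns the number of URIs found

        print "$_\n" for @links;          # should print http://bob.com/page?elem=val&elem2=val2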

    _______________________________________________________
    s&&q+\+blah+&oe&&s<hlab>}map{s}&&q+}}+;;s;\+;;g;y;ahlb;veaD;&&print;
Re: extracting links from text
by mexnix (Pilgrim) on Sep 04, 2001 at 21:59 UTC
      These all seem useful for parsing links out of HTML, but I haven't figured out how to use these modules on a text document without any HTML tags.

      Hmmmm. Maybe I'm just being dense, and should try a diet of "Learning Perl".
Re: extracting links from text
by Cine (Friar) on Sep 04, 2001 at 22:06 UTC
    Gnome-terminal (which I know does this) uses the following scheme:

    If something starts with www or http://, it is recognised as a link up until the next character that is illegal in a link (newlines are allowed). However, if the last character is a . or a , it is ignored.
    In Perl, something like:
    # the character class approximates "anything allowed in a link", newline included
    my @links = m{((?:www|http://)[\w\-.~:/?#\[\]\@!\$&'()*+,;=%\n]*)}ig;
    for (0 .. $#links) {
        $links[$_] =~ s/[.,]$//;    # drop a trailing . or ,
        $links[$_] =~ tr/\n//d;     # remove inline newlines
    }
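    Against the OP's second sample string, for instance (the character class is just my guess at "anything allowed in a link"):

        my $text = "so is this, but with a query string and a trailing period, "
                 . "http://bob.com/page?elem=val&elem2=val2.";
        my @links = $text =~ m{((?:www|http://)[\w\-.~:/?#\[\]\@!\$&'()*+,;=%\n]*)}ig;
        s/[.,]$//, tr/\n//d for @links;    # same cleanup as above
        print "$_\n" for @links;           # http://bob.com/page?elem=val&elem2=val2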


    T I M T O W T D I