Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am able to extract links from HTML using the commonly found formats and modules, but am having difficulty using these methods to parse out all the URLs from a plain text file *without* the HTML formatting. Does anyone have a small snippet of code that can grab all the URLs from a plain text file?

Replies are listed 'Best First'.
Re: Extract URL from text
by Beatnik (Parson) on Jan 03, 2002 at 14:05 UTC
    With URI::Find...
    use URI::Find;

    my $text = "a lot of text with URLs in it";
    find_uris($text, sub {
        my ($uri, $orig_uri) = @_;
        print $orig_uri, "\n";   # print each URL as it's found
        return $orig_uri;        # put the match back unchanged
    });
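    Since the question is about a plain text file rather than a string, here is a minimal sketch that slurps a file and collects every URL the finder reports, using URI::Find's object interface; the filename urls.txt and the @found array are just illustrative:

    use URI::Find;

    # Slurp the whole plain text file into one scalar
    my $file = 'urls.txt';    # illustrative filename
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;

    # Collect each URL the finder reports
    my @found;
    my $finder = URI::Find->new(sub {
        my ($uri, $orig_uri) = @_;
        push @found, $orig_uri;
        return $orig_uri;    # leave the text as-is
    });
    $finder->find(\$text);

    print "$_\n" for @found;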
    Of course, there are always HTML::LinkExtor and HTML::SimpleLinkExtor...
    use HTML::SimpleLinkExtor;

    my $extor = HTML::SimpleLinkExtor->new();
    $extor->parse_file($filename);
    # ---- or -----
    # $extor->parse($html);

    # extract all of the links
    my @all_links = $extor->links;
    or
    require HTML::LinkExtor;

    my $p = HTML::LinkExtor->new(\&cb, "http://www.perl.org/");

    sub cb {
        my ($tag, %links) = @_;
        print "$tag @{[ %links ]}\n";
    }

    $p->parse_file("index.html");
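    One thing to watch with plain text: people often write addresses without a scheme, like www.example.com, which URI::Find will skip. The URI::Find distribution also ships URI::Find::Schemeless, which has the same interface and catches those as well; a quick sketch (the sample string is only for illustration):

    use URI::Find::Schemeless;

    my $text = "See www.perl.org or http://perlmonks.org for details.";
    my $finder = URI::Find::Schemeless->new(sub {
        my ($uri, $orig_uri) = @_;
        print "$orig_uri\n";
        return $orig_uri;
    });
    $finder->find(\$text);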

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.