Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am able to extract links from HTML using the commonly found formats and modules, but am having difficulty using these methods to parse out all the URLs from a plain text file *without* the HTML formatting. Does anyone have a small snippet of code that can grab all the URLs from a plain text file?

Replies are listed 'Best First'.
Re: Extract URL from text
by Beatnik (Parson) on Jan 03, 2002 at 14:05 UTC
    With URI::Find...
    use URI::Find;

    my $text = "a lot of text with URLs in it";
    find_uris($text, sub {
        my ($uri, $orig_uri) = @_;
        print $orig_uri, "\n";   # print each URL as it's found
        return $orig_uri;        # put the match back unchanged
    });
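    Since the question is about a plain text file rather than a string, here is a minimal sketch that slurps a file and collects every URL the finder reports, using URI::Find's object interface; the filename urls.txt and the @found array are just illustrative:

    use URI::Find;

    # Slurp the whole plain text file into one scalar
    my $file = 'urls.txt';    # illustrative filename
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;

    # Collect each URL the finder reports
    my @found;
    my $finder = URI::Find->new(sub {
        my ($uri, $orig_uri) = @_;
        push @found, $orig_uri;
        return $orig_uri;    # leave the text as-is
    });
    $finder->find(\$text);

    print "$_\n" for @found;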
    Of course, there are always HTML::LinkExtor and HTML::SimpleLinkExtor...
    use HTML::SimpleLinkExtor;

    my $extor = HTML::SimpleLinkExtor->new();
    $extor->parse_file($filename);
    # ---- or -----
    # $extor->parse($html);

    # extract all of the links
    my @all_links = $extor->links;
    or
    require HTML::LinkExtor;

    my $p = HTML::LinkExtor->new(\&cb, "http://www.perl.org/");

    sub cb {
        my ($tag, %links) = @_;
        print "$tag @{[ %links ]}\n";
    }

    $p->parse_file("index.html");
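    One thing to watch with plain text: people often write addresses without a scheme, like www.example.com, which URI::Find will skip. The URI::Find distribution also ships URI::Find::Schemeless, which has the same interface and catches those as well; a quick sketch (the sample string is only for illustration):

    use URI::Find::Schemeless;

    my $text = "See www.perl.org or http://perlmonks.org for details.";
    my $finder = URI::Find::Schemeless->new(sub {
        my ($uri, $orig_uri) = @_;
        print "$orig_uri\n";
        return $orig_uri;
    });
    $finder->find(\$text);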

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.