PerlMonks  

finding urls in a string

by Anonymous Monk
on Apr 07, 2004 at 15:47 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I am seeking your wisdom again. What is the preferred method for pulling URLs out of a string (just a load of regular text with URLs peppered through it, not HTML)? Would a regex be able to handle this, or would URI::Find be the better choice? If anybody has an example, that would help me greatly.

Thanks in advance,
Jonathan

Replies are listed 'Best First'.
Re: finding urls in a string
by esskar (Deacon) on Apr 07, 2004 at 16:09 UTC
    Hi. My snippet does a little more than you asked for, but you can probably change it easily so that it fulfills your needs:
    #! /usr/bin/perl
    use strict;
    use warnings;
    use URI::Find::Schemeless;
    use HTML::Entities qw(encode_entities);

    my $text = q~
    hallo
    dies ist keine.url
    dies ist aber eine: www.intertivity.com
    ftp.irgendwas.de/test/thisfile
    mailto:perl@intertivityNOSP4M.com
    oder so yeah perl@intertivityNOSP4M.com
    http://www.intertivity.com/
    ~;

    my $finder = URI::Find::Schemeless->new(
        sub {
            my ($uri, $originalUri) = @_;
            return q/<a href="/ . encode_entities("$uri") . q/">/
                 . encode_entities($originalUri) . q~</a>~;
        }
    );

    my $howManyFound = $finder->find(\$text);
    print "$howManyFound URIs found\n";
    print "$text\n";
Re: finding urls in a string
by tinita (Parson) on Apr 07, 2004 at 16:09 UTC
    i'd recommend URI::Find. of course you can do it with a regex, but i wouldn't start to build one myself.
    for examples just look at the docs: URI::Find
    there's a section called Example, i'm sure you'll find what you need there...
    if you really want to do it yourself, view the source...
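
As a rough sketch of the approach recommended above (not copied from the URI::Find docs): the callback passed to URI::Find->new receives each URI it finds plus the original text, and whatever the callback returns replaces that text in the string. The sample string and the @urls array here are made up for illustration; returning the original text should leave the string unchanged.

```perl
use strict;
use warnings;
use URI::Find;

# Hypothetical sample text, invented for this sketch.
my $text = 'See http://www.perlmonks.org/ and also '
         . 'ftp://ftp.example.com/pub/file.txt for details.';

my @urls;    # collected URIs, as plain strings

my $finder = URI::Find->new(
    sub {
        my ($uri, $original_text) = @_;
        push @urls, "$uri";       # stringify the URI object
        return $original_text;    # put the matched text back untouched
    }
);

# find() takes a reference to the text and returns the number of URIs found.
my $count = $finder->find(\$text);

print "$count URIs found\n";
print "$_\n" for @urls;
```

If you want to pick up things like www.example.com that have no scheme, URI::Find::Schemeless is used the same way.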
Re: finding urls in a string
by Vautrin (Hermit) on Apr 07, 2004 at 17:11 UTC

    Others have mentioned how to find URLs, but you may also want to double-check whether the URLs actually work (that is, whether a request for them succeeds), using a module like LWP::Simple or LWP::UserAgent. For instance, for all the URLs in @urls that you find, you might do something like:

    # assuming you've already populated @urls
    # and done:
    use LWP::UserAgent;
    use strict;
    use warnings;

    # try this:
    my @old_urls = @urls;
    @urls = ();
    my $user_agent = LWP::UserAgent->new;
    while (@old_urls) {
        my $url = shift (@old_urls);
        my $response = $user_agent->get($url);
        if ($response->is_success) {
            push @urls, $url;
            # or, if you want to get more detailed:
            # push @urls, {
            #     url  => $url,
            #     type => $response->content_type,
            # };
        }
    }

Re: finding urls in a string
by borisz (Canon) on Apr 07, 2004 at 16:57 UTC
    Not for every need, but perhaps this is what you are looking for:

    perl -MLWP::Simple -e '$x=get("http://perlmonks.org"); use Regexp::Common qw/URI/; print join "\n",$x =~ /($RE{URI}{HTTP})/g'

    Boris
