Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for an efficient regex to pull an IP address (if one exists) from a URL. I say efficient because it will be doing this for many URLs within a file.

Anyone got any ideas or seen this done somewhere else?

Thanks to all.

Replies are listed 'Best First'.
Re: Getting IP from URL
by zigdon (Deacon) on Oct 04, 2002 at 12:54 UTC

    when you say "if one exists" does that mean that from the url 'http://peeron.com/inv' you don't want to get anything? Or do you want to get 'peeron.com'? or maybe '66.216.39.12'?

    for the simplest case, you just want to get the ip from a url like 'http://66.216.39.12/inv'. A simple regex to do this would be:

    $url = "http://66.216.39.12/inv";
    # only trust $1 when the match succeeds
    $ip = $1 if $url =~ m!https?://((?:\d{1,3}\.){3}\d{1,3})!i;

    of course, this will also match things like "http://999.999.999.999/", but I'm not going to worry about that.
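    If the out-of-range case does matter, one possible way to tighten the match is to check each octet after the regex succeeds. This is only a sketch: the helper name ip_from_url is made up for illustration, and the lookahead (which rejects hosts such as 1.2.3.4.example.com) is an assumption about what counts as the end of the host part.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the dotted quad from a
# URL only when every octet is in the range 0-255, otherwise return nothing.
sub ip_from_url {
    my ($url) = @_;

    # The lookahead requires the host to end right after the fourth octet.
    return unless $url =~ m!https?://((?:\d{1,3}\.){3}\d{1,3})(?=[:/?#]|\z)!i;
    my $ip = $1;

    for my $octet (split /\./, $ip) {
        return if $octet > 255;
    }
    return $ip;
}

for my $url ('http://66.216.39.12/inv', 'http://999.999.999.999/') {
    my $ip = ip_from_url($url);
    print "$url -> ", (defined $ip ? $ip : 'no valid IP'), "\n";
}
```

    With these two sample URLs, the first yields 66.216.39.12 and the second is rejected.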

    For the more complex case, where you actually want the real IP for a given hostname, I'd use Net::DNS. Shamelessly copied from the Net::DNS manpage:

    use Net::DNS;

    my $url = "http://peeron.com/inv";
    # pull out the host part; bail out if the URL doesn't look like http(s)
    my ($host) = $url =~ m!https?://([^/]+)!i
        or die "no host found in $url\n";

    my $res   = Net::DNS::Resolver->new;
    my $query = $res->search($host);

    if ($query) {
        foreach my $rr ($query->answer) {
            next unless $rr->type eq "A";   # only address records
            print $rr->address, "\n";
        }
    }
    else {
        print "query failed: ", $res->errorstring, "\n";
    }

    Again, the usual warnings - this assumes you have a valid URI, and not something that will buffer overflow your DNS server, for example.

    update: or look below at rob_au's reply, for the really clean solution.

    -- Dan

      Sorry I didn't explain it very well. If I have a URL such as http://www.somedomain.com/this_page.html I don't want to do anything. On the other hand, if I have a URL such as http://10.1.1.1/this_dir/this_page.html I wish to extract the IP address which in this case would be 10.1.1.1.

      Looks like your first solution just might do the trick!

      Thanks much!
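      Since the original question mentions running this over many URLs in a file, here is a rough sketch of how that loop might look with the pattern compiled once via qr//. The sample data and variable names are invented for illustration, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Compile the pattern once, outside the loop.
my $ip_url = qr{https?://((?:\d{1,3}\.){3}\d{1,3})(?=[:/?#\s]|\z)}i;

# Hypothetical sample input standing in for the real file.
my $sample = <<'END';
http://www.somedomain.com/this_page.html
http://10.1.1.1/this_dir/this_page.html
http://peeron.com/inv
http://66.216.39.12/inv
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @ips;
while (my $line = <$fh>) {
    # /g in a while loop picks up every URL on the line, not just the first.
    while ($line =~ /$ip_url/g) {
        push @ips, $1;
    }
}
close $fh;

print "$_\n" for @ips;
```

      Lines whose host is a name rather than a dotted quad simply produce no match, so nothing is done for them.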
        Hrmmm, in light of this clarification, I think something like the following may be best ...

        use Socket;
        use URI;

        my $url = 'http://www.mydomain.com';
        my $uri = URI->new( $url );

        # gethostbyname returns undef if the name cannot be resolved
        my $ip_addr = gethostbyname( $uri->host );
        if ( defined $ip_addr and $uri->host eq inet_ntoa( $ip_addr ) ) {
            #... Creatures evolve, code does stuff
        }

        The advantage of this method, which employs the URI module rather than a simple regular expression, is the correct handling of more complex URLs (which may incorporate username and password authentication details in the form scheme://username:password@host:port/) and the validation of IP addresses.
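        To illustrate the point about complex URLs, here is a small sketch that feeds URI a URL carrying credentials and a port. The URL itself is made up, URI is a CPAN module rather than a core one, and the octet check is an illustrative alternative to the DNS round trip, not part of the code above.

```perl
use strict;
use warnings;
use URI;    # from CPAN, not a core module

# Hypothetical URL with credentials and a port.
my $uri  = URI->new('http://username:password@10.1.1.1:8080/this_dir/this_page.html');
my $host = $uri->host;

# Treat the host as an IP address only if it is a dotted quad
# whose octets are all in the range 0-255.
my $is_ip = 0;
if ($host =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/) {
    my @too_big = grep { $_ > 255 } $1, $2, $3, $4;
    $is_ip = !@too_big;
}

print "host: $host\n";
print "port: ", $uri->port, "\n";
print "looks like an IP address\n" if $is_ip;
```

        The host method returns only the host part, so the credentials and port never reach the IP check.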

         

        perl -e 'print+unpack("N",pack("B32","00000000000000000000000111001001")),"\n"'

Re: Getting IP from URL
by ybiC (Prior) on Oct 04, 2002 at 13:12 UTC
    I may misunderstand your question, but...

    Given a textual URL (protocol://host.dom/dir/file), the first thing you'll need to do is extract the "host.dom" part from the entire URL.   A CPAN search on "url" turns up some hits on modules that may (or may not) help.   Next you'd need to resolve the host names to IP addresses using DNS.   A quick look through Networking Code in Code Catacombs finds (code) Resolve list of DNS names by yours truly that should do nicely.

    But if you want to match URL entries containing IP addresses instead of 'host.dom', then check out Regexp::Common::net by our very own Abigail-II.
        cheers,
        Don
        striving toward Perl Adept
        (it's pronounced "why-bick")
Re: Getting IP from URL
by rob_au (Abbot) on Oct 04, 2002 at 12:59 UTC
    I'm not quite sure what you're wanting to achieve but I think the following piece of code may help - This code takes a URL and attempts to resolve the IP address of the hostname portion of that URL ...

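    use Socket;
    use URI;

    my $url = 'http://www.mydomain.com';
    my $uri = URI->new( $url );

    # gethostbyname returns undef if the name cannot be resolved
    my $ip_addr = gethostbyname( $uri->host )
        or die "cannot resolve ", $uri->host, "\n";
    print $url, " -> ", inet_ntoa( $ip_addr ), "\n";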

     

    perl -e 'print+unpack("N",pack("B32","00000000000000000000000111001000")),"\n"'