dbooth has asked for the wisdom of the Perl Monks concerning the following question:

Given a target URI, how can I determine whether an HTTP GET of that URI would be making a request to the local machine?

Context: I have a mod_perl2 script that responds to HTTP requests. In the course of doing so, it sometimes needs to make an HTTP request to retrieve some data from a target URI, for which I am using WWW::Mechanize, though I'm sure I could use LWP::Simple or LWP instead. The problem is that, to avoid an infinite recursion of HTTP requests, I need to avoid making the HTTP request if the target URI would actually resolve to the current machine. This is to prevent users from accidentally shooting themselves in the foot. It is not intended as a security check.

The problem is not as simple as looking at the URI to see if the domain is "localhost", or its IP address is 127.0.0.1 (or 127.0.1.1 or 127.*), because: (a) the target URI might use a fully qualified domain name that resolves to an IP address on the current machine; and (b) a machine can have several IP addresses.

I would think that this would be a common problem, but I have been unable to find a straightforward solution. I do not need a solution that is guaranteed to work in all possible cases, but it would be nice if it caught most cases.

This post discusses how to determine the local machine's IP address (or addresses, since it may have several). Perhaps I could use that approach to get the local machine's IP addresses and then compare them against the IP address in the target URI (or the IP address returned by gethostbyname for the URI's domain). Do I really need to do that? Are there problems with that approach? Is there a better way?

This post indicates that C# has a function HttpContext.Current.Request.IsLocal to do what I need, but I have been unable to find anything similar in Perl.

Suggestions please?

UPDATE: Since I found no better solution, I ended up implementing this as follows. After extracting the host name from the URI, which I did using the Perl URI module, I used IO::Interface::Simple to get the list of local IP addresses, as suggested by @Anonymous_Monk:

    #! /usr/bin/perl -w
    
    use strict;
    
    use Socket;
    use IO::Interface::Simple;
    
    print "127.0.1.1  is local\n" if &IsLocalHost("127.0.1.1");
    print "google.com is local\n" if &IsLocalHost("google.com");
    exit 0;
    
    ################ IsLocalHost #################
    # Is the given host name, which may be either a domain name or
    # an IP address, hosted on this local host machine?
    # Results are cached in a hash for fast repeated lookup.
    sub IsLocalHost
    {
        my $host = shift || return 0;
        our %isLocal;    # Cache
        return $isLocal{$host} if exists($isLocal{$host});
        my $packedIp = gethostbyname($host);
        if (!$packedIp) {
            $isLocal{$host} = 0;
            return 0;
        }
        my $ip = inet_ntoa($packedIp) || "";
        our %localIps;   # Another cache
        %localIps = map { ($_, 1) } &GetIps() if !%localIps;
        my $isLocal = $localIps{$ip} || $ip =~ m/^127\./ || 0;
        # TODO: Check for IPv6 loopback also.  See:
        # http://ipv6exchange.net/questions/16/what-is-the-loopback-127001-equivalent-ipv6-address
        $isLocal{$host} = $isLocal;
        return $isLocal;
    }
    
    ################ GetIps #################
    # Lookup IP addresses on this host.
    sub GetIps
    {
        my @interfaces = IO::Interface::Simple->interfaces;
        my @ips = grep {$_} map { $_->address } @interfaces;
        return @ips;
    }
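
The code above checks only the first IPv4 address that gethostbyname returns and leaves the IPv6 loopback as a TODO. A minimal sketch of one way to cover both, assuming a Perl new enough to ship getaddrinfo in the core Socket module (5.14 or later); the names is_loopback, resolve_all and is_local_host are hypothetical and not part of the original post:

```perl
use strict;
use warnings;
use Socket qw(:addrinfo SOCK_STREAM);

# True for IPv4 127.0.0.0/8 and for the IPv6 loopback address ::1.
sub is_loopback {
    my ($ip) = @_;
    return ($ip =~ /^127\./ || $ip eq '::1') ? 1 : 0;
}

# Return every numeric address (IPv4 and IPv6) that $host resolves to.
sub resolve_all {
    my ($host) = @_;
    my ($err, @results) = getaddrinfo($host, "", { socktype => SOCK_STREAM });
    return () if $err;
    my %seen;
    for my $ai (@results) {
        my ($nerr, $ip) = getnameinfo($ai->{addr}, NI_NUMERICHOST, NIx_NOSERV);
        $seen{$ip} = 1 unless $nerr;
    }
    return sort keys %seen;
}

# $local_ips is a hashref whose keys are this machine's addresses,
# for example built from the list that GetIps() returns.
sub is_local_host {
    my ($host, $local_ips) = @_;
    for my $ip (resolve_all($host)) {
        return 1 if is_loopback($ip) || $local_ips->{$ip};
    }
    return 0;
}
```

With this shape, a host counts as local if any of its addresses is local, which errs on the side of skipping the request when a name maps to several machines.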

Replies are listed 'Best First'.
Re: How can I determine whether a URI, hostname or IP address is to the local host machine?
by tobyink (Canon) on Apr 29, 2014 at 08:03 UTC

    When your script makes an HTTP request, include a header like:

    User-Agent: MyScript

    In your web server configuration (e.g. Apache .htaccess or httpd.conf) configure the server to block all requests from that user agent.

    # This should work in .htaccess
    SetEnvIf User-Agent MyScript GoAway=1
    Order allow,deny
    Allow from all
    Deny from env=GoAway

    Easy peasy.
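
On the client side, the header can be set once when the user agent object is created. A minimal sketch, assuming the core HTTP::Tiny module is swapped in for WWW::Mechanize (whose constructor accepts an agent option in the same way); the commented-out URL is hypothetical:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# 'MyScript' is the placeholder agent name used in the reply above.
my $http = HTTP::Tiny->new(agent => 'MyScript', timeout => 10);

# Every request made through $http now carries "User-Agent: MyScript".
# my $response = $http->get('http://example.org/data');   # hypothetical URL
print "Agent: ", $http->agent, "\n";
```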

    use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name

      Good idea! However, it is a bit more complicated in my case, because my application represents several virtual "nodes" (not virtual hosts in the Apache sense) that are effectively making the HTTP requests. I only want to disallow requests from the same host and the same node. So somehow the web server configuration would have to pick up both my application name (MyScript in your example) and the node name from the User-Agent string, and deny access if the node name matches the applicable part of the URI.

      The downside of this approach is that I don't really want to deny access if the URI is to the local host and same node. It would be much more user friendly if I could instead simply convert the URI to a local file access and avoid issuing the request at all.

Re: How can I determine whether a URI, hostname or IP address is to the local host machine?
by Anonymous Monk on Apr 28, 2014 at 22:52 UTC

    Given the existence of load balancers, you will also have to consider the possibility that the address will only sometimes be the local machine.

    Why not keep a hash containing the sites you have visited? If the site you're about to visit exists in the hash, avoid recursion by not going there.

      Thanks for the thought. A hashmap of already-visited sites might work, but I am doing this from mod_perl2, which uses multiple threads for handling HTTP requests. Mod_perl2 seems to magically make each thread hold its own set of variables, and I have not yet figured out how to have a hashmap that is shared across all threads. And if I did figure out how to have a shared hashmap, I would then need to use some kind of locking to ensure thread safety when modifying that hashmap. So although the idea is intuitively appealing, I am concerned that it may not be so easy to implement.
Re: How can I determine whether a URI, hostname or IP address is to the local host machine?
by Anonymous Monk on Apr 28, 2014 at 23:05 UTC

    So the infinite recursion can only happen if the script that makes the HTTP request is actually calling itself (maybe on a different host)? Could you give each script and/or host some kind of unique identifier that it places in any request it makes (HTTP header, cookie, GET parameter, something), and if a script sees its own ID it stops the recursion? Or a more general solution, some kind of "Time To Live" HTTP header that gets decremented each time your script forwards a request?

    In terms of determining the local machine's IP address(es), I've used IO::Interface::Simple a few times for that.
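
The "Time To Live" idea could look roughly like the sketch below. The header name X-Loop-TTL, the default budget of 5 and the function next_ttl are all assumptions made for illustration; none of them comes from the thread:

```perl
use strict;
use warnings;

# Hypothetical header name and hop limit.
my $TTL_HEADER  = 'X-Loop-TTL';
my $DEFAULT_TTL = 5;

# Given the TTL value seen on the incoming request (undef if the header
# is absent), return the value to send on the outgoing request, or undef
# when the budget is exhausted and the request should not be forwarded.
# A missing or malformed value is treated as a fresh budget.
sub next_ttl {
    my ($incoming) = @_;
    my $ttl = (defined $incoming && $incoming =~ /^\d+$/)
            ? $incoming
            : $DEFAULT_TTL;
    return $ttl > 0 ? $ttl - 1 : undef;
}
```

A request that arrives without the header is forwarded with a value of 4, and one that arrives with a value of 0 is not forwarded at all, so any loop stops after a bounded number of hops without the script needing to know its own addresses.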

      Interesting thought. I don't think it would be wise to send a unique identifier in a cookie or query parameter, because it may be going to an arbitrary site, which might not like receiving an unexpected cookie or query parameter. But it could be sent in a custom HTTP header, since according to the HTTP 1.1 specification, unrecognized headers SHOULD be ignored. However, there would still be the issue of generating a unique identifier -- rand would probably be good enough for that -- and sharing it across all of the mod_perl2 threads that handle HTTP requests for my server, which sounds like it would require some kind of locking or mutex. This approach may be worth considering if I cannot find anything easier.

      But it seems to me that there must be some reliable way to do this at the OS level, since the OS obviously needs to know all of the IP addresses and ports on which it is listening.

        A machine can identify its own IP addresses. However, the mapping of DNS names both to and from IP addresses can actually be pretty complicated to get completely right. Multiple adapters per machine, virtual machines, Apache virtual hosts, load balancers, transparent proxies, multiple addresses per DNS entry, multiple DNS entries per address, ...

        The header only needs to be "unique" to your script so that it can identify its own requests, e.g. X-dbooth-Loop-Prevention: foo. Or you could set a custom User-Agent like tobyink suggested, except that you could check for the presence of this header in your script instead of letting the webserver do it. The only case where this scheme could go wrong is if there is a transparent proxy in the path which modifies or removes headers.
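
A minimal sketch of the in-script check, using the header name from the example above. Carrying the node name as the header value is an assumption, made so that only a request from the same node is treated as a loop; is_own_request and loop_headers are hypothetical names, and the incoming headers are modelled as a plain hashref:

```perl
use strict;
use warnings;

# Header name from the reply above; the value format is an assumption.
my $LOOP_HEADER = 'X-dbooth-Loop-Prevention';

# True if the incoming headers carry our marker for the given node.
# HTTP header names are case-insensitive, so compare them lowercased.
sub is_own_request {
    my ($headers, $node) = @_;
    for my $name (keys %$headers) {
        next unless lc($name) eq lc($LOOP_HEADER);
        return 1 if $headers->{$name} eq $node;
    }
    return 0;
}

# Headers to attach to an outgoing request made on behalf of $node.
sub loop_headers {
    my ($node) = @_;
    return { $LOOP_HEADER => $node };
}
```

Under mod_perl2 the incoming value would presumably be read from the request object, for example with $r->headers_in->get('X-dbooth-Loop-Prevention'), rather than from a hashref.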

Re: How can I determine whether a URI, hostname or IP address is to the local host machine?
by Anonymous Monk on Apr 29, 2014 at 00:05 UTC
Re: How can I determine whether a URI, hostname or IP address is to the local host machine?
by mhearse (Chaplain) on Apr 29, 2014 at 00:10 UTC
    Some suggestions:

    Create a central database accessible by all httpd machines. select * from hostname_map where ip = ? This table would have to be manually populated.

    Use a job server such as Gearman to do the URI requests. I've done it within CGI/mod_perl... it can be messy.

      In theory that may work, but unfortunately any solution that relies on manual configuration would defeat the purpose, because the whole point is to catch easy-to-make configuration errors that would otherwise cause infinite recursion of HTTP requests.
Re: How can I ... avoid recursion? (User-Agent)
by Anonymous Monk on Apr 29, 2014 at 08:13 UTC

    You ought to disallow robot access to your service.

    Your service itself is a robot. Problem Solved.