bbs2web has asked for the wisdom of the Perl Monks concerning the following question:

I have a simple Perl script which I host using Apache2, the page is used by routers to work out their public breakout IP. Whilst we've improved the client side script to sleep for a random number of seconds we still get hit by many simultaneous requests which drives up CPU utilisation.

I would like to optimise our web server by using mod-perl2, so that the script is compiled once. I'm however having a problem, in that it works for the first request, but each call thereafter serves the same IP from the first request:
Current IP Address: 192.0.2.47

Any tips on where I can go reading on how I update my script to use the REMOTE_ADDR of the client? I understand that it's got to do with me using an environment variable where the script is being executed.

Herewith my Apache configuration stanza:
ScriptAlias /cgi-bin/ /var/www/perl-cgi/ <Directory /var/www/perl-cgi> SetHandler perl-script PerlHandler ModPerl::Registry AllowOverride None Options +ExecCGI -MultiViews </Directory>

My script currently:
#!/usr/bin/perl -T use Data::Validate::IP qw(is_ip is_public_ip is_loopback_ip); @IPs=''; if (defined $ENV{HTTP_X_FORWARDED_FOR}) { @IPs=split /,/, $ENV{HTTP_X_FORWARDED_FOR}; }; splice @IPs, 0, 0, $ENV{REMOTE_ADDR}; foreach $check (@IPs) { $check =~ s/^\s*(.*?)\s*$/\1/g; if (is_ip($check)) { if (defined $ip) { if (not is_public_ip($check)) { $check = $ip; }; if (is_loopback_ip($check)) { $check = $ip; }; if ($check eq "192.0.2.90") { $check = $ip; }; }; $ip = $check; }; }; print "Content-type: text/html\nCache-Control: max-age=0,no-cache,no-s +tore,post-check=0,pre-check=0\n\n<html><head><title>Current IP Check< +/title></head><body>Current IP Address: $ip</body></html>";

Many thanks in advance!


Regards
David Herselman

Replies are listed 'Best First'.
Re: mod-perl2 re-using $ENV{REMOTE_ADDR}
by Corion (Patriarch) on May 01, 2023 at 07:37 UTC

    As we don't see the data going in, this is going to be hard to debug.

    Please add some code that outputs the parameters of the run, that is, output $ENV{HTTP_X_FORWARDED_FOR} and $ENV{REMOTE_ADDR}. Maybe it depends on the input data you get.

    Also, while I am a fan of splice, the unshift keyword is a more clear function to use in that place.

      more clear function to use in that place

      Yes - the splice took a little working out...

      I feel that writing it like this (untested) would be the clearest

      @IPs=''; if (defined $ENV{HTTP_X_FORWARDED_FOR}) { @IPs=split /,/, $ENV{HTTP_X_FORWARDED_FOR}; } else { @IPs=split /,/, $ENV{REMOTE_ADDR}; }
      I also find the semicolon after the if block a bit strange.

        I would write that:
        @IPs=split /,/, $ENV{HTTP_X_FORWARDED_FOR} // $ENV{REMOTE_ADDR};
        as it is both shorter and clearer in its intent, and doesn't pointlessly set @IPs only to immediately re-set it. For pre-5.10 you'd have to use a conditional operator, which is left as an exercise for the reader.
        Very much appreciate all the tips!

        We had an issue with some transparent proxies (the URL is served as http://), where those sometimes add HTTP_X_FORWARDED_FOR but then detail internal IPs. We subsequently may still wish to refer to the remote_addr if all the IPs in the http_x_forwarded_for list are all not valid. The following appears to be working perfectly now:
        use Data::Validate::IP qw(is_ip is_public_ip is_loopback_ip); my $ip; my @IPs; if (defined $ENV{HTTP_X_FORWARDED_FOR}) { @IPs=split /,/, $ENV{HTTP_X_FORWARDED_FOR}; }; unshift (@IPs, $ENV{REMOTE_ADDR}); foreach $check (@IPs) { $check =~ s/^\s*(.*?)\s*$/\1/g; if (is_ip($check)) { if (defined $ip) { if ( (not is_public_ip($check)) or (is_loopback_ip($check)) or ($check eq "41.79.21.90") ) { $check = $ip; } } $ip = $check; } } print "Content-type: text/html\nCache-Control: max-age=0,no-cache,no-s +tore,post-check=0,pre-check=0\n\n<html><head><title>Current IP Check< +/title></head><body>Current IP Address: $ip</body></html>";
      Thank you for your feedback, searching for 'splice 0,0 v unshift' appears to show that unshift is twice as performant as splice. This really isn't a large array though but if it's easier to read and faster in the process then why not... ;)


      It appears I wasn't initialising $ip, when I added 'undef $ip' it now works as expected:

      Apache's configuration stanza:
      ScriptAlias /cgi-bin/ /var/www/perl-cgi/ PerlSwitches -wT <Directory /var/www/perl-cgi> SetHandler perl-script PerlHandler ModPerl::Registry PerlSendHeader On AllowOverride None Options +ExecCGI -MultiViews </Directory>

      And the script where I now undefine the $ip variable at start:
      use Data::Validate::IP qw(is_ip is_public_ip is_loopback_ip); @IPs=''; undef $ip; if (defined $ENV{HTTP_X_FORWARDED_FOR}) { @IPs=split /,/, $ENV{HTTP_X_FORWARDED_FOR}; }; unshift (@IPs, $ENV{REMOTE_ADDR}); foreach $check (@IPs) { $check =~ s/^\s*(.*?)\s*$/\1/g; if (is_ip($check)) { if (defined $ip) { if (not is_public_ip($check)) { $check = $ip; }; if (is_loopback_ip($check)) { $check = $ip; }; if ($check eq "41.79.21.90") { $check = $ip; }; }; $ip = $check; }; }; print "Content-type: text/html\nCache-Control: max-age=0,no-cache,no-s +tore,post-check=0,pre-check=0\n\n<html><head><title>Current IP Check< +/title></head><body>Current IP Address: $ip</body></html>";

        Making the variables lexical variables would likely also help:

        use strict; my @IPs; my $ip;
Re: mod-perl2 re-using $ENV{REMOTE_ADDR}
by Anonymous Monk on May 01, 2023 at 12:08 UTC
    You assign to $ip at end of loop