common has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks, I wrote this little script to check a remote web server's general information. After using it a few times I noticed that it sometimes caught the site's HTML and printed it to the screen, which wasn't in my plan. After that I added a bunch of regexes to catch the lines I wanted. Basically, it's really clunky, and I was just wondering if you could open my mind to a much simpler method. I'm just a novice, so please forgive my obvious mistakes. Thanks.
use IO::Socket;

if ($#ARGV != 0) {
    die "usage: perl $0 [hostname]\n";
} else {
    $host = $ARGV[0];
}

$socket = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => $host,
    PeerPort => 80,
) or die $!;
$socket->autoflush(1);

print $socket "HEAD / HTTP/1.0\015\012\015\012";

while (<$socket>) {
    if (/^Set-Cookie:/) {print}
    if (/^Server:/) {print}
    if (/^P3P:/) {print}
    if (/^Last-Modified:/) {print}
    if (/^ETag:/) {print}
    if (/^X-Powered-By:/) {print}
    if (/^HTTP/) {print}
    if (/^Accept-Ranges:/) {print}
    if (/^Date:/) {print}
    if (/^Expires:/) {print}
    if (/^Cache-control:/) {print}
    if (/^Content-Type:/) {print}
    if (/^Location:/) {print}
    if (/^Content-Location:/) {print}
    if (/^X-Pad:/) {print}
    if (/^Connection:/) {print}
    if (/^MIME-Version:/) {print}
    if (/^Pragma:/) {print}
    if (/^Vary:/) {print}
    if (/^TCN:/) {print}
    if (/^Content-Language:/) {print}
    if (/^PICS-Label:/) {print}
}

close $socket;

Replies are listed 'Best First'.
Re: checking output for text
by Zaxo (Archbishop) on Oct 09, 2002 at 00:41 UTC

    You need LWP::UserAgent for a full-strength version of what you're trying to do.

    If your HEAD request fails, try GET; lots of sites block HEAD requests for some reason.
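
    A minimal sketch of that fallback, assuming LWP::UserAgent is installed. The helper name fetch_headers and the 10-second timeout are made up for illustration; head(), get(), is_success(), status_line() and headers->as_string are the LWP calls it relies on. Note that the GET retry downloads the page body as well, even though only the headers are printed.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: ask with HEAD first, and retry with GET if that fails.
sub fetch_headers {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    $response = $ua->get($url) unless $response->is_success;
    return $response;
}

# Only touch the network when a hostname is given on the command line.
if (@ARGV) {
    my $host     = $ARGV[0];
    my $ua       = LWP::UserAgent->new( timeout => 10 );
    my $response = fetch_headers( $ua, "http://$host/" );
    print $response->status_line, "\n";
    print $response->headers->as_string;
}
```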

    After Compline,
    Zaxo

      Another way, without using a module, would be to define a hash or array with the values you want to find (I think I would use a hash). If I noticed correctly, all the lines you are interested in start with your 'key' and a colon, so:

      my %to_find = (
          'Set-Cookie' => 1,
          'Server'     => 1,
          # etc., so on and so forth
      );
      while (<$socket>) {
          # split on the first colon, but leave the rest alone
          my ($key, $rest) = split(/:/, $_, 2);
          # $rest still carries its own line ending, so no extra "\n" is added
          print "$key:$rest" if $to_find{$key};
      }
Re: checking output for text
by chromatic (Archbishop) on Oct 09, 2002 at 00:45 UTC

    If you use LWP, you can use the header() method in the response object. The code would look something like this:

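    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new();
    # get() needs a full URL, not a bare hostname
    my $response = $ua->get( "http://$host/" );

    foreach my $header (qw( Set-Cookie Server P3P )) {
        # skip headers the server did not send
        my $value = $response->header( $header );
        print "$header: $value\n" if defined $value;
    }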
Re: checking output for text
by foxops (Monk) on Oct 09, 2002 at 00:47 UTC
    Wow, this is a great script. If I were you, I would assign those regexes to variables so you could expand your script to do a little logic. But I'm not exactly sure what you want to do with this... There isn't anything in the Snippets section quite like this (most likely because it would be simple to implement with LWP), but it could help a struggling beginner (like myself).
Re: checking output for text
by Enlil (Parson) on Oct 09, 2002 at 01:00 UTC
    This is trivial, as it only checks one site and returns the lines that you want. But since all the if statements are checking the start of the line for certain things, once a particular "if" has succeeded there is no need to check the rest, so I would change all the "if" statements to "elsif" after the first one.

    Enlil

Re: checking output for text
by io (Scribe) on Oct 09, 2002 at 07:14 UTC
    In case you don't want to use the modules (which I recommend you do, of course), you could add this line to the loop to make it skip the HTML:
    while (<$socket>) { ... last if /^\r?$/; }
    The HTTP headers and the body are separated by a blank line. That line arrives as a bare CRLF, so the pattern has to allow for the \r.
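
    Putting the blank-line check together with the hash lookup suggested earlier in the thread gives something like the sketch below. It is only an illustration: wanted_headers is a hypothetical helper name, and the canned response is invented so the code runs without a network connection. With a real connection you would pass $socket instead of the in-memory filehandle. Header names are matched case-insensitively, since HTTP does not care about their case.

```perl
use strict;
use warnings;

# Hypothetical helper: return the status line plus the wanted headers from an
# open filehandle, stopping at the blank line that ends the header section.
sub wanted_headers {
    my ($fh, @names) = @_;
    my %to_find = map { ( lc($_) => 1 ) } @names;
    my @found;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;        # strip CRLF or a bare LF
        last if $line eq '';         # blank line: the headers are done
        if ($line =~ m{^HTTP/}) {    # status line, e.g. "HTTP/1.0 200 OK"
            push @found, $line;
            next;
        }
        # split on the first colon only; values such as dates contain colons
        my ($key, $value) = split /:\s*/, $line, 2;
        push @found, "$key: $value" if defined $value && $to_find{ lc $key };
    }
    return @found;
}

# Canned response (made up for illustration), read via an in-memory filehandle.
my $canned = join "\r\n",
    'HTTP/1.0 200 OK',
    'Server: ExampleServer/1.0',
    'X-Ignored: yes',
    'Content-Type: text/html',
    '',
    '<html>Server: this line is in the body</html>',
    '';
open my $fh, '<', \$canned or die "cannot open in-memory handle: $!";
print "$_\n" for wanted_headers($fh, qw(Server Content-Type));
```

    The body line that happens to start with "Server:" is never reached, because the loop leaves at the blank line.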