Here are a few comments on slimming down your code while still keeping it readable (or even improving the readability). Of course, this is all My Not So Humble Opinion, so take it with a grain of salt. Feel free to see this as a vast exercise in Hubris on my part.

#!/usr/bin/perl
require LWP::UserAgent;
require HTTP::Request;
require HTTP::Response;
use HTTP::Request::Common;
First, I'd recommend running perl with the -w (warn) option and adding "use strict;". These can save you hours of debugging and encourage good programming habits. At first they may seem a pain, but with a little practice they add no noticeable effort, and you tend to do things the "Right Way" by default. I'd also "use" all those modules rather than "require" them: use loads the module at compile time and calls its import method, so you get the exports the module author intended, and if I disagree, I can override the author's defaults with an explicit import list. See the documentation for use (perldoc -f use) for details.
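A minimal sketch of the use/require difference. It uses the core modules POSIX and List::Util as stand-ins for the LWP modules (an assumption, so it runs without LWP installed); the same rules apply to LWP::UserAgent and friends.

```perl
#!/usr/bin/perl -w
use strict;

# "use" loads the module at compile time and calls its import().
# POSIX exports a large default list, so an explicit list is passed
# to import only floor(), overriding the author's default.
use POSIX qw(floor);

# "require" loads the module at run time and imports nothing,
# so its functions must be called with the full package name.
require List::Util;

my $rounded = floor(7.8);                 # imported by "use"
my $largest = List::Util::max(3, 9, 4);   # fully qualified after "require"

print "$rounded $largest\n";
```

Under strict, forgetting the import shows up at compile time rather than as a mysterious runtime failure, which is most of the point.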
foreach (@ARGV) {
    if ( $_ eq $ARGV[0] ) {
        $inputfile = $_;
    } elsif ( $_ eq $ARGV[1] ) {
        $outdir = $ARGV[1];
    } else {
        die "Usage: $0 inputfile outdir\n";
    }
}
This is an unusual way of going about it: you copy the first two arguments and die if there are more. I prefer the more succinct:
die "Usage: $0 inputfile outdir\n" unless scalar @ARGV == 2;
    # I prefer "scalar @LIST", some prefer $#LIST,
    # but remember the difference: $#LIST is the last index,
    # so the equivalent test is $#ARGV == 1
my ($inputfile, $outdir) = @ARGV;
This has the advantage of working as intended (well, dying as intended) if only one argument is given.
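A quick sketch to try the check without a real command line. The sub parse_args is hypothetical: it just wraps the two lines above and takes the argument list as a parameter instead of reading @ARGV.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the argument check shown above.
sub parse_args {
    my @args = @_;
    die "Usage: $0 inputfile outdir\n" unless scalar @args == 2;
    my ($inputfile, $outdir) = @args;
    return ($inputfile, $outdir);
}

my @list = ('in.txt', 'out');
print scalar(@list), "\n";   # 2: number of elements
print $#list, "\n";          # 1: index of the last element

my ($in, $out) = parse_args(@list);
print "$in -> $out\n";

# With only one argument the check dies, instead of silently
# leaving $outdir undefined as the foreach version does.
my $ok = eval { parse_args('in.txt'); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```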

Just one more:

if    ($filenum =~ /\d\d\d\d/) { $filenum = $filenum; }
elsif ($filenum =~ /\d\d\d/)   { $filenum = "0$filenum"; }
elsif ($filenum =~ /\d\d/)     { $filenum = "00$filenum"; }
else                           { $filenum = "000$filenum"; }
How about:
$filenum = sprintf("%04d", $filenum);
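If you want to convince yourself the two versions agree, here is a small comparison sketch. The subs pad_chain and pad_sprintf are hypothetical names; pad_chain is just the if/elsif chain above wrapped in a sub. For non-negative integers both give the same result, including numbers with more than four digits, which neither version truncates.

```perl
use strict;
use warnings;

# The original if/elsif chain, wrapped in a sub for comparison.
sub pad_chain {
    my $filenum = shift;
    if    ($filenum =~ /\d\d\d\d/) { }   # four or more digits: leave as is
    elsif ($filenum =~ /\d\d\d/)   { $filenum = "0$filenum"; }
    elsif ($filenum =~ /\d\d/)     { $filenum = "00$filenum"; }
    else                           { $filenum = "000$filenum"; }
    return $filenum;
}

# The one-line replacement: zero-pad to a minimum width of 4.
sub pad_sprintf { return sprintf("%04d", shift) }

for my $n (7, 42, 123, 4567, 12345) {
    printf "%-6s %-6s %s\n", $n, pad_chain($n), pad_sprintf($n);
}
```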

In reply to Re: Adding 'referer' info to spider script by swiftone
in thread Adding 'referer' info to spider script by nuts
