Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Monks,

Problem description: My company has some revision controlled documents that they store with an increasing file letter scheme (beginning with dash), i.e.

2222_-.doc 2222_A.doc 2222_B.doc 2222_C.doc etc.

These documents are linked from the web, but they don't want to have to go in and change the HTML every time a document revision changes. So I wrote a CGI script which would get the path to the base file (i.e. 2222_.doc) as an input, and determine the correct revision and redirect the user. I originally wrote this using Net::FTP, and it was working, but then the web server crashed. The sysadmin had some problems with the Apache .netrc file (which was storing the login/password to the documents server), and he doesn't have the time to fix this.

So I have to rewrite without FTP access. I went to CPAN and looked up HTTP::Request, and went from there. My code is working at the moment, but it has to download the document twice in order for the user to view it. Once on the server to make sure it's valid, and again for the user to download it. Some of these files are rather large, so this seems like a waste of time. But even sending the user the direct output from the $ua->request() call won't save much time.

End result: I'm looking for a way to see if a link is valid without downloading the content at that link.

I found this script on CPAN which will print just the returned headers, but it still downloads the whole page before printing it. This is leading me to believe that this may not be possible? I guess this question is more of an HTTP question than a strictly Perl one, but since there are so many modules out there, I thought someone could shove me in the right direction if I'm missing something.

Here is my code:

#!/sw/local/bin/perl -Tw use strict; use HTTP::Request; use LWP::UserAgent; print "Content-type: text/html\n\n"; print "<html>\n"; my $path = $ENV{'QUERY_STRING'}; if(!$path) { dienice("Must pass in at least 1 argument.") } my $file; if($path =~ s:/([^/]+)$::) { $file = $1 } else { dienice("Incorrectly formatted path : $path") } my $ext; #file extension if($file =~ s/\.(.+)$//) { $ext = $1 } else { dienice("Oncorrectly formatted filename: $file") } my $ua = LWP::UserAgent->new; my $tmpfile; foreach my $rev ("-", ('A'..'Z')) { $tmpfile = "$file$rev.$ext"; my $request = HTTP::Request->new(GET => "$path/$tmpfile"); my $response = $ua->request($request); print "$response->{_msg}<br>"; last if($response->{_msg} eq "OK"); } print "<head><meta http-equiv=Refresh content=\"10; URL=$path/$tmpfile +\"></head><body>"; print "<a href=\"$path/$tmpfile\">Please click here if you are not aut +omatically redirected</a><br>\n\ "; print "<p>Due to security measures, you will only be able to access fi +les in /QUALITY/DocConSys in an\ account that has access to this folder.<br>"; print "</body></html>"; exit; sub dienice { print "<body><h2>Error:</h2>$_[0]</body></html>"; exit; }

btw, any other comments on my code or method are welcome. Thanks for your help


In reply to checking for valid http links by swkronenfeld

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (7)
As of 2022-12-02 22:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?