Re: Identifying PDF from URLs

tinita is steering you in the right direction. You can see from this snippet, though, that the page is not a PDF. Even the page it redirects to in a browser is not a PDF but an HTML page with an embedded PDF viewer. Getting the PDF from that scheme might not end up being trivial. :(

perl -MLWP::Simple=head -le 'print [ head(+shift) ]->[0]' "http://ccdl.libraries.claremont.edu/u?/stc,87"
text/html

Re^2: Identifying PDF from URLs by tinita (Parson) on May 25, 2010 at 00:19 UTC
"Getting the PDF from that scheme might not end up being trivial."
Indeed. In this case it seems that HEAD requests are blocked. I tried to fetch the direct link to the PDF with the HEAD script, and it returned text/html and "Content-Disposition: filename=404.txt". So here it is probably necessary to use a GET request with LWP::UserAgent and read the HTTP headers from that response :-/
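For what it's worth, here is a rough sketch of that GET-based check, not a tested solution for this particular site. The sub names looks_like_pdf and probe_url are made up for illustration, and the extra check for the "%PDF-" marker at the start of the body is my own addition, not something suggested earlier in the thread. The content callback dies after the first chunk so that a large PDF is not downloaded in full; LWP then aborts the transfer but still returns the response headers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decide whether a response is a PDF from its
# Content-Type and the first bytes of its body. PDF files normally
# begin with the marker "%PDF-".
sub looks_like_pdf {
    my ($content_type, $body_start) = @_;
    return 1 if defined $body_start   && $body_start   =~ /\A%PDF-/;
    return 1 if defined $content_type && $content_type =~ m{\Aapplication/pdf\b}i;
    return 0;
}

# Hypothetical helper: issue a GET (not a HEAD) and stop reading after
# the first chunk of the body has arrived.
sub probe_url {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded lazily; only needed for the network call
    my $ua    = LWP::UserAgent->new(timeout => 15);
    my $start = '';
    my $res   = $ua->get($url, ':content_cb' => sub {
        my ($chunk) = @_;
        $start .= $chunk;
        die "enough\n";        # aborts the transfer; headers stay available
    });
    return ($res, $start);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my ($res, $start) = probe_url($ARGV[0]);
    printf "Status:              %s\n", $res->status_line;
    printf "Content-Type:        %s\n", scalar $res->content_type;
    printf "Content-Disposition: %s\n",
        $res->header('Content-Disposition') // '(none)';
    print looks_like_pdf(scalar $res->content_type, $start)
        ? "looks like a PDF\n"
        : "does not look like a PDF\n";
}
```

Run it with the URL as the only argument. For error responses LWP does not call the content callback, so in that case only the headers are inspected.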
Re^3: Identifying PDF from URLs by listanand (Sexton) on May 25, 2010 at 00:23 UTC
Hi all,
Thanks for your replies. I will try these suggestions later tonight and get back to you.
Andy