LWP::Simple to judge the url

sarvan has asked for the wisdom of the Perl Monks concerning the following question:

Re: LWP::Simple to judge the url by moritz (Cardinal) on Jun 20, 2011 at 09:22 UTC
If you're not fixed on LWP::Simple, here's an example with Mojolicious: `use Mojo::UserAgent; my $url = "http://de.arxiv.org/pdf/1106.3541"; print Mojo::UserAgent->new->head($url)->res->headers->content_type;` This prints `application/pdf`, indicating that the document returned from this URL is a PDF file.
Perl 6 - second systems done right
Re^2: LWP::Simple to judge the url by Corion (Patriarch) on Jun 20, 2011 at 09:29 UTC
(And if you don't have Mojolicious installed, LWP::UserAgent does it as well:) `use LWP::UserAgent; my $url = "http://de.arxiv.org/pdf/1106.3541"; print LWP::UserAgent->new->head($url)->headers->content_type();`
Re: LWP::Simple to judge the url by Corion (Patriarch) on Jun 20, 2011 at 09:18 UTC
See the `->head` method of LWP::UserAgent. The `Content-Type` header of the response should tell you what content the page sends back.
Re^2: LWP::Simple to judge the url by bart (Canon) on Jun 20, 2011 at 11:34 UTC
LWP::Simple does `head` just as well. And it might be simpler to use. From its documentation:
`head($url)`
Get document headers. Returns the following 5 values if successful: `($content_type, $document_length, $modified_time, $expires, $server)`
Returns an empty list if it fails. In scalar context returns TRUE if successful.
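For what it's worth, here is a minimal sketch of how the list-context form of `head` might be used to judge a URL. The helper names `media_type` and `url_media_type` are made up for this example and are not part of LWP::Simple. It assumes the first returned value is the raw `Content-Type` header, which may carry parameters such as `charset`, so the sketch trims those off before reporting the type.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a Content-Type value such as
# "text/html; charset=UTF-8" to the bare media type "text/html".
# Returns undef when no usable value is given.
sub media_type {
    my ($content_type) = @_;
    return undef unless defined $content_type;
    my ($type) = $content_type =~ m{^\s*([^;\s]+)};
    return defined $type ? lc $type : undef;
}

# Hypothetical helper: media type of whatever a URL serves,
# or undef if the HEAD request fails.
sub url_media_type {
    my ($url) = @_;
    # Loaded lazily so media_type() can be used without LWP installed.
    require LWP::Simple;
    # In list context head() returns five values; an empty list means failure.
    my ($content_type, $length, $mtime, $expires, $server)
        = LWP::Simple::head($url)
        or return undef;
    return media_type($content_type);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $type = url_media_type($ARGV[0]);
    print defined $type ? "$type\n" : "HEAD request failed\n";
}
```

Saved under a name of your choosing and run with a URL as its only argument, it prints the media type the server reports, or a failure message if the HEAD request does not succeed.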
Re: LWP::Simple to judge the url by ww (Archbishop) on Jun 20, 2011 at 15:36 UTC
The excellent replies above quite satisfactorily answer the question asked. But the question itself strikes me as a bit odd, in an age when mislabeled or unlabeled internet content should probably be regarded as suspect/undesirable/dangerous. OP's test for existence tells the name (and -- in some cases -- the nominal file.typ) of the target of the link. The answers above tell how to find out the actual type of file whether or not (OP's case) an extension is provided on the server. OTOH, were one to rely on a browser, clicking a mislabeled link might provide perhaps as little info as "binary" (try this on an MSWord doc mislabeled as doc.foo, with FF under Linux), or perhaps misleading info on the actual type (content) of the file (try opening a .pdf mislabeled as an .xls, under w32). Is there an X/Y problem here or am I missing some reasonable basis for the question?
Re^2: LWP::Simple to judge the url by chrestomanci (Priest) on Jun 20, 2011 at 21:05 UTC
Could the question be some sort of test or homework? A few weeks back I was sent a series of about 10 Perl questions by a potential employer, with no real time limit on them. Two of the questions were about checking whether URLs worked and what file type was at the end of them, so rather similar to this question. I was asked to bring answers to a job interview a week or so later.

