slloyd has asked for the wisdom of the Perl Monks concerning the following question:

I am interested in writing a script that can determine if a particular URL request is XML or HTML. Any ideas on how to detect this?

Re: xml/html detection script
by davorg (Chancellor) on Aug 17, 2005 at 13:06 UTC

    I assume that by "URL request" you actually mean an HTTP request. But requests are never XML or HTML; they are just plain text.

    So I suspect that you're actually talking about the content type of the HTTP response, rather than the request. In that case you can just look at the Content-Type header: if it's "text/html", the body is HTML; if it's "text/xml", the body is XML.
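A minimal sketch of the header check, assuming you already have the raw Content-Type value as a string. The helper name `classify_content_type` is hypothetical. It drops parameters such as `; charset=utf-8` and compares case-insensitively, because a real header is often `text/html; charset=UTF-8` rather than a bare `text/html`.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a raw Content-Type header value as
# 'html', 'xml', or 'other'.
sub classify_content_type {
    my ($header) = @_;
    return 'other' unless defined $header;

    # Keep only the media type, dropping parameters such as "; charset=utf-8".
    my ($type) = split /;/, $header, 2;
    return 'other' unless defined $type;

    # Trim whitespace and lowercase: media types are case-insensitive.
    $type =~ s/^\s+|\s+$//g;
    $type = lc $type;

    return 'html' if $type eq 'text/html';
    return 'xml'  if $type eq 'text/xml' || $type eq 'application/xml';
    return 'other';
}

for my $header ('text/html; charset=UTF-8', 'TEXT/XML', 'image/png') {
    printf "%-26s => %s\n", $header, classify_content_type($header);
}
```

If you fetch the page with LWP, the value would come from something like `$response->header('Content-Type')`; HTTP::Response's `content_type` method already lowercases the type and strips the parameters for you.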

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      You'd also have to account for other XML Content-types, such as application/xml, application/xhtml+xml, application/rdf+xml, and so on.

      Something like this might work:

      if ($content_type =~ m!^(?:application|text)/(?:[\w.-]+\+)?xml\b!i) {
          # Treat as XML (text/xml, application/xml, application/rdf+xml, ...)
      }
Re: xml/html detection script
by b10m (Vicar) on Aug 17, 2005 at 13:06 UTC

    The easiest way is to do a HEAD request on the URL in question and see what the returned headers tell you. Something like:

    use LWP::Simple qw/head/;

    for ('http://perlmonks.org/index.pl?displaytype=xml;node_id=484407',
         'http://perlmonks.org/index.pl?node_id=484407') {
        my ($type, $length, $modified_time, $expires, $server) = head($_);
        print "File is: $type\n";
    }

    --
    b10m

    All code is usually tested, but rarely trusted.
Re: xml/html detection script
by zentara (Cardinal) on Aug 17, 2005 at 13:11 UTC

      The Content-Type will still be text/xml?

      Update: Or at least =~ /xml/ ;)

      --
      b10m

      All code is usually tested, but rarely trusted.
Re: xml/html detection script
by shiza (Hermit) on Aug 17, 2005 at 17:20 UTC
    You also have to take into consideration that the server might not return the proper content-type if it isn't configured properly.
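When the header can't be trusted, one fallback is to peek at the start of the body. This is a rough heuristic sketch, not a parser: the helper name `sniff_markup` is hypothetical, and it only inspects the first 512 characters for an XML declaration, an HTML doctype, or an `<html` tag. XHTML pages that begin with `<?xml` will be reported as XML.

```perl
use strict;
use warnings;

# Hypothetical heuristic: guess 'xml', 'html', or 'unknown' from the
# beginning of a response body. Nothing is validated.
sub sniff_markup {
    my ($body) = @_;
    return 'unknown' unless defined $body;

    # Only look at the start of the document.
    my $head = substr($body, 0, 512);
    $head =~ s/^(?:\x{FEFF}|\xEF\xBB\xBF)//;   # drop a byte order mark (decoded or raw UTF-8)
    $head =~ s/^\s+//;                         # ignore leading whitespace

    return 'xml'  if $head =~ /^<\?xml\b/i;
    return 'html' if $head =~ /^<!DOCTYPE\s+html\b/i;
    return 'html' if $head =~ /<html\b/i;
    return 'unknown';
}
```

Combining this with the Content-Type check (and preferring the header when the two agree) is probably more robust than either alone.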
Re: xml/html detection script
by Transient (Hermit) on Aug 17, 2005 at 17:24 UTC
    I'm not sure that what you're asking is what you want. A URL request is simply something like http://www.perlmonks.org, nothing more. I've requested a URL (and, what's more, that's actually an HTTP request), but what do I want returned: XML or HTML? There's no way to tell.

    Now if I say http://www.perlmonks.org/index.html, then it's more than likely an HTTP request for HTML. So if you're truly looking to determine what the REQUEST is for, just look at the extension. Otherwise, it's anyone's guess.
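A rough sketch of the extension check, assuming the URL path is meaningful. The helper `guess_from_extension` and its extension table are hypothetical and deliberately short. The query string and host are stripped first so that `.org` in the hostname or `?displaytype=xml` in the query isn't mistaken for an extension.

```perl
use strict;
use warnings;

# Hypothetical, deliberately short extension table.
my %type_for_ext = (
    html => 'html',
    htm  => 'html',
    xml  => 'xml',
    rss  => 'xml',
);

# Guess 'html' or 'xml' from the extension of the URL's path, or return
# 'unknown' when there is no recognised extension.
sub guess_from_extension {
    my ($url) = @_;
    my $path = $url;
    $path =~ s/[?#].*\z//s;                       # drop query string and fragment
    $path =~ s{^[a-z][a-z0-9+.-]*://[^/]*}{}i;    # drop scheme and host
    my ($ext) = $path =~ m{\.([A-Za-z0-9]+)\z};   # extension of the last path segment
    return 'unknown' unless defined $ext;
    return $type_for_ext{ lc $ext } // 'unknown';
}

for my $url ('http://www.perlmonks.org',
             'http://www.perlmonks.org/index.html',
             'http://perlmonks.org/index.pl?displaytype=xml;node_id=484407') {
    printf "%s => %s\n", $url, guess_from_extension($url);
}
```

The last URL shows the limits: the PerlMonks XML view is served from `index.pl`, so the sketch reports 'unknown' for it even though the response is XML.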

    If you're looking for the response type, the other answers can lead you in that direction.