megaurav2002 has asked for the wisdom of the Perl Monks concerning the following question:

Dear learned monks,

I wish to check whether an input file is valid XHTML. As far as I can see, HTML::Tidy can only convert an HTML file to XHTML, or check whether the input file is valid HTML.
Can anyone please shed some light on this?

Thanks,
Gaurav

"Wisdom begins in wonder" - Socrates, philosopher

Re: HTML::Tidy question
by Your Mother (Archbishop) on Apr 14, 2008 at 06:30 UTC

    The XML::LibXML family might work for you. You need to parse the DTD with XML::LibXML::Dtd and then apply it to the document: $doc->validate($dtd)

    I'm not sure how well this works, but I often use XML::LibXML for parsing XHTML fragments as a cheap well-formedness check. If it parses, it's well-formed (though not necessarily valid XHTML). Sorry, I don't have a snippet for you.

    Update: I just tried a test, and it looks like a plain $doc->validate() and the non-fatal $doc->is_valid() both work fine against the DTD declared in the document. I think you only have to pass in a DTD if you want to apply an external or custom set of rules to the document.
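
    For anyone landing here later, a rough sketch of the three checks discussed above (parse for well-formedness, validate against the declared DTD, validate against a DTD passed in explicitly). The tiny "note" DTD and the sample documents are made up for illustration so the script runs offline; they stand in for the XHTML DTD and a real XHTML file, and are not from the test mentioned in the update.

    ```perl
    use strict;
    use warnings;
    use XML::LibXML;

    # Toy DTD (an assumption for this demo); it stands in for the XHTML DTD.
    my $decls = join "\n",
        '<!ELEMENT note (to, body)>',
        '<!ELEMENT to   (#PCDATA)>',
        '<!ELEMENT body (#PCDATA)>';

    my $good = '<note><to>monks</to><body>hi</body></note>';
    my $bad  = '<note><body>hi</body></note>';   # well-formed, but <to> is missing

    my $parser = XML::LibXML->new;

    # 1. Well-formedness only: parse_string dies on malformed input.
    my $frag        = eval { $parser->parse_string('<p>unclosed <em>tag</p>') };
    my $well_formed = defined $frag ? 1 : 0;
    print "fragment well-formed: $well_formed\n";

    # 2. DTD declared in the document itself: no argument needed.
    my $declared_ok  = $parser->parse_string("<!DOCTYPE note [\n$decls\n]>\n$good");
    my $declared_bad = $parser->parse_string("<!DOCTYPE note [\n$decls\n]>\n$bad");
    print "declared DTD, good doc: ", ($declared_ok->is_valid  ? "valid" : "invalid"), "\n";
    print "declared DTD, bad doc:  ", ($declared_bad->is_valid ? "valid" : "invalid"), "\n";

    # 3. DTD object passed explicitly; validate() dies with the details.
    my $dtd       = XML::LibXML::Dtd->parse_string($decls);
    my $plain_bad = $parser->parse_string($bad);
    my $error     = eval { $plain_bad->validate($dtd); 1 } ? '' : "$@";
    print "explicit DTD, bad doc:  ", ($error ? "invalid: $error" : "valid"), "\n";
    ```

    With a real XHTML file the DOCTYPE points at the W3C DTD, so libxml2 presumably has to fetch it or resolve it through a local XML catalog. Expect that step to need network access or some catalog setup.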

      Thank you, monks, for enlightening me. XML::LibXML::Dtd worked for me.

      Cheers,
      Gaurav Talwar
      "Wisdom begins in wonder" - Socrates, philosopher
Re: HTML::Tidy question
by ikegami (Patriarch) on Apr 14, 2008 at 02:21 UTC
      I very much doubt XML::Validator::Schema can parse the XHTML schema. It's a useful module (if I do say so myself) but it only implements part of the XML Schema spec. Patches welcome, of course!

      -sam

Re: HTML::Tidy question
by oko1 (Deacon) on Apr 14, 2008 at 02:06 UTC