in thread Keep quotes around numerical attributes after parsing with HTML::Treebuilder?

Many thanks, I had overlooked that method. I'm unclear, though: is HTML in XML format the same thing as XHTML, or is the mapping messier? In other words, does as_XML output also count as XHTML? If so, I could use it instead of the tidy solution I described in the update above.

Re^3: Keep quotes around numerical attributes after parsing with HTML::Treebuilder?
by CountZero (Bishop) on Jul 19, 2005 at 13:32 UTC
    XHTML is HTML 4.01 conforming to the XML standards.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Well "...ish".

      You can't write HTML 4.01 that conforms to XML standards, but XHTML 1.0 is a language that reimplements HTML 4.01 in XML.

      The obvious change to give as an example is that <br> in HTML becomes <br/> in XHTML. But if you used <br/> in HTML, it would mean the same as <br>> (that is, a line break followed by a greater-than sign).

      So you can't just output XHTML and then slap an HTML doctype on it. (Heck, it's not really safe to serve XHTML as text/html, despite what Appendix C of the XHTML 1.0 spec says.)
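
      To make the HTML-versus-XML serialization difference concrete, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree purely as a stand-in (an assumption for illustration; it is not HTML::TreeBuilder, and it says nothing definitive about what as_XML emits). The sample markup is hypothetical. It serializes one small tree twice, once with XML rules and once with HTML rules:

```python
import xml.etree.ElementTree as ET

# Build <p width="100">TEST<br>TEST</p> as a tree (hypothetical sample markup).
p = ET.Element("p", {"width": "100"})
p.text = "TEST"
br = ET.SubElement(p, "br")
br.tail = "TEST"  # text that follows the <br> inside the paragraph

# Same tree, two serializations.
as_xml = ET.tostring(p, encoding="unicode", method="xml")
as_html = ET.tostring(p, encoding="unicode", method="html")

print(as_xml)   # empty element written in XML's self-closing form
print(as_html)  # void element written as a bare start tag
```

      The XML form closes the empty br element explicitly, while the HTML form leaves it as a bare start tag; in both cases the numeric attribute value stays quoted. Only the tag syntax differs between the two, which is why the doctype has to match whichever syntax was actually emitted.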

        if you tried to use <br/> in HTML then it would mean the same as <br>> (or a line break followed by a greater than symbol)
        Are you sure? I tried <br/> in a small HTML file and had it validated by the "official" W3C validator at http://validator.w3.org.
        The uploaded file was tentatively found to be Valid. That means it would validate as HTML 4.01 Strict if you updated the source document to match the options used (typically this message indicates that you used either the Document Type override or the Character Encoding override).
        Source Listing
        Below is the source input I used for this validation:
            <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
            <html>
            <head><title>TITLE</title></head>
            <body><p>TEST<br/>TEST</p></body>
            </html>
        I don't think you can "escape" characters with special meaning in (X)HTML by using a slash. You must use entities for that.

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law