in reply to Re^3: XML::LibXML::Reader giving wrong matched element
in thread XML::LibXML::Reader giving wrong matched element

What is it that distinguishes the <state...> and <city...> tags from the <country...> tags? Is it strictly that the OP's code provides the shortcut close ("/>") for state and city but not for country?

Yes.

If <country...> had a shortcut close would it not need a </country> tag?

<foo x="y"/> and <foo x="y"></foo> are completely equivalent, so not only would it not need a </country> tag, it could not have a </country> tag. One can't close an element more than once.
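The equivalence is easy to check with any conforming XML parser. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` as a stand-in for XML::LibXML; `same_tree` and `is_well_formed` are hypothetical helpers written only for this illustration.

```python
import xml.etree.ElementTree as ET


def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """Structurally compare two elements: tag, attributes, text, tail, children."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and (a.tail or "") == (b.tail or "")
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


def is_well_formed(doc: str) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


short_form = ET.fromstring('<foo x="y"/>')
long_form = ET.fromstring('<foo x="y"></foo>')
print(same_tree(short_form, long_form))      # True: same element either way
print(is_well_formed('<foo x="y"/></foo>'))  # False: the element is closed twice
```

The parser produces the same element for both spellings, and rejects the version that tries to close `foo` a second time.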

your explanation doesn't seem consistent with beginner tuts

I presume you are referring to "all XML elements must have a closing tag".

That claim is true, but <foo/> serves as both the opening and closing tag of the element, so it satisfies the requirement of the presence of a closing tag.
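One way to see that <foo/> acts as both tags is to watch the events a streaming parser reports. The sketch below uses Python's standard `xml.sax` (an assumption, not the OP's tooling); `EventLogger` and `element_events` are hypothetical names for this illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record start/end element events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def element_events(doc: str) -> list:
    """Parse the document and return the list of (event, tag) pairs."""
    handler = EventLogger()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.events


print(element_events('<foo x="y"/>'))        # [('start', 'foo'), ('end', 'foo')]
print(element_events('<foo x="y"></foo>'))   # [('start', 'foo'), ('end', 'foo')]
```

For SAX, the single empty-element tag yields both a start event and an end event, exactly like the explicit pair.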

why not use a shortcut close globally -- that is, on <world> and <country>

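That would be impossible because the world and country elements have non-attribute children, i.e. they contain other elements, not just attributes. An element written with a shortcut close has no content at all, so there would be nowhere to put those children.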
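Here is a small sketch of what goes wrong if you try it anyway, again using Python's `xml.etree.ElementTree` as a stand-in parser. The `world`/`country`/`state` markup is a trimmed, hypothetical version of the OP's data, and `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Shortcut-closing the root: <world/> is already a complete element, so the
# <country> after it lies outside the single root and the document is malformed.
print(parses('<world/><country short="usa"/>'))   # False

# Shortcut-closing <country> inside <world>: the state that follows is no
# longer inside the country; it becomes the country's sibling.
world = ET.fromstring('<world><country short="usa"/><state short="CA"/></world>')
print([child.tag for child in world])   # ['country', 'state']
print(len(world.find("country")))       # 0: the country has no children
```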

In fact, I'd say the city elements are misplaced in the OP's XML. The indenting indicates the OP wants them to be children of states, but he made them children of countries.

<country short="usa" name="united state of america">
    <state short="CA" name="california"/>
        <city short="SFO" name="San Franscisco"/>
        <city short="EM" name="Emeryville"/>
    <state short="FL" name="florida"/>
    ... More intermixed states and cities ...
</country>

means

<country short="usa" name="united state of america">
    <state short="CA" name="california"></state>
    <city short="SFO" name="San Franscisco"></city>
    <city short="EM" name="Emeryville"></city>
    <state short="FL" name="florida"></state>
    ... More intermixed states and cities ...
</country>

but he surely wants

<country short="usa" name="united state of america">
    <state short="CA" name="california">
        <city short="SFO" name="San Franscisco"/>
        <city short="EM" name="Emeryville"/>
    </state>
    <state short="FL" name="florida">
        ... More cities ...
    </state>
    ... More states ...
</country>
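To see the difference concretely, here is a sketch that parses a trimmed copy of each layout with Python's `xml.etree.ElementTree` and reports which cities end up inside which state. The helper `cities_by_state` is hypothetical, written only for this illustration; the OP's actual code used XML::LibXML::Reader.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the OP's layout: states and cities are all siblings,
# whatever the indentation suggests.
ops_xml = """
<country short="usa" name="united state of america">
  <state short="CA" name="california"/>
    <city short="SFO" name="San Franscisco"/>
    <city short="EM" name="Emeryville"/>
  <state short="FL" name="florida"/>
</country>
"""

# The layout the indentation suggests: cities nested inside their state.
intended_xml = """
<country short="usa" name="united state of america">
  <state short="CA" name="california">
    <city short="SFO" name="San Franscisco"/>
    <city short="EM" name="Emeryville"/>
  </state>
  <state short="FL" name="florida">
  </state>
</country>
"""


def cities_by_state(doc: str) -> dict:
    """Map each state's short name to the short names of its child cities."""
    country = ET.fromstring(doc.strip())
    return {
        state.get("short"): [city.get("short") for city in state.findall("city")]
        for state in country.findall("state")
    }


print(cities_by_state(ops_xml))       # {'CA': [], 'FL': []}
print(cities_by_state(intended_xml))  # {'CA': ['SFO', 'EM'], 'FL': []}
# In the OP's layout the cities are direct children of the country instead:
print([c.get("short") for c in ET.fromstring(ops_xml.strip()).findall("city")])  # ['SFO', 'EM']
```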

the shortcut close on image is NOT required by 4.01 transitional (aka "loose").

That's not right.

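Markup       SGML: HTML4,               SGML:     HTML5, HTML      XML: XHTML1 strict, XHTML1 transitional,
             strict or transitional     other     serialisation    HTML5 XML serialisation, any other schema
-----------  -------------------------  --------  ---------------  ------------------------------------------
<br>         Well-formed and Valid      Varies    Valid            Malformed
<p>          Well-formed and Valid      Varies    Valid            Malformed
<div>        Well-formed but Invalid    Varies    Invalid          Malformed
<br/>        Malformed                  Varies    Tolerated*       Well-formed and Valid
<p/>         Malformed                  Varies    Invalid          Well-formed and Valid
<div/>       Malformed                  Varies    Invalid          Well-formed and Valid
<br></br>    Well-formed but Invalid    Varies    Invalid          Well-formed and Valid
<p></p>      Well-formed and Valid      Varies    Valid            Well-formed and Valid
<div></div>  Well-formed and Valid      Varies    Valid            Well-formed and Valid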

Note that browsers are very forgiving and accept all kinds of malformed and invalid HTML.

As an aside, the table clearly highlights XHTML's advantage over HTML: simplicity. The cost, of course, is that XHTML is more wordy. (Like Java vs Perl?)

* The HTML serialisation of HTML5 accepts "/" on elements that cannot have a closing tag (area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param, source, track, wbr). (ref)
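The XML column is the one that is easy to check by machine, because well-formedness doesn't depend on any DTD or schema. Below is a sketch using Python's `xml.etree.ElementTree`, which checks well-formedness only (it does not validate). Wrapping each sample in a `<root>` element is an assumption added so that each one forms a complete document; `xml_well_formed` is a hypothetical helper. Checking the HTML4 columns would need an SGML-aware validator, which this does not attempt.

```python
import xml.etree.ElementTree as ET

# The markup samples from the table, in table order.
SAMPLES = ["<br>", "<p>", "<div>", "<br/>", "<p/>", "<div/>",
           "<br></br>", "<p></p>", "<div></div>"]


def xml_well_formed(fragment: str) -> bool:
    """True if an XML parser accepts the fragment inside a wrapper element."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


for sample in SAMPLES:
    print(f"{sample:12} {'Well-formed' if xml_well_formed(sample) else 'Malformed'}")
```

The unclosed start tags come out Malformed and every other sample comes out well-formed, matching the XML column.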

The standards for 4.01 transitional and 4.01 strict differ on what's required

They differ on what constitutes a valid HTML or XHTML document (i.e. what elements and attributes are allowed), but they do not differ on what constitutes a well-formed HTML or XML document (i.e. on what is valid syntax).

Re^5: XML::LibXML::Reader giving wrong matched element
by ww (Archbishop) on Nov 25, 2011 at 04:04 UTC
    ikegami
    Thanks for the clarifications re XML; I think I have a general idea of the meaning of your "non-attribute children" (but shall have to look further, to be sure). But the rest is crystal clear. Again, thank you for putting so much information into your reply.

    But I wonder if I was unclear about the "shortcut close" (".../>") for <img src="foo.jpg" alt=... >, as your table does not illustrate it. My assertion that 'the shortcut close on image is NOT required by 4.01 transitional (aka "loose")' is supported by the likes of Dave Raggett (at http://www.w3.org/MarkUp/Guide/ for example) and -- more important -- by the "HTML 4.01 Specification, W3C Recommendation 24 December 1999" (at http://www.w3.org/TR/REC-html40/), which links to an illustration of the use of <img> at http://www.w3.org/TR/REC-html40/struct/objects.html.

    Granted, these are both decade-old documents, but I find nothing to countenance the shortcut close under 4.01 transitional nor any indication of any substantive difference on this point between the proposal cited and current standards -- for html 4.01 transitional.

    Update: In fact, what seems to me conclusive is the statement in the very latest 4.01 spec (at http://www.w3.org/TR/1999/REC-html401-19991224/struct/objects.html) re the <img ...> tag:

    Start tag: required, End tag: forbidden

    the emphasis is in the original.

    Usually, when I make such a statement in disagreement with something you've said, it merely proves that I've missed something crucial. Is that the case here, and if so, would you be so good as to point me (and future readers) to it?

      I didn't use IMG because it has required attributes, and I didn't want that to become an issue. In all other respects, IMG is like BR. Refer to the rows for BR.

      My assertion that 'the shortcut close on image is NOT required by 4.01 transitional (aka "loose")' is supported by the likes of Dave Raggett (at http://www.w3.org/MarkUp/Guide/ for example)

      Saying "not required by 4.01 transitional" implies "allowed by 4.01 transitional", and that's not the case. It's not allowed in HTML 4.01. The linked document is completely silent on the subject.

      And again, whether it's the transitional or strict makes no difference whatsoever here, since they don't affect syntax.

      In fact, what seems to me conclusive is the statement in the very latest 4.01 spec (at http://www.w3.org/TR/1999/REC-html401-19991224/struct/objects.html) re the <img ...> tag: Start tag: required, End tag: forbidden

      Looking at the definition of an element is irrelevant because <foo/> is never well-formed HTML.