in reply to Parse XML file

Blackdragoen, you've already been given the advice (in the CB) that you should use XPath to find such nodes. You've been told that your lack of understanding of XML is the problem. To reiterate, there's no such thing as an 'end tag' in XML. The nodes you're interested in are nodes with neither children nor attributes.

You should parse the document with XML::Parser and pass the result to XML::XPath with the appropriate query expression. I've already given you a hint for the XPath expression: *[not(*)] will give you all the nodes without children.
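A minimal sketch of that approach, under a few assumptions: the sample document is made up (it is not the OP's file), XML::XPath is left to parse the string itself rather than being handed a separate XML::Parser object, and the query is tightened to also exclude elements with attributes or text.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document, standing in for the OP's file.
my $xml = <<'XML';
<root>
  <item id="1">text</item>
  <empty/>
  <also_empty></also_empty>
  <parent><child/></parent>
</root>
XML

# Assumption: let XML::XPath parse the string directly.
my $xp = XML::XPath->new(xml => $xml);

# Elements with no child elements, no attributes, and an empty string value.
my $query = '//*[not(*) and not(@*) and not(string(.))]';

# In list context findnodes returns the matching nodes.
my @names = map { $_->getName } $xp->findnodes($query);
print "$_\n" for @names;
```

Note that <empty/> and <also_empty></also_empty> are treated alike here: both are elements with no content, however they happen to be written in the file.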

Have you applied this suggestion yet, or researched XPath as you were advised?

-David

Re^2: Parse XML file
by erroneousBollock (Curate) on Sep 24, 2007 at 10:50 UTC
    This worked for me:

    print $_->getName."\n" for $xp->findnodes('//*[not(child::*) and not(attribute::*) and not(string(.))]');

    -David

Re^2: Parse XML file
by Jenda (Abbot) on Sep 25, 2007 at 09:17 UTC
      For HTML, many folks do use that moniker for markup of the form </name>. In the HTML DOM (and the XML DOM, for that matter) no construct maps to that markup; it's merely an artifact of DOM serialisation. An equivalent serialisation might employ whitespace (such as indentation) to demarcate nested nodes. There is no way to "search" for such an entity in the DOM.

      I personally don't mind it when people call markup of that form an 'end tag' in HTML because it's possible to construct invalid documents that will usually render correctly in most HTML browsers.

      It makes no sense to refer to "tags" for XML as it's not possible to make use of an invalid XML document. The DOM is constructed from (possibly) nested Nodes (of various types) and string-like values attached to those Nodes.

      And finally, the OP was referring to XML nodes of the form <name />, which is not at all the same as HTML markup of the form </name>. The former represents a Node in the DOM with no child nodes, attributes, or value; the latter does not represent any node, attribute or value in the DOM.

      -David [erroneousBollock isn't logged in]

        Whether it's an "artifact of serialisation" or not is irrelevant; often you need to point to such a thing in the XML file, if only to point out that the file is invalid because the end tag at line 12345 doesn't match the opening tag at line 12340. While it may be impossible to "make use of" an invalid XML document, it's entirely possible to create one. In that case it's helpful to be able to name all the thingies in the document, and preferably not in the "artist formerly known as Prince" style.

        And XML ne DOM.