RM99 has asked for the wisdom of the Perl Monks concerning the following question:

I have a directory full of .xml files. These files contain failure messages like this:

</fail-token><fail-message></fail-message><node-host>

This token contains no failure. I want to search through all files in the directory for fail messages and then group them by type. Reading the file names into a list is simple enough; parsing the XML for errors and grouping by type (I have no idea how many different types of errors there will be) is what has me scratching my head. Yes, I am a new newb... this feels like it should be easy, but I keep failing. Please help, thanks!

Replies are listed 'Best First'.
Re: parsing xml
by graff (Chancellor) on May 05, 2014 at 03:56 UTC
    Welcome to the Monastery. If you want to be able to find out which files have non-empty "fail-message" elements, so that you can see how many different values there are for fail messages, you probably want to use XPath expressions (which you can look up elsewhere - it's not strictly a Perl thing).

    For example, a while back I figured out how to use the XPath facilities in XML::LibXML, and it was so easy and so cool, I posted some sample code for a generic tool to take an XPath expression and an XML file as command-line args, and output the portions of a given file (if any) that matched a given XPath: Re: XPath command line utility....

    Consider a sample XML file like this (let's call this file "test.xml"):

    <foo>
      <bar id="t1">
        <fail-message></fail-message>
      </bar>
      <baz id="t2">
        <fail-message>yar</fail-message>
      </baz>
    </foo>

    If I want to extract non-empty "fail-message" elements from that sort of XML data, the XPath expression would be:
    //*[fail-message!='']
    To apply that expression using my "exp" tool, the (bash) command line would be:
    exp -p "//*[fail-message!='']" test.xml
    Since I might want to see all the markup (including attributes) for nodes that match the expression, I included a '-x' option on my "exp" script to do just that. In this case, seeing the markup would be helpful if I want to look at the empty fail-messages (by using "=" instead of "!=" in the XPath expression).
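
    For the grouping part of the question, here is a minimal sketch rather than a definitive solution. It assumes XML::LibXML is installed and that the real files use a "fail-message" element as in the sample above. The helper name group_fail_messages and the second document "other.xml" are made up for illustration. Unlike the expression above, which selects the parent elements, this XPath selects the non-empty fail-message elements themselves, so their text can be used directly as the grouping key.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of any module): takes name => XML string
# pairs and returns a hash ref mapping each non-empty fail-message text
# to the list of names in which it was found.
sub group_fail_messages {
    my (%docs) = @_;
    my %by_type;
    for my $name (sort keys %docs) {
        my $dom = XML::LibXML->load_xml(string => $docs{$name});
        # Select the fail-message elements themselves, skipping empty ones.
        for my $node ($dom->findnodes("//fail-message[normalize-space(.) != '']")) {
            my $msg = $node->textContent;
            $msg =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            push @{ $by_type{$msg} }, $name;
        }
    }
    return \%by_type;
}

# Sample data: the test.xml content from above plus an invented second file.
my %docs = (
    'test.xml'  => '<foo> <bar id="t1"> <fail-message></fail-message> </bar> '
                 . '<baz id="t2"> <fail-message>yar</fail-message> </baz> </foo>',
    'other.xml' => '<foo><bar id="t3"><fail-message>yar</fail-message></bar></foo>',
);

my $groups = group_fail_messages(%docs);
for my $msg (sort keys %$groups) {
    my @names = @{ $groups->{$msg} };
    printf "%s: %d (%s)\n", $msg, scalar @names, join(', ', @names);
}
```

    To run this over a real directory, one option is to loop over glob("*.xml") and parse each file with XML::LibXML->load_xml(location => $path) instead of passing strings. Because the hash keys are whatever message text turns up, there is no need to know the number of error types in advance.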
Re: parsing xml
by ww (Archbishop) on May 04, 2014 at 23:54 UTC
    Your problem (or maybe just 'problem description') has /me headscratching too:
    </fail-token><fail-message></fail-message><node-host>
    That doesn't look like valid XML. Are you sure? (...and not just BTW, please enclose your data in code tags, just as you would wrap code in code tags; see Writeup Formatting Tips. And, not just BTW2, "I keep failing" is not a useful error message, nor does it help us to help you. Please see On asking for help.)

    Questions containing the words "doesn't work" (or their moral equivalent) will usually get a downvote from me unless accompanied by:
    1. code
    2. verbatim error and/or warning messages
    3. a coherent explanation of what "doesn't work" actually means.

    check Ln42!

Re: parsing xml
by Anonymous Monk on May 04, 2014 at 22:40 UTC

    ... is what has me scratching my head...please help thanks!

    I suggest anti-itch shampoo ... or maybe flea shampoo ...

     [pars xml] pars xml -> Parsing XML (examples using xml::: twig, xsh, rules, simple, libxml...

    ... links about parsing xml