Haloric has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have many XML files that I am trying to compare with either XML::SemanticDiff or XML::SemanticCompare.

Is there a way to get the XML into a consistent order before I start the comparison, without calling out to a separate tool? Both modules get confused by slight changes in element order.

For example, I have elements that are

<sequence name="b" .. <sequence name="a" .. <sequence name="c" ..

I would like them ordered by the 'name' attribute before the comparison starts.

I have looked at XML::LibXML::PrettyPrint but can't make it do what I want.

I can see that SemanticDiff copes with missing attributes, but I am not sure how it would cope with a '<sequence name="b" ' element that is missing altogether: would it report a missing element, or mistake it for a difference in attribute value?

Thanks

Re: Consistent xml formatting
by choroba (Cardinal) on Jul 28, 2015 at 14:20 UTC
    The order of attributes doesn't matter, so attributes can be normalized. The order of elements is significant, though, so two XML documents with swapped elements are not equivalent. If your application doesn't care about the order but you do, make it always save the elements in a defined order.

    In XML::XSH2, you can sort the elements by

    move &{ sort :k @name /path/to/sequence } replace /path/to/sequence ;
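
    For readers without XSH2 at hand, the same idea (reorder sibling elements by their 'name' attribute before comparing) can be sketched roughly as follows. This is not XML::XSH2; it is a minimal Python sketch that uses only the standard library, and the function name sort_by_name, the sample document, and the choice to sort on (tag, name) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def sort_by_name(elem):
    """Recursively reorder children in place, keyed on (tag, 'name' attribute).

    Assumption: element order carries no meaning in these documents.
    Children without a 'name' attribute sort first within their tag.
    """
    for child in elem:
        sort_by_name(child)
    # Slice assignment replaces the element's child list with the sorted one.
    elem[:] = sorted(elem, key=lambda c: (c.tag, c.get("name", "")))


# Hypothetical sample input, modelled on the elements in the question.
doc = ET.fromstring(
    '<root><sequence name="b"/><sequence name="a"/><sequence name="c"/></root>'
)
sort_by_name(doc)
names = [s.get("name") for s in doc]
print(names)  # ['a', 'b', 'c']
```

    Sorting both documents this way only removes ordering noise; an element that exists in one file and not in the other still has to be detected separately.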
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks for the reply.

      For my purposes, the order of elements does not matter. The XML is a snapshot of the contents of (say) a directory, in random order. The 'name' attribute is (say) the name of a file or of a subdirectory (which contains its own elements).

      I want to be able to compare two XML files describing a file system in this way, comparing all attributes and sub-elements but using the 'name' attribute as the primary key.

      An element can be missing or added in either XML file, and I want to be able to flag it as such. Elements with the same 'name' attribute are to be compared for sub-elements and attributes.

      I was hoping that getting them into a consistent ordering would help SemanticDiff cope with missing or new elements. But I think now I will have to write custom code that parses the XML to identify the elements that are new / missing between the two files, and uses SemanticDiff perhaps to compare just elements, not the whole document.
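
      The custom comparison described above could take roughly this shape: index the children of each element by 'name', report names present on only one side, and recurse into the names present on both. This is a minimal Python sketch using only the standard library, not SemanticDiff; the function names, the report format, and the sample documents are hypothetical, and it assumes sibling names are unique (duplicates would overwrite each other in the index).

```python
import xml.etree.ElementTree as ET


def index_by_name(parent):
    """Map each child's 'name' attribute (or its tag, if unnamed) to the child."""
    return {child.get("name", child.tag): child for child in parent}


def compare(old, new, path=""):
    """Return a list of (kind, path) tuples: 'missing', 'added' or 'changed'."""
    here = f"{path}/{old.get('name', old.tag)}"
    report = []
    if old.attrib != new.attrib:
        report.append(("changed", here))
    a, b = index_by_name(old), index_by_name(new)
    for name in sorted(a.keys() - b.keys()):   # only in the old document
        report.append(("missing", f"{here}/{name}"))
    for name in sorted(b.keys() - a.keys()):   # only in the new document
        report.append(("added", f"{here}/{name}"))
    for name in sorted(a.keys() & b.keys()):   # in both: compare recursively
        report.extend(compare(a[name], b[name], here))
    return report


# Hypothetical snapshots of the same directory at two points in time.
old = ET.fromstring(
    '<dir name="top"><file name="a" size="1"/><file name="b" size="2"/></dir>'
)
new = ET.fromstring(
    '<dir name="top"><file name="a" size="9"/><file name="c" size="3"/></dir>'
)
result = compare(old, new)
print(result)
# [('missing', '/top/b'), ('added', '/top/c'), ('changed', '/top/a')]
```

      Because elements are matched by key rather than by position, no prior sorting is needed, and a missing element is reported as missing instead of as an attribute difference.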

      Sorry, but the order of elements is not always significant. The XML standard allows you to define sections that must contain a set of elements regardless of order. This can be specified in an XML schema with the tag <xs:all>.

      If the OP doesn't care about element order, then they are better off building a schema and using that to validate their XML.

        Are you sure XML schema is part of the standard?
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ