in reply to Sort xml based on attribute

use strict;
use warnings;
no warnings 'uninitialized';
use XML::Rules;

my $parser = XML::Rules->new(
    style => 'filter',
    rules => {
        _default => 'raw',
        itemid => sub {
            my ($tag,$attrs,$context,$parents) = @_;
            $parents->[-4]{':PUI'} = $attrs->{_content}
                if $attrs->{idtype} eq "PUI";
            return [$tag => $attrs]; # same thing the 'raw' built-in does
        },
        item => 'as array',
        bibdataset => sub {
            my ($tag,$attrs) = @_;
            @{$attrs->{item}} = sort {$a->{':PUI'} <=> $b->{':PUI'}} @{$attrs->{item}};
            $attrs->{_content} = [
                (map( ( "\n\t", [item => $_]), @{$attrs->{item}})),
                "\n",
            ];
            delete $attrs->{item};
            return $tag => $attrs;
        },
    }
);
$parser->filter(\*DATA);

__DATA__
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<bibdataset ...

Basically, whenever an <itemid> tag is fully parsed (including its content and end tag), the code checks whether its idtype attribute equals "PUI". If it does, it remembers the content in the tag's parent's parent's parent's parent, i.e. the <item> tag (attributes whose names start with a colon are never exported to the resulting XML), and then adds the tag's data to its parent's content. Then the <item> tags are removed from the parent tag's content and stored in an array, kept in the parent tag's hash of attributes under the key "item".
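The `$parents->[-4]` part relies on Perl's negative array indexing: -1 is the last element (the immediate parent), -2 the grandparent, and so on. Below is a small core-Perl sketch that mimics the stack of enclosing-tag hashes with plain data. The intermediate tag names and the PUI value are assumptions, since the sample XML is elided.

```perl
use strict;
use warnings;

# Hypothetical stack of enclosing-tag hashes at the moment an <itemid> closes.
# Index 0 is the root; the last element is the immediate parent.
# The intermediate tag names are assumptions (the sample XML is elided).
my @parents = (
    { tag => 'bibdataset' },
    { tag => 'item' },        # $parents[-4]
    { tag => 'bibrecord' },   # $parents[-3]
    { tag => 'item-info' },   # $parents[-2]
    { tag => 'itemidlist' },  # $parents[-1], the immediate parent
);

my %itemid = (idtype => 'PUI', _content => '12345');   # made-up value

# Same idea as the rule: store the PUI on the <item> four levels up, under a
# colon-prefixed key (which XML::Rules does not export, as explained above).
$parents[-4]{':PUI'} = $itemid{_content} if $itemid{idtype} eq 'PUI';

print "$parents[-4]{tag} has PUI $parents[-4]{':PUI'}\n";   # item has PUI 12345
```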

Then, once the XML is fully parsed, the array of items is sorted, some whitespace is inserted between the items, and the resulting array becomes the content of the root tag. Finally the tag, with its attributes and content (including child tags), gets printed.
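The sort-and-interleave step can be tried on its own with plain Perl data. The item hashes below are made up; only the `sort` and `map` expressions follow the `bibdataset` rule.

```perl
use strict;
use warnings;

# Made-up item hashes, each carrying the ':PUI' remembered by the itemid rule.
my @items = (
    { ':PUI' => 300, _content => 'third'  },
    { ':PUI' => 25,  _content => 'first'  },
    { ':PUI' => 100, _content => 'second' },
);

# Numeric sort, as in the bibdataset rule (<=> rather than cmp,
# so 25 sorts before 100 and 300).
@items = sort { $a->{':PUI'} <=> $b->{':PUI'} } @items;

# Put "\n\t" before each [item => $hash] pair and end with "\n",
# the shape the rule assigns to the root tag's _content.
my @content = ( (map( ( "\n\t", [item => $_] ), @items )), "\n" );

print join(',', map { $_->{_content} } @items), "\n";   # first,second,third
print scalar(@content), "\n";                             # 7
```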

The code assumes that <itemid> is always at the same depth below <item> and that <bibdataset> contains only <item> tags!
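One way to relax the fixed-depth assumption is to look up the enclosing <item> by name instead of counting levels. XML::Rules passes the names of the enclosing tags in `$context` alongside their data in `$parents`; this sketch assumes the two arrays are aligned index-for-index (root first) and uses plain arrays instead of a real parse. `find_ancestor_index` is a hypothetical helper, not part of XML::Rules.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the nearest enclosing tag called $name, or undef.
# Assumes @$context (tag names) and @$parents (tag data) line up, root first.
sub find_ancestor_index {
    my ($context, $name) = @_;
    for my $i (reverse 0 .. $#$context) {
        return $i if $context->[$i] eq $name;
    }
    return;
}

# Stand-ins for what the parser would pass; the depth is arbitrary here.
my @context = qw(bibdataset item some-wrapper itemidlist);
my @parents = map { +{} } @context;

my $i = find_ancestor_index(\@context, 'item');
$parents[$i]{':PUI'} = '42' if defined $i;

print "item is at index $i\n";   # item is at index 1
```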

Jenda
Enoch was right!
Enjoy the last years of Rome.

Re^2: Sort xml based on attribute
by Anonymous Monk on Aug 12, 2010 at 02:35 UTC
    Sorry to be a pest, Jenda, but I'm trying to modify this script for my own use, and I've only just started learning Perl, so much of this is completely new to me. Could you explain what each bit does, if it's not too much trouble? Or explain how I could modify it to sort an XML file such as this one by category, then subCategory, and then code1? Sorry to ask so much, but I really am at a loss.
<?xml version="1.0"?>
<ResultDetail>
  <results>
    <ResultItem>
      <category>AGM</category>
      <subCategory>VAL</subCategory>
      <code1>010000</code1>
      <name>parse</name>
      <type>ERR</type>
      <flags>320</flags>
      <language>EN</language>
      <description>Parse error</description>
      <cause>Parse error in the input XML</cause>
      <action>Correct the error and send your request again</action>
    </ResultItem>
    <ResultItem>
      <category>AGM</category>
      <subCategory>VAL</subCategory>
      <code1>010300</code1>
      <name>client.NotEntered</name>
      <type>ERR</type>
      <flags>320</flags>
      <language>EN</language>
      <description>Client not entered</description>
      <cause>Invalid data field values</cause>
      <action>Correct the problem and send the request again</action>
    </ResultItem>
    <ResultItem>
      <category>AGM</category>
      <subCategory>VAL</subCategory>
      <code1>010400</code1>
      <name>client.notFound</name>
      <type>ERR</type>
      <flags>320</flags>
      <language>EN</language>
      <description>Client not found</description>
      <cause>Invalid data field values</cause>
      <action>Correct the problem and send the request again</action>
    </ResultItem>
    <ResultItem>
      <category>AGM</category>
      <subCategory>VAL</subCategory>
      <code1>010500</code1>
      <name>client.invalidData</name>
      <type>ERR</type>
      <flags>320</flags>
      <language>EN</language>
      <description>Client data invalid</description>
      <cause>Invalid data field values</cause>
      <action>Correct the problem and send the request again</action>
    </ResultItem>
  </results>
</ResultDetail>
    Note: the real file is much, much larger (9510 lines).

      The code will be a bit simpler, but whether it will be any easier to understand I don't know. What language(s) do you have experience with?

use strict;
use warnings;
no warnings 'uninitialized';
use XML::Rules;

my $parser = XML::Rules->new(
    style => 'filter', # we want to filter (modify) the XML, not extract data
    rules => {
        _default => 'raw',
            # we want to copy most tags intact, including the whitespace in and around them
            # the data of the tags will end up in the _content pseudoattribute of the parent tag
        'category,subCategory,code1' => 'raw extended',
            # these three we need not only to copy, but also to make easier to access.
            # The "raw extended" rule causes the data of that tag to be available in the hash of the parent tag
            # also as ":category", ":subCategory" and ":code1" so you do not have to search through the _content array
        'ResultItem' => 'as array',
            # we expect several <ResultItem> tags and want to store the data of each in an array.
            # the array will be accessible using the 'ResultItem' key in the hash containing the data of the parent tag
        'results' => sub {
            my ($tag,$attrs) = @_; # this is the Perl way to assign names to subroutine/function parameters
            # this subroutine is called whenever the <results>...</results> is fully parsed and the rules
            # specified for the child tags evaluated.
            if ($attrs->{ResultItem} and @{$attrs->{ResultItem}} > 1) {
                # if there are any <ResultItem> tags and there's more than one
                @{$attrs->{ResultItem}} = sort {
                    # sort allows you to specify the code to be used to compare the items to sort
                    # the items are made available as $a and $b to the code.
                    # in this case $a and $b are hashes created by processing the child tags of the <ResultItem> tags.
                    $a->{':category'} cmp $b->{':category'}
                    or $a->{':subCategory'} cmp $b->{':subCategory'}
                    or $a->{':code1'} cmp $b->{':code1'}
                } @{$attrs->{ResultItem}};
            }
            $attrs->{_content} =~ s/^\s+// if (!ref $attrs->{_content});
                # remove the accumulated whitespace that was present between the <ResultItem> tags
            return [$tag => $attrs]
        }
    }
);
$parser->filter(\*DATA); # see the XML::Rules docs for ways to redirect the output to a file

__DATA__
<?xml version="1.0"?>
<ResultDetail>
<results>
<ResultItem>
<category>AGM</category>
<subCategory>VAL</subCategory>
<code1>010000</code1>
<name>parse</name>
...
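The comparison inside `sort` chains three `cmp` tests with `or`: `cmp` returns 0 for equal strings, so the next key is consulted only when all previous keys tie. The sketch below runs that comparison on plain hashes; the records are made up, and storing plain strings under the colon keys is a simplification of what the parser builds. Because the code1 values in the sample are fixed-width, zero-padded strings, `cmp` orders them the same way a numeric comparison would.

```perl
use strict;
use warnings;

# Made-up records standing in for parsed <ResultItem> data.
my @items = (
    { ':category' => 'AGM', ':subCategory' => 'VAL', ':code1' => '010400' },
    { ':category' => 'ABC', ':subCategory' => 'ZZZ', ':code1' => '000100' },
    { ':category' => 'AGM', ':subCategory' => 'VAL', ':code1' => '010000' },
    { ':category' => 'AGM', ':subCategory' => 'ERR', ':code1' => '020000' },
);

# Sort by category, then subCategory, then code1.
my @sorted = sort {
       $a->{':category'}    cmp $b->{':category'}
    or $a->{':subCategory'} cmp $b->{':subCategory'}
    or $a->{':code1'}       cmp $b->{':code1'}
} @items;

print join(' ', map { "$_->{':category'}/$_->{':subCategory'}/$_->{':code1'}" } @sorted), "\n";
# ABC/ZZZ/000100 AGM/ERR/020000 AGM/VAL/010000 AGM/VAL/010400
```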

      Update: Please see Re^9: Sort xml based on attribute for a fixed version.

      Jenda

        It's a lot clearer now Jenda, thank you for your time and patience.

        I was just wondering three things:

        1) Where does this take the XML data from? (A file? If so, where is it specified?)
        2) This line in particular, $parser->filter(\*DATA); is it an output or something else?
        3) Last thing: I get an error when I try to run this Perl script (it might be because I don't quite understand the input/output of this script):
        Name "main::DATA" used only once: possible typo at sort.pl line 45.
        <?xml version="1.0"?> not well-formed (invalid token) at line 1, column 4, byte 4 at sort.pl line 45
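On questions 1 and 2: in Perl, a `__DATA__` line marks the end of the code, and everything after it can be read through the filehandle `DATA`. `\*DATA` is a reference to that handle's typeglob, which is how the script hands its input to `filter`. Without a `__DATA__` section the name `DATA` appears only once in the program, which is what the "used only once" warning is about. The core-Perl sketch below shows the same glob-reference idea: a hypothetical `count_lines` routine accepts either a glob reference or a lexical handle from `open`. An in-memory string stands in for `__DATA__` so the snippet is self-contained; for a real file you would `open` its path instead.

```perl
use strict;
use warnings;

# Hypothetical routine: reads from whatever filehandle it is given
# (a glob reference such as \*DATA, or a lexical handle from open).
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    $n++ while <$fh>;
    return $n;
}

my $xml = qq{<?xml version="1.0"?>\n<ResultDetail>\n</ResultDetail>\n};

# Bareword handle opened on an in-memory string and passed as a glob
# reference, the same way the script passes \*DATA.
open(XMLIN, '<', \$xml) or die "Cannot open in-memory string: $!";
my $via_glob = count_lines(\*XMLIN);
close XMLIN;

# Lexical handle; for a real file, open the file's path instead of \$xml.
open(my $in, '<', \$xml) or die "Cannot open in-memory string: $!";
my $via_lexical = count_lines($in);
close $in;

print "$via_glob $via_lexical\n";   # 3 3
```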
        Just off the top of my head, the languages I'm familiar with are C, C++, Java, Python, XML (seemed relevant in this case :P) and some other things here and there (not counting webdev, which doesn't seem relevant here).

        Thank you again for answering my questions. I greatly appreciate it.
        Sorry, one other thing.

        Just wanted to let you know that I have actually been reading the XML::Rules documentation from http://search.cpan.org/~jenda/XML-Rules-1.10/lib/XML/Rules.pm so you don't think I'm lazily asking you for all the answers without attempting some research myself :P

        That being said, do you think this is the best resource for XML::Rules information?