Re^3: Sort xml based on attribute

The code will be a bit simpler, but whether it will be any easier to understand I don't know. What language(s) do you have experience with?

use strict;
use warnings;
no warnings 'uninitialized';

use XML::Rules;

my $parser = XML::Rules->new(
    style => 'filter', # we want to filter (modify) the XML, not extract data
    rules => {
        _default => 'raw', # we want to copy most tags intact, including the whitespace in and around them
            # the data of the tags will end up in the _content pseudoattribute of the parent tag
        'category,subCategory,code1' => 'raw extended',
            # these three we need not only to copy, but also to make easier to access.
            # The "raw extended" rule causes the data of that tag to be available in the hash of the parent tag
            # also as ":category", ":subCategory" and ":code1" so you do not have to search through the _content array
        'ResultItem' => 'as array',
            # we expect several <ResultItem> tags and want to store the data of each in an array.
            # the array will be accessible using the 'ResultItem' key in the hash containing the data of the parent tag
        'results' => sub {
            my ($tag,$attrs) = @_; # this is the Perl way to assign names to subroutine/function parameters
                # this subroutine is called whenever the <results>...</results> is fully parsed and the rules
                # specified for the child tags evaluated.
            if ($attrs->{ResultItem} and @{$attrs->{ResultItem}} > 1) {
                # if there are any <ResultItem> tags and there's more than one
                @{$attrs->{ResultItem}} = sort {
                        # sort allows you to specify the code to be used to compare the items to sort
                        # the items are made available as $a and $b to the code.
                        # in this case the $a and $b are hashes created by processing the child tags of the <ResultItem> tags.
                        $a->{':category'} cmp $b->{':category'}
                        or
                        $a->{':subCategory'} cmp $b->{':subCategory'}
                        or
                        $a->{':code1'} cmp $b->{':code1'}
                    }
                    @{$attrs->{ResultItem}};
            }
            $attrs->{_content} =~ s/^\s+// if (!ref $attrs->{_content});
                # remove the accumulated whitespace that was present between the <ResultItem> tags
            return [$tag => $attrs]
        }
    }
);

$parser->filter(\*DATA);
    # see the XML::Rules docs for ways to redirect the output to file

__DATA__
<?xml version="1.0"?>
<ResultDetail>
<results>
    <ResultItem>
        <category>AGM</category>
        <subCategory>VAL</subCategory>
        <code1>010000</code1>
        <name>parse</name>
...

Update: Please see Re^9: Sort xml based on attribute for a fixed version.
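
For readers coming from other languages, the chained comparison inside the sort block can be tried on its own, without XML::Rules. What follows is only a minimal sketch: the @items list is made up for illustration, and the plain hashes merely imitate the ':category', ':subCategory' and ':code1' entries that the rules are described as creating.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the per-<ResultItem> hashes; only the three
# keys used for sorting are filled in.
my @items = (
    { ':category' => 'AGM', ':subCategory' => 'VAL', ':code1' => '010300' },
    { ':category' => 'AGM', ':subCategory' => 'VAL', ':code1' => '010400' },
    { ':category' => 'AGM', ':subCategory' => 'VAL', ':code1' => '010000' },
);

# cmp returns -1, 0 or 1. Because 0 is false, "or" falls through to the
# next key only when the previous pair of values compared as equal.
my @sorted = sort {
    $a->{':category'} cmp $b->{':category'}
        or
    $a->{':subCategory'} cmp $b->{':subCategory'}
        or
    $a->{':code1'} cmp $b->{':code1'}
} @items;

# Show the resulting order of the last sort key.
print join(', ', map { $_->{':code1'} } @sorted), "\n";
# prints "010000, 010300, 010400"
```

Since all three sample items share the same category and subCategory, the order here is decided entirely by the third comparison.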

Jenda
Enoch was right!
Enjoy the last years of Rome.


Re^4: Sort xml based on attribute by Anonymous Monk on Aug 12, 2010 at 11:48 UTC
It's a lot clearer now Jenda, thank you for your time and patience. I was just wondering three things:

1) Where does this take the XML data from (a file? -- if so, where is it specified?)

2) This line in particular, `$parser->filter(\*DATA);`, is it an output or something else?

3) Last thing, I get an error when I try to run this Perl script (might be because I don't quite understand the input/output of this script):

    Name "main::DATA" used only once: possible typo at sort.pl line 45.
    <?xml version="1.0"?>
    not well-formed (invalid token) at line 1, column 4, byte 4 at sort.pl line 45

Just off the top of my head, the languages that I'm familiar with: C, C++, Java, Python, XML (seemed relevant in this case :P) and some other things here and there (not counting webdev, which doesn't seem relevant here?).

Thank you again for answering my questions. I greatly appreciate it.
Re^5: Sort xml based on attribute by Jenda (Abbot) on Aug 12, 2010 at 12:53 UTC
Re 1) and 2): This particular script takes the data from a special filehandle, DATA, that allows you to read the text that follows the __DATA__ marker in the script. If you want to process a file instead, either open the file and pass the filehandle:

    open IN, '<', $filename or die "...";
    $parser->filter(\*IN);

or

    open my $IN, '<', $filename or die "...";
    $parser->filter($IN);

or use the filterfile() method:

    $parser->filterfile($filename);

Re 3) I did not include the whole XML at the end of the script, so maybe that's where the problem is. Drop the __DATA__ and everything after it and use the filterfile() method.

There are a few posts related to XML::Rules on Perlmonks; try to find them and see if they help. I tried to explain the design of the module in some of those, for example in (RFC) XML::TransformRules, (RFC) XML::Rules - yet another XML parser and Simpler than XML::Simple.

Jenda
Enoch was right!
Enjoy the last years of Rome.
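
The difference between passing a bareword filehandle and a lexical one can be tried in plain Perl. This is a hedged sketch, not part of XML::Rules: first_line() is a made-up helper standing in for any method that accepts a filehandle, and the in-memory string $xml replaces a real file so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads and returns the first line from whatever
# filehandle it is given.
sub first_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line;
    return $line;
}

# Assumed sample input; opening a reference to a scalar gives an
# in-memory filehandle, so no file on disk is needed.
my $xml = qq{<?xml version="1.0"?>\n<ResultDetail/>\n};

# Bareword handle: pass a reference to the glob, written \*IN.
open IN, '<', \$xml or die "Cannot open in-memory file: $!";
my $from_glob = first_line(\*IN);
close IN;

# Lexical handle: the variable already holds a reference, so pass it as is.
open my $in, '<', \$xml or die "Cannot open in-memory file: $!";
my $from_lexical = first_line($in);
close $in;

print "$from_glob\n$from_lexical\n";
```

Both calls read the same first line; the lexical form is usually easier to pass around because the handle is an ordinary variable.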
Re^6: Sort xml based on attribute by Anonymous Monk on Aug 12, 2010 at 18:07 UTC
Thanks for the information. Unfortunately it doesn't seem to work. Perl doesn't give me an error, the file just seems to stay exactly the same.

ORIGINAL:

<?xml version="1.0"?>
<ResultDetail>
<results>
    <ResultItem>
        <category>AGM</category>
        <subCategory>VAL</subCategory>
        <code1>010300</code1>
        <name>client.NotEntered</name>
        <type>ERR</type>
        <flags>320</flags>
        <language>EN</language>
        <description>Client not entered</description>
        <cause>Invalid data field values</cause>
        <action>Correct the problem and send the request again</action>
    </ResultItem>
    <ResultItem>
        <category>AGM</category>
        <subCategory>VAL</subCategory>
        <code1>010400</code1>
        <name>client.notFound</name>
        <type>ERR</type>
        <flags>320</flags>
        <language>EN</language>
        <description>Client not found</description>
        <cause>Invalid data field values</cause>
        <action>Correct the problem and send the request again</action>
    </ResultItem>
    <ResultItem>
        <category>AGM</category>
        <subCategory>VAL</subCategory>
        <code1>010000</code1>
        <name>parse</name>
        <type>ERR</type>
        <flags>320</flags>
        <language>EN</language>
        <description>Parse error</description>
        <cause>Parse error in the input XML</cause>
        <action>Correct the error and send your request again</action>
    </ResultItem>
</results>
</ResultDetail>

RESULT:

<?xml version="1.0"?>
<ResultDetail>
<results><ResultItem>
        <category>AGM</category>
        <subCategory>VAL</subCategory>
        <code1>010300</code1>
        <name>client.NotEntered</name>
        <type>ERR</type>
        <flags>320</flags>
        <language>EN</language>
        <description>Client not entered</description>
        <cause>Invalid data field values</cause>
        <action>Correct the problem and send the request again</action>
    </ResultItem><ResultItem>
        <category>AGM</category>
        <subCategory>VAL</subCategory>
        <code1>010400</code1>
        <name>client.notFound</name>
        <type>ERR</type>
        <flags>320</flags>
        <language>EN</language>
        <description>Client not found</description>
        <cause>Invalid data field values</cause>
        <action>Correct the problem and send the request again</action>
    </ResultItem><ResultItem>
        <category>AGM</category>
        <subCategory>VAL</subCategory>
        <code1>010000</code1>
        <name>parse</name>
        <type>ERR</type>
        <flags>320</flags>
        <language>EN</language>
        <description>Parse error</description>
        <cause>Parse error in the input XML</cause>
        <action>Correct the error and send your request again</action>
    </ResultItem></results>
</ResultDetail>

The perl script (modified only slightly) is here:

#!/usr/bin/perl
use strict;
use warnings;
no warnings 'uninitialized';

use XML::Rules;

my $parser = XML::Rules->new(
    style => 'filter', # we want to filter (modify) the XML, not extract data
    rules => {
        _default => 'raw', # we want to copy most tags intact, including the whitespace in and around them
            # the data of the tags will end up in the _content pseudoattribute of the parent tag
        'category,subCategory,code' => 'raw extended',
            # these three we need not only to copy, but also made easier to access.
            # The "raw extended" rule causes the data of that tag to be available in the hash of the parent tag
            # also as ":category", ":subCategory" and ":code" so you do not have to search through the _content array
        'ResultItem' => 'as array',
            # we expect several <ResultItem> tags and want to store the data of each in an array .
            # the array will be accessible using the 'ResultItem' key in the hash containing the data of the parent tag
        'results' => sub {
            my ($tag,$attrs) = @_; # this is the Perl way to assign names to subroutine/function parameters
                # this subroutine is called whenever the <results>...</results> is fully parsed and the rules
                # specified for the child tags evaluated.
            if ($attrs->{ResultItem} and @{$attrs->{ResultItem}} > 1) {
                # if there are any <ResultItem> tags and there's more than one
                @{$attrs->{ResultItem}} = sort {
                        # sort allows you to specify the code to be used to compare the items to sort
                        # the items are made available as $a and $b to the code.
                        # in this case the $a and $b are hashes created by processing the child tags of the <ResultItem> tags.
                        $a->{':category'} cmp $b->{':category'}
                        or
                        $a->{':subCategory'} cmp $b->{':subCategory'}
                        or
                        $a->{':code'} cmp $b->{':code'}
                    }
                    @{$attrs->{ResultItem}};
            }
            $attrs->{_content} =~ s/^\s+// if (!ref $attrs->{_content});
                # remove the accumulated whitespace that was present between the <ResultItem> tags
            return [$tag => $attrs]
        }
    }
);

$parser->filterfile("test.msg", "test-result.msg");

I imagine that I'm being quite the pest, but I do appreciate any and all help you can give me.
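
One detail that may be worth checking in the modified script: its rule and sort block refer to `code`, while the tags in the posted XML are named `code1`. With `no warnings 'uninitialized'` in effect, comparing two missing hash entries quietly returns 0, and because all three sample items share the same category and subCategory, every pair would then compare as equal. The sketch below shows only that effect in plain Perl, using made-up hashes %x and %y; it does not claim to be the fix referred to elsewhere in this thread.

```perl
use strict;
use warnings;
no warnings 'uninitialized';   # same setting as the script in this thread

# Hypothetical stand-ins for two <ResultItem> hashes.
my %x = ( ':code1' => '010300' );
my %y = ( ':code1' => '010000' );

# ':code' exists in neither hash, so both sides are undef, both are
# treated as empty strings, and cmp reports "equal".
my $missing_key = $x{':code'} cmp $y{':code'};

# ':code1' exists in both, so the comparison can tell the items apart.
my $present_key = $x{':code1'} cmp $y{':code1'};

print "missing key: $missing_key, present key: $present_key\n";
# prints "missing key: 0, present key: 1"
```

A comparator that returns 0 for every pair gives sort no information to reorder anything.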
Re^7: Sort xml based on attribute by Jenda (Abbot) on Aug 12, 2010 at 19:52 UTC
Re^8: Sort xml based on attribute by mdangelo (Initiate) on Aug 12, 2010 at 23:16 UTC
Some notes below your chosen depth have not been shown here
Re^4: Sort xml based on attribute by Anonymous Monk on Aug 12, 2010 at 11:53 UTC
Sorry, one other thing. Just wanted to let you know that I have actually been reading the XML::Rules documentation from http://search.cpan.org/~jenda/XML-Rules-1.10/lib/XML/Rules.pm so you don't think I'm lazily asking you for all the answers without attempting some research myself :P

That being said, do you think this is the best resource for XML::Rules information?