in reply to Re^2: Sort xml based on attribute
in thread Sort xml based on attribute

The code will be a bit simpler, but whether it will be any easier to understand I don't know. What language(s) do you have experience with?

use strict;
use warnings;
no warnings 'uninitialized';
use XML::Rules;

my $parser = XML::Rules->new(
    style => 'filter', # we want to filter (modify) the XML, not extract data
    rules => {
        _default => 'raw',
            # we want to copy most tags intact, including the whitespace in and around them
            # the data of the tags will end up in the _content pseudoattribute of the parent tag
        'category,subCategory,code1' => 'raw extended',
            # these three we need not only to copy, but also make easier to access.
            # The "raw extended" rule causes the data of that tag to be available in the hash of the parent tag
            # also as ":category", ":subCategory" and ":code1" so you do not have to search through the _content array
        'ResultItem' => 'as array',
            # we expect several <ResultItem> tags and want to store the data of each in an array.
            # the array will be accessible using the 'ResultItem' key in the hash containing the data of the parent tag
        'results' => sub {
            my ($tag,$attrs) = @_; # this is the Perl way to assign names to subroutine/function parameters
                # this subroutine is called whenever the <results>...</results> is fully parsed and the rules
                # specified for the child tags evaluated.
            if ($attrs->{ResultItem} and @{$attrs->{ResultItem}} > 1) {
                # if there are any <ResultItem> tags and there's more than one
                @{$attrs->{ResultItem}} = sort {
                    # sort allows you to specify the code to be used to compare the items to sort
                    # the items are made available as $a and $b to the code.
                    # in this case the $a and $b are hashes created by processing the child tags of the <ResultItem> tags.
                    $a->{':category'} cmp $b->{':category'}
                    or $a->{':subCategory'} cmp $b->{':subCategory'}
                    or $a->{':code1'} cmp $b->{':code1'}
                } @{$attrs->{ResultItem}};
            }
            $attrs->{_content} =~ s/^\s+// if (!ref $attrs->{_content});
                # remove the accumulated whitespace that was present between the <ResultItem> tags
            return [$tag => $attrs]
        }
    }
);

$parser->filter(\*DATA);
# see the XML::Rules docs for ways to redirect the output to file

__DATA__
<?xml version="1.0"?>
<ResultDetail>
<results>
<ResultItem>
<category>AGM</category>
<subCategory>VAL</subCategory>
<code1>010000</code1>
<name>parse</name>
...
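The comparator inside sort { ... } relies on cmp returning 0 for equal strings, so each "or" falls through to the next key only when all the previous keys tie. Below is a minimal core-Perl sketch of just that part, outside XML::Rules: plain hashes stand in for the per-<ResultItem> data, their keys simply mirror the ones used in the comparator above, and the values are taken from the sample XML in this thread.

```perl
use strict;
use warnings;

# Plain hashes standing in for the per-<ResultItem> data; the keys mirror
# the ':category', ':subCategory' and ':code1' keys used in the comparator.
my @items = (
    { ':category' => 'AGM', ':subCategory' => 'VAL', ':code1' => '010300', name => 'client.NotEntered' },
    { ':category' => 'AGM', ':subCategory' => 'VAL', ':code1' => '010400', name => 'client.notFound' },
    { ':category' => 'AGM', ':subCategory' => 'VAL', ':code1' => '010000', name => 'parse' },
);

# cmp returns -1, 0 or 1; "or" moves on to the next key only on a tie (0).
my @sorted = sort {
       $a->{':category'}    cmp $b->{':category'}
    or $a->{':subCategory'} cmp $b->{':subCategory'}
    or $a->{':code1'}       cmp $b->{':code1'}
} @items;

my @names = map { $_->{name} } @sorted;
print join(', ', @names), "\n";    # prints "parse, client.NotEntered, client.notFound"
```

Because the codes are zero-padded, fixed-width strings, the string comparison cmp orders them correctly; for numbers of varying width, <=> would be the operator to use instead.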

Update: Please see Re^9: Sort xml based on attribute for a fixed version.

Jenda
Enoch was right!
Enjoy the last years of Rome.

Re^4: Sort xml based on attribute
by Anonymous Monk on Aug 12, 2010 at 11:48 UTC
    It's a lot clearer now Jenda, thank you for your time and patience.

    I was just wondering three things:

    1) where does this take the XML data from (a file? -- if so where is it specified?)
    2) what exactly does the line $parser->filter(\*DATA); do? Is it an output or something else?
    3) last thing: I get an error when I try to run this Perl script (might be because I don't quite understand the input/output of this script):
    Name "main::DATA" used only once: possible typo at sort.pl line 45.
    <?xml version="1.0"?> not well-formed (invalid token) at line 1, column 4, byte 4 at sort.pl line 45
    Just off the top of my head, the languages that I'm familiar with: C, C++, Java, Python, XML (seemed relevant in this case :P) and some other things here and there (not counting webdev - doesn't seem relevant here?).

    Thank you again for answering my questions. I greatly appreciate it.

      Re 1) and 2): This particular script takes its data from the special filehandle DATA, which lets you read the text that follows the __DATA__ marker in the script. If you want to process a file instead, either open the file and pass the filehandle:

      open IN, '<', $filename or die "...";
      $parser->filter(\*IN);
      or
      open my $IN, '<', $filename or die "...";
      $parser->filter($IN);
      or use the filterfile() method:
      $parser->filterfile($filename);
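      Passing \*IN (a reference to the glob of a bareword filehandle, exactly like \*DATA) and passing a lexical $IN work the same way, because both can be read with the <...> operator. The core-Perl sketch below shows that; count_result_items is a hypothetical helper standing in for $parser->filter, and an in-memory filehandle (opened on a reference to a string) stands in for the file on disk. Nothing here calls XML::Rules.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for $parser->filter($fh):
# it accepts any readable filehandle and counts <ResultItem> start tags.
sub count_result_items {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ while $line =~ /<ResultItem>/g;
    }
    return $count;
}

my $xml = <<'END_XML';
<ResultDetail>
<results>
<ResultItem><code1>010300</code1></ResultItem>
<ResultItem><code1>010400</code1></ResultItem>
<ResultItem><code1>010000</code1></ResultItem>
</results>
</ResultDetail>
END_XML

# Bareword handle, passed as a glob reference (like \*IN or \*DATA).
open IN, '<', \$xml or die "Cannot open in-memory file: $!";
my $via_glob = count_result_items(\*IN);
close IN;

# Lexical handle, passed directly (like $IN).
open my $in, '<', \$xml or die "Cannot open in-memory file: $!";
my $via_lexical = count_result_items($in);
close $in;

print "$via_glob $via_lexical\n";    # prints "3 3"
```

      The lexical form is usually preferred in new code: the handle is closed automatically when $in goes out of scope and it cannot clash with another package-level IN.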

      Re 3): I did not include the whole XML at the end of the script, so maybe that's where the problem is. Drop the __DATA__ marker and everything after it, and use the filterfile() method instead.

      There are a few posts related to XML::Rules on PerlMonks; try to find them and see if they help. I tried to explain the design of the module in some of them, for example in (RFC) XML::TransformRules, (RFC) XML::Rules - yet another XML parser, and Simpler than XML::Simple.

      Jenda

        Thanks for the information. Unfortunately, it doesn't seem to work. Perl doesn't give me an error; the file just seems to stay exactly the same.

        ORIGINAL:
        <?xml version="1.0"?>
        <ResultDetail>
          <results>
            <ResultItem>
              <category>AGM</category>
              <subCategory>VAL</subCategory>
              <code1>010300</code1>
              <name>client.NotEntered</name>
              <type>ERR</type>
              <flags>320</flags>
              <language>EN</language>
              <description>Client not entered</description>
              <cause>Invalid data field values</cause>
              <action>Correct the problem and send the request again</action>
            </ResultItem>
            <ResultItem>
              <category>AGM</category>
              <subCategory>VAL</subCategory>
              <code1>010400</code1>
              <name>client.notFound</name>
              <type>ERR</type>
              <flags>320</flags>
              <language>EN</language>
              <description>Client not found</description>
              <cause>Invalid data field values</cause>
              <action>Correct the problem and send the request again</action>
            </ResultItem>
            <ResultItem>
              <category>AGM</category>
              <subCategory>VAL</subCategory>
              <code1>010000</code1>
              <name>parse</name>
              <type>ERR</type>
              <flags>320</flags>
              <language>EN</language>
              <description>Parse error</description>
              <cause>Parse error in the input XML</cause>
              <action>Correct the error and send your request again</action>
            </ResultItem>
          </results>
        </ResultDetail>
        RESULT:
        <?xml version="1.0"?>
        <ResultDetail>
          <results><ResultItem>
              <category>AGM</category>
              <subCategory>VAL</subCategory>
              <code1>010300</code1>
              <name>client.NotEntered</name>
              <type>ERR</type>
              <flags>320</flags>
              <language>EN</language>
              <description>Client not entered</description>
              <cause>Invalid data field values</cause>
              <action>Correct the problem and send the request again</action>
            </ResultItem><ResultItem>
              <category>AGM</category>
              <subCategory>VAL</subCategory>
              <code1>010400</code1>
              <name>client.notFound</name>
              <type>ERR</type>
              <flags>320</flags>
              <language>EN</language>
              <description>Client not found</description>
              <cause>Invalid data field values</cause>
              <action>Correct the problem and send the request again</action>
            </ResultItem><ResultItem>
              <category>AGM</category>
              <subCategory>VAL</subCategory>
              <code1>010000</code1>
              <name>parse</name>
              <type>ERR</type>
              <flags>320</flags>
              <language>EN</language>
              <description>Parse error</description>
              <cause>Parse error in the input XML</cause>
              <action>Correct the error and send your request again</action>
            </ResultItem></results>
        </ResultDetail>
        The Perl script (only slightly modified) is here:
        #!/usr/bin/perl
        use strict;
        use warnings;
        no warnings 'uninitialized';
        use XML::Rules;

        my $parser = XML::Rules->new(
            style => 'filter', # we want to filter (modify) the XML, not extract data
            rules => {
                _default => 'raw',
                    # we want to copy most tags intact, including the whitespace in and around them
                    # the data of the tags will end up in the _content pseudoattribute of the parent tag
                'category,subCategory,code' => 'raw extended',
                    # these three we need not only to copy, but also made easier to access.
                    # The "raw extended" rule causes the data of that tag to be available in the hash of the parent tag
                    # also as ":category", ":subCategory" and ":code" so you do not have to search through the _content array
                'ResultItem' => 'as array',
                    # we expect several <ResultItem> tags and want to store the data of each in an array.
                    # the array will be accessible using the 'ResultItem' key in the hash containing the data of the parent tag
                'results' => sub {
                    my ($tag,$attrs) = @_; # this is the Perl way to assign names to subroutine/function parameters
                        # this subroutine is called whenever the <results>...</results> is fully parsed and the rules
                        # specified for the child tags evaluated.
                    if ($attrs->{ResultItem} and @{$attrs->{ResultItem}} > 1) {
                        # if there are any <ResultItem> tags and there's more than one
                        @{$attrs->{ResultItem}} = sort {
                            # sort allows you to specify the code to be used to compare the items to sort
                            # the items are made available as $a and $b to the code.
                            # in this case the $a and $b are hashes created by processing the child tags of the <ResultItem> tags.
                            $a->{':category'} cmp $b->{':category'}
                            or $a->{':subCategory'} cmp $b->{':subCategory'}
                            or $a->{':code'} cmp $b->{':code'}
                        } @{$attrs->{ResultItem}};
                    }
                    $attrs->{_content} =~ s/^\s+// if (!ref $attrs->{_content});
                        # remove the accumulated whitespace that was present between the <ResultItem> tags
                    return [$tag => $attrs]
                }
            }
        );
        $parser->filterfile("test.msg", "test-result.msg");
        I imagine that I'm being quite the pest, but I do appreciate any and all help you can give me.
Re^4: Sort xml based on attribute
by Anonymous Monk on Aug 12, 2010 at 11:53 UTC
    Sorry, one other thing.

    Just wanted to let you know that I have actually been reading the XML::Rules documentation at http://search.cpan.org/~jenda/XML-Rules-1.10/lib/XML/Rules.pm, so you don't think I'm lazily asking you for all the answers without attempting some research myself :P

    That being said, do you think this is the best resource for XML::Rules information?