in reply to Re: Sort xml based on attribute
in thread Sort xml based on attribute

Sorry to be a pest, Jenda, but I'm trying to modify this script for my own use and I've only just started learning Perl, so much of this is completely new to me. Could you explain what each bit does, if it's not too much trouble? Or explain how I could modify it to sort an XML file such as this one by category, subCategory and then code1? Sorry to ask so much, but I really am at a loss.
<?xml version="1.0"?>
<ResultDetail>
  <results>
    <ResultItem>
      <category>AGM</category>
      <subCategory>VAL</subCategory>
      <code1>010000</code1>
      <name>parse</name>
      <type>ERR</type>
      <flags>320</flags>
      <language>EN</language>
      <description>Parse error</description>
      <cause>Parse error in the input XML</cause>
      <action>Correct the error and send your request again</action>
    </ResultItem>
    <ResultItem>
      <category>AGM</category>
      <subCategory>VAL</subCategory>
      <code1>010300</code1>
      <name>client.NotEntered</name>
      <type>ERR</type>
      <flags>320</flags>
      <language>EN</language>
      <description>Client not entered</description>
      <cause>Invalid data field values</cause>
      <action>Correct the problem and send the request again</action>
    </ResultItem>
    <ResultItem>
      <category>AGM</category>
      <subCategory>VAL</subCategory>
      <code1>010400</code1>
      <name>client.notFound</name>
      <type>ERR</type>
      <flags>320</flags>
      <language>EN</language>
      <description>Client not found</description>
      <cause>Invalid data field values</cause>
      <action>Correct the problem and send the request again</action>
    </ResultItem>
    <ResultItem>
      <category>AGM</category>
      <subCategory>VAL</subCategory>
      <code1>010500</code1>
      <name>client.invalidData</name>
      <type>ERR</type>
      <flags>320</flags>
      <language>EN</language>
      <description>Client data invalid</description>
      <cause>Invalid data field values</cause>
      <action>Correct the problem and send the request again</action>
    </ResultItem>
  </results>
</ResultDetail>
Note: It's actually a much much larger file (9510 lines)

Re^3: Sort xml based on attribute
by Jenda (Abbot) on Aug 12, 2010 at 10:20 UTC

    The code will be a bit simpler, but whether it will be any easier to understand I don't know. What language(s) do you have experience with?

    use strict;
    use warnings;
    no warnings 'uninitialized';
    use XML::Rules;

    my $parser = XML::Rules->new(
        style => 'filter',    # we want to filter (modify) the XML, not extract data
        rules => {
            _default => 'raw',
            # we want to copy most tags intact, including the whitespace in and around them
            # the data of the tags will end up in the _content pseudoattribute of the parent tag

            'category,subCategory,code1' => 'raw extended',
            # these three we need not only to copy, but also to make easier to access.
            # The "raw extended" rule causes the data of that tag to be available in the hash of the parent tag
            # also as ":category", ":subCategory" and ":code1" so you do not have to search through the _content array

            'ResultItem' => 'as array',
            # we expect several <ResultItem> tags and want to store the data of each in an array.
            # the array will be accessible using the 'ResultItem' key in the hash containing the data of the parent tag

            'results' => sub {
                my ($tag, $attrs) = @_;    # this is the Perl way to assign names to subroutine/function parameters
                # this subroutine is called whenever the <results>...</results> is fully parsed and the rules
                # specified for the child tags evaluated.

                if ($attrs->{ResultItem} and @{$attrs->{ResultItem}} > 1) {
                    # if there are any <ResultItem> tags and there's more than one
                    @{$attrs->{ResultItem}} = sort {
                        # sort allows you to specify the code to be used to compare the items to sort
                        # the items are made available as $a and $b to the code.
                        # in this case the $a and $b are hashes created by processing the child tags of the <ResultItem> tags.
                        $a->{':category'} cmp $b->{':category'}
                            or $a->{':subCategory'} cmp $b->{':subCategory'}
                            or $a->{':code1'} cmp $b->{':code1'}
                    } @{$attrs->{ResultItem}};
                }
                $attrs->{_content} =~ s/^\s+// if (!ref $attrs->{_content});
                # remove the accumulated whitespace that was present between the <ResultItem> tags
                return [$tag => $attrs]
            }
        }
    );

    $parser->filter(\*DATA);
    # see the XML::Rules docs for ways to redirect the output to file

    __DATA__
    <?xml version="1.0"?>
    <ResultDetail>
      <results>
        <ResultItem>
          <category>AGM</category>
          <subCategory>VAL</subCategory>
          <code1>010000</code1>
          <name>parse</name>
    ...
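For readers coming from C or Java, the comparison block in the script above may be easier to follow in isolation. The following is only a minimal sketch in core Perl, with no XML::Rules involved: the records and the key names without the leading colon are made up for illustration, and the values 'AUT' and 'AAA' are hypothetical, not taken from the real file.

```perl
use strict;
use warnings;

# Hypothetical sample records; in the real script each one would be the
# hash built for a <ResultItem>, with keys such as ':category'.
my @items = (
    { category => 'AGM', subCategory => 'VAL', code1 => '010400' },
    { category => 'AGM', subCategory => 'AUT', code1 => '020000' },
    { category => 'AAA', subCategory => 'VAL', code1 => '990000' },
    { category => 'AGM', subCategory => 'VAL', code1 => '010000' },
);

# cmp returns -1, 0 or 1. Zero is false, so "or" moves on to the next
# key only when the two records tie on the current one.
my @sorted = sort {
    $a->{category} cmp $b->{category}
        or $a->{subCategory} cmp $b->{subCategory}
        or $a->{code1} cmp $b->{code1}
} @items;

# Print one record per line: category, subCategory, code1.
print join(' ', @{$_}{qw(category subCategory code1)}), "\n" for @sorted;
```

Note that cmp compares strings. That suits code1 as long as the codes are zero-padded to the same width, as they are in the sample; for plain numbers the numeric operator <=> would be the one to use.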

    Update: Please see Re^9: Sort xml based on attribute for a fixed version.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      It's a lot clearer now Jenda, thank you for your time and patience.

      I was just wondering three things:

      1) where does this take the XML data from (a file? -- if so where is it specified?)
      2) this line in particular $parser->filter(\*DATA); is it an output or something else?
      3) last thing, I get an error when I try to run this perl script (might be because I don't quite understand the input/output of this script)
      Name "main::DATA" used only once: possible typo at sort.pl line 45.
      <?xml version="1.0"?>
      not well-formed (invalid token) at line 1, column 4, byte 4 at sort.pl line 45
      Just off the top of my head, the languages that I'm familiar with: C, C++, Java, Python, XML (seemed relevant in this case :P) and some other things here and there (not counting webdev - doesn't seem relevant here?).

      Thank you again for answering my questions. I greatly appreciate it.

        Re 1) and 2): This particular script takes the data from a special filehandle DATA that allows you to read the text that follows the __DATA__ marker in the script. If you want to process a file instead either open the file and pass the filehandle:

            open IN, '<', $filename or die "...";
            $parser->filter(\*IN);
        or
            open my $IN, '<', $filename or die "...";
            $parser->filter($IN);
        or use the filterfile() method
            $parser->filterfile($filename);
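The difference between passing \*IN and passing $IN can look odd if you come from other languages. Here is a rough sketch in core Perl only, assuming nothing about XML::Rules internals: count_lines is a hypothetical helper standing in for any code that reads from a handle, and the in-memory string stands in for a real file. DATA is simply another bareword handle of the first kind, which is why the script passes it as \*DATA.

```perl
use strict;
use warnings;

# Hypothetical helper: reads every line from whatever handle it is given.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    return $count;
}

my $text = "<a>\n<b/>\n</a>\n";    # stand-in for the contents of a file

# Bareword handle: pass a reference to the glob, written \*IN.
open IN, '<', \$text or die "Cannot open: $!";
my $from_glob = count_lines(\*IN);
close IN;

# Lexical handle: the variable already holds a reference, so pass it as is.
open my $in, '<', \$text or die "Cannot open: $!";
my $from_lexical = count_lines($in);
close $in;

print "$from_glob $from_lexical\n";
```

Both calls see the same three lines. The lexical form is generally the easier one to live with, since the handle is an ordinary variable that goes out of scope like any other.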

        Re 3) I did not include the whole XML at the end of the script, so that may be where the problem is. Drop the __DATA__ marker and everything after it, and use the filterfile() method instead.

        There are a few posts related to XML::Rules on Perlmonks, try to find them and see if they help. I tried to explain the design of the module in some of those. For example in (RFC) XML::TransformRules, (RFC) XML::Rules - yet another XML parser and Simpler than XML::Simple.

        Jenda

      Sorry one other thing.

      Just wanted to let you know that I have actually been reading the XML::Rules documentation from http://search.cpan.org/~jenda/XML-Rules-1.10/lib/XML/Rules.pm so you don't think I'm lazily asking you for all the answers without attempting some research myself :P

      That being said, do you think this is the best resource for XML::Rules information?