That's something I've had to do in the past, so as usual it ended up in XML::Twig. The sort_children method is called on the parent and is passed a function, which is called on each child in turn and returns that child's sort key. The method also takes options to specify the type of sort (type => 'numeric' or 'alpha') and the order (order => 'normal' or 'reverse').
This leads to the code below:
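#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# nparse is a class method: it creates the twig with the given options,
# then parses the file named on the command line
XML::Twig->nparse( pretty_print => 'indented', shift @ARGV)
         ->root
         ->sort_children( \&get_pui, type => 'numeric')  # sort the items by PUI
         ->print;

# returns the sort key for a child: the text of its PUI itemid
sub get_pui
  { my( $item)= @_;
    return $item->first_descendant( 'itemid[@idtype="PUI"]')->text;
  }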
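To see the type and order options at work without a real file, here is a minimal sketch on a few made-up records. The sample XML is invented (only the bibdataset/item/itemid structure is taken from the code above), and it sorts the highest PUI first:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# made-up sample data, only the structure matters
my $xml = <<'XML';
<bibdataset>
  <item><itemid idtype="PUI">30</itemid><title>c</title></item>
  <item><itemid idtype="PUI">4</itemid><title>a</title></item>
  <item><itemid idtype="PUI">200</itemid><title>b</title></item>
</bibdataset>
XML

my $twig = XML::Twig->new( pretty_print => 'indented');
$twig->parse( $xml);

# numeric sort on the PUI, in reverse order (highest first)
$twig->root->sort_children( \&get_pui, type => 'numeric', order => 'reverse');

# collect the keys in their new document order
my @puis = map { get_pui( $_) } $twig->root->children( 'item');
print "@puis\n";

# same key function as above
sub get_pui
  { my( $item)= @_;
    return $item->first_descendant( 'itemid[@idtype="PUI"]')->text;
  }
```

With type => 'alpha' the same keys would be compared as strings, so 200 would sort before 30 and 4 in normal order, which is why the numeric type matters here.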
Note that this code relies on a couple of assumptions:
- it assumes that bibdataset contains only elements to be sorted (item elements containing an itemid descendant with the proper attribute). If that's not the case, get_pui will die on the first child that has no such descendant, so you need to tweak it to return a number, either big or small, depending on where you want to put the extra elements. I believe that in recent versions of Perl, if you always return the same number for them, they will stay in their original document order.
- it assumes that you can load the entire document in memory. If that is not the case you can split the document into 1 file per record, then sort them before merging them back. XML::Twig includes the xml_split tool that can do just that, but you might be better off doing it yourself, using twig_handlers to save each record in a file whose name includes the PUI (probably padded with 0s so lexicographic order, as used by the shell, works), then merging the files back.
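For the first assumption, a tweaked key function could look like the sketch below. The name get_pui_or_last, the NO_PUI constant and the sample data are all made up for the example, and it assumes that no real PUI is as large as 1e15:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# assumption: no real PUI is this large, so elements without one sort last
use constant NO_PUI => 1e15;

# made-up sample data, with one child that is not an item
my $xml = <<'XML';
<bibdataset>
  <note>not an item</note>
  <item><itemid idtype="PUI">30</itemid></item>
  <item><itemid idtype="PUI">4</itemid></item>
</bibdataset>
XML

my $twig = XML::Twig->new( pretty_print => 'indented');
$twig->parse( $xml);
$twig->root->sort_children( \&get_pui_or_last, type => 'numeric');

# tags of the children in their new order
my @tags = map { $_->tag } $twig->root->children;
print "@tags\n";

# returns the PUI when there is one, the fallback key otherwise
sub get_pui_or_last
  { my( $elt)= @_;
    my $id= $elt->first_descendant( 'itemid[@idtype="PUI"]');
    return defined $id ? $id->text : NO_PUI;
  }
```

Returning a small number (-1 for example) instead of NO_PUI would put the extra elements first.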
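For the second assumption, the do-it-yourself split could be sketched roughly as follows. This is only an outline under several assumptions: the file naming scheme (item- followed by the PUI padded to 12 digits), the save_item handler name and the sample data are invented, the PUIs are taken to be non-negative integers of at most 12 digits, and a temporary directory and an inline string stand in for your real directory and file (for a real file you would call parsefile instead of parse):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw( tempdir);

# made-up sample data, only the structure matters
my $xml = <<'XML';
<bibdataset>
  <item><itemid idtype="PUI">30</itemid></item>
  <item><itemid idtype="PUI">4</itemid></item>
  <item><itemid idtype="PUI">200</itemid></item>
</bibdataset>
XML

my $dir = tempdir( CLEANUP => 1);   # stand-in for a real working directory

# step 1: save each item in its own file as soon as it is parsed
XML::Twig->new( twig_handlers => { item => \&save_item })->parse( $xml);

# step 2: thanks to the padding, a plain string sort gives numeric order
opendir( my $dh, $dir) or die "cannot open $dir: $!";
my @files = sort grep { /^item-\d+\.xml$/ } readdir $dh;
closedir $dh;

# step 3: merge the files back, in order, inside a new root element
my $merged = "<bibdataset>\n";
foreach my $file (@files)
  { open( my $in, '<', "$dir/$file") or die "cannot read $file: $!";
    local $/;                        # slurp the whole file
    $merged .= <$in> . "\n";
    close $in;
  }
$merged .= "</bibdataset>\n";
print $merged;

sub save_item
  { my( $twig, $item)= @_;
    my $pui  = $item->first_descendant( 'itemid[@idtype="PUI"]')->text;
    my $file = sprintf( "%s/item-%012d.xml", $dir, $pui);
    open( my $out, '>', $file) or die "cannot create $file: $!";
    $item->print( $out);
    close $out;
    $twig->purge;                    # release the memory used so far
  }
```

Because of the purge call the whole document is never held in memory at once. For a really big file you would also print the merged records directly instead of building the $merged string.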
Does this help?