That's something I've had to do in the past, so as usual it ended up in XML::Twig. The sort_children method is called on the parent and is passed a function, which is called on each child in turn and returns the sort key for that child. The method also takes options to specify the type of sort (type => 'numeric' or 'alpha') and the order (order => 'normal' or 'reverse').
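To show what the options do, here is a minimal sketch on made-up data. The list, rec and n element names and the sorted_keys helper are invented for the example; only sort_children and its type and order options come from XML::Twig.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# made-up sample data: three rec children, each with a numeric n key
my $xml= '<list><rec><n>10</n></rec><rec><n>9</n></rec><rec><n>100</n></rec></list>';

# sort the children with the given options, return the keys in their new order
sub sorted_keys
  { my( %options)= @_;
    my $twig= XML::Twig->new;
    $twig->parse( $xml);
    my $root= $twig->root;
    $root->sort_children( sub { $_[0]->first_child_text( 'n') }, %options);
    return join ',', map { $_->first_child_text( 'n') } $root->children( 'rec');
  }

print sorted_keys( type => 'alpha'), "\n";                       # keys compared as strings
print sorted_keys( type => 'numeric'), "\n";                     # keys compared as numbers
print sorted_keys( type => 'numeric', order => 'reverse'), "\n"; # numeric, largest first
```

With an alpha sort "100" lands before "9", which is why the PUI sort needs type => 'numeric'.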
This leads to the code below:
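#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# parse the file given on the command line, sort the children of the root
# numerically on their PUI, then print the sorted document
XML::Twig->parse( pretty_print => 'indented', shift @ARGV)
         ->root
         ->sort_children( \&get_pui, type => 'numeric')
         ->print;

# sort key: the text of the itemid descendant that has idtype="PUI"
sub get_pui
  { my( $item)= @_;
    return $item->first_descendant( 'itemid[@idtype="PUI"]')->text;
  }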
Note that this code relies on a couple of assumptions:
- it assumes that bibdataset contains only elements to be sorted (each one an item containing an itemid descendant with the proper attribute). If that's not the case you need to tweak get_pui so that it returns a number for the extra elements, either big or small depending on where you want to put them. I believe that in recent versions of Perl the sort is stable, so if you always return the same number for them they will stay in their original document order.
- it assumes that you can load the entire document in memory. If that is not the case you can split the document into one file per record, sort them, and then merge them back. XML::Twig includes the xml_split tool that can do just that, but you might be better off doing it yourself: use twig_handlers to save each record in a file whose name includes the PUI (probably padded with 0s, so that the lexicographic order used by the shell matches the numeric order), then merge the files back in that order.
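For the first assumption, the tweak to get_pui could look something like the sketch below. The NO_PUI_KEY constant, the get_pui_or_last name and the sample document (including the note and info elements) are all made up for the example; the constant is simply assumed to be larger than any real PUI, so elements without one end up last.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# hypothetical fallback key, assumed to be larger than any real PUI
use constant NO_PUI_KEY => 1e99;

# sort key: the PUI if the element has one, the fallback key otherwise
sub get_pui_or_last
  { my( $elt)= @_;
    my $itemid= $elt->first_descendant( 'itemid[@idtype="PUI"]');
    return defined $itemid ? $itemid->text : NO_PUI_KEY;
  }

# made-up sample: two items plus a note element that has no PUI
my $xml= '<bibdataset>'
       . '<item><info><itemid idtype="PUI">30</itemid></info></item>'
       . '<note>no PUI here</note>'
       . '<item><info><itemid idtype="PUI">4</itemid></info></item>'
       . '</bibdataset>';

my $twig= XML::Twig->new;
$twig->parse( $xml);
$twig->root->sort_children( \&get_pui_or_last, type => 'numeric');

# tags of the children after the sort: the note should now come last
my @tags= map { $_->tag } $twig->root->children;
print join( ' ', @tags), "\n";
```

Returning a small number (0 or a negative one) instead would put the extra elements first.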
Does this help?