texuser74 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I need to sort the lines that start with "<ref" and lie within an <address>...</address> range. The text outside <address>...</address> should remain untouched, but my present code sorts the entire file. Please help me fix this.

$/ = "</address>\n";
while ( <DATA> ) {
    chomp;
    print "$_\n" for sort { $a cmp $b} split /\n/;
    print $/ if ! eof DATA;
}
__DATA__
xxx data
aaa data
<address>
<state>address1</state>
<ref refid="aff1">1</ref>
<ref refid="aff2">2</ref>
<ref refid="e2"/>
<ref refid="e4"/>
<ref refid="aff4">4</ref>
</address>
<address>
<state>address1</state>
<ref refid="aff1">1</ref>
<ref refid="aff2">2</ref>
<ref refid="e4"/>
<ref refid="aff3">3</ref>
</address>
xxx data
aaa data

Desired output:

xxx data
aaa data
<address>
<state>address1</state>
<ref refid="aff1">1</ref>
<ref refid="aff2">2</ref>
<ref refid="aff4">4</ref>
<ref refid="e2"/>
<ref refid="e4"/>
</address>
<address>
<state>address1</state>
<ref refid="aff1">1</ref>
<ref refid="aff2">2</ref>
<ref refid="aff3">3</ref>
<ref refid="e4"/>
</address>
xxx data
aaa data

Thanks in advance

Re: difficulty in sorting
by GrandFather (Saint) on Aug 21, 2008 at 05:10 UTC

    It looks like you are dealing with XML. It's generally a bad idea to try to hand-parse XML. However, there are some really good modules around to help. Consider this XML::Twig solution:

    use strict;
    use warnings;
    use XML::Twig;

    my $xml = <<XML;
    <data>
    <ignore>
    This should be ignored.
    </ignore>
    <address>
    <state>address1</state>
    <ref refid="aff1">1</ref>
    <ref refid="aff2">2</ref>
    <ref refid="e2"/>
    <ref refid="e4"/>
    <ref refid="aff4">4</ref>
    </address>
    <address>
    <state>address1</state>
    <ref refid="aff1">1</ref>
    <ref refid="aff2">2</ref>
    <ref refid="e4"/>
    <ref refid="aff3">3</ref>
    </address>
    </data>
    XML

    my $twig = XML::Twig->new (
        twig_roots => {
            'address' => \&sortAddress,
        },
        twig_print_outside_roots => 1,
        pretty_print => 'indented',
    );

    $twig->parse ($xml);

    sub sortAddress {
        my ($t, $address)= @_;
        my @children = $address->cut_children ();
        my @untouchables = grep {$_->tag () ne 'ref'} @children;
        my @sorted = sort {$a->att ('refid') cmp $b->att ('refid')}
            grep {$_->tag () eq 'ref'} @children;

        $_->paste (last_child => $address) for @untouchables;
        $_->paste (last_child => $address) for @sorted;
        $address->print ();
    }

    Prints:

    <data>
    <ignore>
    This should be ignored.
    </ignore>
    <address>
    <state>address1</state>
    <ref refid="aff1">1</ref>
    <ref refid="aff2">2</ref>
    <ref refid="aff4">4</ref>
    <ref refid="e2"/>
    <ref refid="e4"/>
    </address>
    <address>
    <state>address1</state>
    <ref refid="aff1">1</ref>
    <ref refid="aff2">2</ref>
    <ref refid="aff3">3</ref>
    <ref refid="e4"/>
    </address>
    </data>

    Perl reduces RSI - it saves typing
      GrandFather, thanks for your wonderful support. Unfortunately, my machine has a problem with the Twig module. I have to fix that first, or maybe I will upgrade my IndigoPerl.
Re: difficulty in sorting
by Tanktalus (Canon) on Aug 21, 2008 at 05:04 UTC

    I find changing $/ to be too confusing. That's not to say that it can't be used to good effect, only that the code is usually easier to read when you don't. So, here goes:

    #! /usr/bin/perl -w
    use strict;

    my @sortable;
    while ( <DATA> ) {
        # are we in the ref's?
        if (/^\<ref/) {
            # keep track of it, but don't print it out yet.
            push @sortable, $_;
            next;
        }
        # else, are we done ref's?
        if (@sortable) {
            print for sort @sortable;
            # done with them, get rid of 'em.
            @sortable = ();
        }
        # this line can be printed out at this point.
        print;
    }
    # flush any ref lines still held back if the input ends with them.
    print for sort @sortable;
    __DATA__
    [ same as yours, so I'm not repeating it ]

    It seems to get the job done. Of course, I'm assuming that all the ref lines are in a row.
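
To see why the $/-based code in the question sorts more than intended, it may help to look at what each record contains once $/ is set to "</address>\n". The following is only a sketch on a shortened, made-up sample (the `$text` string and the `@records` array are illustrative names, not from the thread), read through an in-memory filehandle:

```perl
use strict;
use warnings;

# Hypothetical sample input, shortened from the question's data.
my $text = join '',
    "xxx data\n",
    "aaa data\n",
    "<address>\n",
    "<ref refid=\"e2\"/>\n",
    "<ref refid=\"aff1\">1</ref>\n",
    "</address>\n",
    "trailing data\n";

# Read from the string through an in-memory filehandle.
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my @records;
{
    # Limit the change of the record separator to this block.
    local $/ = "</address>\n";
    @records = <$fh>;
}
close $fh;

# Each record runs up to and including the separator, so the first
# record also contains the lines that precede <address>.
for my $i (0 .. $#records) {
    my $line_count = () = $records[$i] =~ /\n/g;
    print "record $i has $line_count line(s)\n";
}
```

Because the first record starts at "xxx data" rather than at <address>, splitting it on newlines and sorting the pieces reorders the outside lines as well.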

      Tanktalus, thanks for your code; the instructions are great.
      As I mentioned earlier, I need to sort the lines that start with "<ref" and lie within the <address>...</address> range. Though your code sorts, it does not check the <address>...</address> range, i.e. if the data contains

      <ref refid="aff2">2</ref>
      <ref refid="aff1">1</ref>
      <address>
      <state>address1</state>
      <ref refid="aff1">1</ref>
      <ref refid="aff2">2</ref>
      <ref refid="e2"/>
      <ref refid="e4"/>
      <ref refid="aff4">4</ref>
      </address>

      the first two "<ref" lines outside <address> are also getting sorted.
        It sounds like all you need is a minor tweak to Tanktalus's code then:

        #! /usr/bin/perl -w
        use strict;

        my @sortable;
        my $inAddrRange=0;
        while ( <DATA> ) {
            # are we starting an <address> range?
            $inAddrRange=1 if /<address>/;
            # are we ending a range with </address>?
            $inAddrRange=0 if m{</address>};
            # are we in the ref's AND in an addr range?
            if ($inAddrRange && /^\<ref/) {
                # keep track of it, but don't print it out yet.
                push @sortable, $_;
                next;
            }
            # else, are we done ref's?
            if (@sortable) {
                print for sort @sortable;
                # done with them, get rid of 'em.
                @sortable = ();
            }
            # this line can be printed out at this point.
            print;
        }
        __DATA__
        [ same as yours, so I'm not repeating it ]

        I haven't tested this, but I think it'll do what you want...
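
For anyone who wants to check the range-aware idea without a __DATA__ section, here is a minimal sketch of the same logic wrapped in a function. The name `sort_refs_in_address` and the sample `@input` are assumptions made for illustration; they are not part of the code posted above.

```perl
use strict;
use warnings;

# sort_refs_in_address is a hypothetical helper name, not from the thread.
# It returns the given lines with each run of "<ref" lines sorted, but only
# while inside an <address>...</address> range.
sub sort_refs_in_address {
    my @lines = @_;
    my (@out, @sortable);
    my $in_range = 0;

    for my $line (@lines) {
        $in_range = 1 if $line =~ /<address>/;
        $in_range = 0 if $line =~ m{</address>};

        if ($in_range && $line =~ /^<ref/) {
            push @sortable, $line;    # hold back until the run ends
            next;
        }
        push @out, sort @sortable;    # emit the finished run, if any
        @sortable = ();
        push @out, $line;             # every other line passes through
    }
    push @out, sort @sortable;        # flush a run left open at end of input
    return @out;
}

# Made-up input: two refs outside the range, three inside it.
my @input = map { "$_\n" } (
    '<ref refid="aff2">2</ref>',
    '<ref refid="aff1">1</ref>',
    '<address>',
    '<state>address1</state>',
    '<ref refid="e2"/>',
    '<ref refid="aff4">4</ref>',
    '<ref refid="aff1">1</ref>',
    '</address>',
);

print sort_refs_in_address(@input);
```

The two leading "<ref" lines are passed through in their original order because the flag is still 0 when they are read.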

        Mike
Re: difficulty in sorting
by juster (Friar) on Aug 21, 2008 at 06:29 UTC

    I started this before the last two comments; it took me a while with Data::Dumper to figure it out. Here is another approach using XML::Simple, but I'll bet it's slower than Twig!

    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::Simple;

    my $xmlstring = do { local $/; <DATA> };
    my $xs = XML::Simple->new;
    my $xml = $xs->XMLin($xmlstring);

    foreach (@{$xml->{address}}) {
        my @tmp = sort { $a->{refid} cmp $b->{refid} } @{$_->{ref}};
        $_->{ref} = \@tmp;
    }
    print $xs->XMLout($xml);
    __DATA__
    <opt>
    xxxdata
    <address>
    <state>address1</state>
    <ref refid="aaf1">1</ref>
    <ref refid="AAF1">2</ref>
    <ref refid="e2"/>
    <ref refid="e4"/>
    <ref refid="aff4">4</ref>
    </address>
    <ref refid="zzz9">2</ref>
    <ref refid="aff1">1</ref>
    <address>
    <state>address1</state>
    <ref refid="aff1">1</ref>
    <ref refid="aff2">2</ref>
    <ref refid="e4"/>
    <ref refid="aff3">3</ref>
    </address>
    </opt>

Re: difficulty in sorting
by massa (Hermit) on Aug 21, 2008 at 10:10 UTC
    What you want is:
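
    my @tosort;
    while( <DATA> ) {
        # hold back only the <ref lines inside an <address>...</address> range;
        # a plain string sort would otherwise move <state> after the <ref lines.
        if( ( m|<address>| .. m|</address>| ) and m|^<ref| ) {
            push @tosort, $_;
            next
        }
        # splice empties @tosort and returns its contents for sorting.
        print for sort splice @tosort;
        print
    }
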
    Ah, the awk  .. operator.... :-P
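
For readers who have not met it, a small sketch of the `..` operator in scalar context (the flip-flop) may help. The `@lines` sample and the `@in_range` array are made up for illustration only:

```perl
use strict;
use warnings;

# Made-up sample lines.
my @lines = ('before', '<address>', 'inside 1', 'inside 2', '</address>', 'after');

my @in_range;
for (@lines) {
    # In scalar context, `..` is the flip-flop operator: it becomes true on
    # the line where the left pattern matches and stays true up to and
    # including the line where the right pattern matches.
    if (/<address>/ .. m{</address>}) {
        push @in_range, $_;
    }
}

print "$_\n" for @in_range;
```

The range includes both delimiter lines, so any further condition on which lines to collect has to be tested separately from the flip-flop itself.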
    []s, HTH, Massa (κς,πμ,πλ)