prasadbabu has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am writing a program to sort an index. The index has several section levels (primary, secondary, tertiary), and I want to sort the text of the entries alphabetically within each level.

The problem is that tertiary entries 004 and 005 should be sorted as one group, and tertiary entries 007, 008 and 009 as another, because the two groups belong to different secondary entries.

How should I proceed, and which module can be used for this? Should I use sort::Natural?

The following are the input and output of the index.

Input:

<indexEntry>
  <primaryIE id="001">sloshing, definition </primaryIE>
  <secondaryIE id="002">solitons </secondaryIE>
  <secondaryIE id="003">boundary conditions, Dirichlet </secondaryIE>
  <tertiaryIE id="004">Cauchy condition </tertiaryIE>
  <tertiaryIE id="005">asymmetric modes </tertiaryIE>
  <secondaryIE id="006">natural frequencies </secondaryIE>
  <tertiaryIE id="007">boundary conditions, Neumann </tertiaryIE>
  <tertiaryIE id="008">canals, circular </tertiaryIE>
  <tertiaryIE id="009">baffle, annular </tertiaryIE>
  <secondaryIE id="010">annular sector </secondaryIE>
  <primaryIE id="011">mode shapes </primaryIE>
</indexEntry>

Output:

<indexEntry>
  <primaryIE id="001">mode shapes </primaryIE>
  <secondaryIE id="002">annular sector </secondaryIE>
  <secondaryIE id="003">boundary conditions, Dirichlet </secondaryIE>
  <tertiaryIE id="004">asymmetric modes </tertiaryIE>
  <tertiaryIE id="005">Cauchy condition </tertiaryIE>
  <secondaryIE id="006">natural frequencies </secondaryIE>
  <tertiaryIE id="007">baffle, annular </tertiaryIE>
  <tertiaryIE id="008">boundary conditions, Neumann </tertiaryIE>
  <tertiaryIE id="009">canals, circular </tertiaryIE>
  <secondaryIE id="010">solitons </secondaryIE>
  <primaryIE id="011">sloshing, definition </primaryIE>
</indexEntry>

Can anyone give some suggestions on how to proceed?

Thanks in advance

Prasad

Replies are listed 'Best First'.
Re: Index sorting
by CountZero (Bishop) on Feb 22, 2005 at 12:31 UTC
    I don't really understand the way you want to sort the index.

    It seems that you also want to renumber the IDs and to keep the "primary, secondary, tertiary" structure in some way, but not always.

    Perhaps if you explain what kind of index this is and how it is made, we might understand better.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Index sorting
by RazorbladeBidet (Friar) on Feb 22, 2005 at 13:19 UTC
    OK, it looks like secondary IEs are not bound to a primary, but tertiaries are bound to a secondary? If so, you need to read in each primary and secondary. When you find a tertiary, you'll want an array (or similar) tied to the corresponding secondaryIE. Then you can sort each list independently. It's more about reading in the data and choosing data structures than about sorting. For instance:

    Semi-Pseudo-Untested-code:
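    # isPrimary(), isSecondary() and isTertiary() are placeholders that test $_
    my ( @primary, @secondary );
    while ( <FILE> ) {
        if ( isPrimary() ) {
            push @primary, $_;
        }
        elsif ( isSecondary() ) {
            # keep each secondary together with its own list of tertiaries
            push @secondary, { line => $_, tertiary => [] };
        }
        elsif ( isTertiary() && @secondary ) {
            push @{ $secondary[-1]{tertiary} }, $_;
        }
    }

    # compare on the entry text, not on the leading tag and id
    sub text { my $s = lc shift; $s =~ s/<[^>]+>//g; return $s }

    @primary   = sort { text($a) cmp text($b) } @primary;
    @secondary = sort { text( $a->{line} ) cmp text( $b->{line} ) } @secondary;
    foreach my $secondaryItem ( @secondary ) {
        @{ $secondaryItem->{tertiary} } =
            sort { text($a) cmp text($b) } @{ $secondaryItem->{tertiary} };
    }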


    Something like that - anyways, don't forget to use strict :)
    --------------
    It's sad that a family can be torn apart by such a simple thing as a pack of wild dogs
Re: Index sorting
by jdporter (Paladin) on Feb 22, 2005 at 17:27 UTC
    use strict;
    use warnings;

    my %ie;
    my @id;

    while (<>)
    {
        if ( m#<indexEntry># ) # open
        {
            print;
            %ie = ();
            @id = ();
        }
        elsif ( m#</indexEntry># ) # close
        {
            # sort the primaries, secondaries, and tertiaries:
            for my $l ( keys %ie )
            {
                @{$ie{$l}} = sort @{$ie{$l}};
            }
            # output the munged record:
            for ( @id )
            {
                my( $id, $level, $iei ) = @$_;
                print qq(<${level}IE id="$id">$ie{$level}[$iei] </${level}IE>\n);
            }
            print;
        }
        elsif ( m#<(\w+)IE id="(\d+)">(.*)</\1[I]E># )
        {
            # collect the row of data:
            my( $level, $id, $text ) = ( $1, $2, $3 );
            push @{$ie{$level}}, $text;
            push @id, [ $id, $level, $#{$ie{$level}} ];
        }
        # else, invalid line of input.
    }
    Assumes the data can be munged through a filter. If not, alter for input and output as appropriate.
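
A variation on the same "collect the text, sort it, pour it back into the original slots" idea, as a minimal sketch rather than a tested solution. It assumes one element per line, assumes that a tertiary belongs to the nearest preceding secondary slot, and compares text case-insensitively. The sub name `sort_index` and the variables `%group`, `@slot` and `$secondary_no` are illustrative and do not come from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of one <indexEntry> record and
# returns them with the entry text re-sorted inside each group.
sub sort_index {
    my @lines = @_;
    my ( %group, @slot, @out );
    my $secondary_no = 0;

    for my $line (@lines) {
        if ( $line =~ m{^\s*<(\w+?)IE id="(\d+)">(.*?)\s*</\1IE>} ) {
            my ( $level, $id, $text ) = ( $1, $2, $3 );
            $secondary_no++ if $level eq 'secondary';
            # tertiaries get one group per preceding secondary slot
            my $key = $level eq 'tertiary' ? "tertiary/$secondary_no" : $level;
            push @{ $group{$key} }, $text;
            push @slot, [ $level, $id, $key ];
        }
        else {
            # anything else (e.g. the indexEntry tags) passes through unchanged
            push @slot, [ undef, undef, undef, $line ];
        }
    }

    # sort the text of every group, ignoring case
    @$_ = sort { lc $a cmp lc $b } @$_ for values %group;

    # refill the slots in their original order, keeping level and id
    for my $s (@slot) {
        my ( $level, $id, $key, $raw ) = @$s;
        push @out, defined $level
            ? qq(<${level}IE id="$id">) . shift( @{ $group{$key} } ) . qq( </${level}IE>)
            : $raw;
    }
    return @out;
}

my @sorted = sort_index( split /\n/, <<'END' );
<indexEntry>
<primaryIE id="001">sloshing, definition </primaryIE>
<secondaryIE id="002">solitons </secondaryIE>
<secondaryIE id="003">boundary conditions, Dirichlet </secondaryIE>
<tertiaryIE id="004">Cauchy condition </tertiaryIE>
<tertiaryIE id="005">asymmetric modes </tertiaryIE>
<secondaryIE id="006">natural frequencies </secondaryIE>
<tertiaryIE id="007">boundary conditions, Neumann </tertiaryIE>
<tertiaryIE id="008">canals, circular </tertiaryIE>
<tertiaryIE id="009">baffle, annular </tertiaryIE>
<secondaryIE id="010">annular sector </secondaryIE>
<primaryIE id="011">mode shapes </primaryIE>
</indexEntry>
END

print "$_\n" for @sorted;
```

Because the ids stay attached to the slots and only the text moves, no separate renumbering step is needed under these assumptions.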
Re: Index sorting
by jdporter (Paladin) on Feb 22, 2005 at 16:07 UTC
    Hmm. I thought I understood what you're trying to achieve, but then I noticed that you have the tertiaries ordered thus in the desired output:
    asymmetric modes
    Cauchy condition
    baffle, annular
    boundary conditions, Neumann
    canals, circular
    Why is "Cauchy" in that position? Or was that merely a mistake?
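
One possible reading, offered as an assumption and not confirmed by the original poster, is that the desired output treats the two tertiary groups separately and compares text without regard to case. The sketch below only illustrates the case-folding part: Perl's default `sort` compares by character code, so an uppercase "C" sorts ahead of a lowercase "a", while folding case with `lc` gives dictionary order.

```perl
use strict;
use warnings;

my @tertiary = ( 'Cauchy condition', 'asymmetric modes' );

# default sort compares by character code, so uppercase "C" comes first
my @by_code = sort @tertiary;

# folding case before comparing gives dictionary order
my @by_dict = sort { lc $a cmp lc $b } @tertiary;

print join( ' | ', @by_code ), "\n";    # Cauchy condition | asymmetric modes
print join( ' | ', @by_dict ), "\n";    # asymmetric modes | Cauchy condition
```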
Re: Index sorting
by samizdat (Vicar) on Feb 22, 2005 at 12:37 UTC
    This is a case for normal Perl, not a module, IMHO. :) I leave it to you to update the ID's once it's sorted, but how about this:
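    sub by_level {
        # copy $a and $b, then strip the opening and closing tags to leave the text
        ( my $tempa = $a ) =~ s/^<.*">//;
        $tempa =~ s/<\/.*>//;
        ( my $tempb = $b ) =~ s/^<.*">//;
        $tempb =~ s/<\/.*>//;
        # group by level first (the "<p", "<s" or "<t" prefix), then by text
        substr($a,0,2) cmp substr($b,0,2) || $tempa cmp $tempb;
    }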
    Update: added " to first $tempb substitution. Untested, but I think this does what you want.