in reply to Inconsistent column Schwartzian Transform attempt

It gets simpler if you normalize the data in the first step.

#!/usr/bin/perl -w
use strict;

my @unordered = (
    '12 Corinthians 5:10',
    'Hebrews 11:15',
    '1 Corinthians 13:23',
    '2 Corinthians 1:3',
    'John 3:16',
    '1 Corinthians 12:10',
    '1 Corinthians 2:10',
    '1 Corinthians 2:10'
);

# Pair each entry with a fixed-width key, sort on the key, then drop the key.
my @ordered =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, sprintf("%05d %-20s %04d %04d", /^\d/?():(99999), split(/[ +:]/)) ] }
    @unordered;

print join "\n", @ordered;
exit;

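The

/^\d/?():(99999)

term supplies a placeholder book number of 99999 for references that don't start with a digit (like 'John 3:16'), so every entry hands sprintf the same four fields. With that in place, and with sprintf padding each field to a fixed width so the string compares lexicographically, you're in good shape, I think.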
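To make the key format concrete, here is a small sketch that builds the key for one numbered and one unnumbered reference. The helper name sort_key is made up for illustration; the format string and the 99999 placeholder are the ones from the code above.

```perl
use strict;
use warnings;

# Build the fixed-width sort key for one reference.
# sort_key is a hypothetical helper name, not part of the original post.
sub sort_key {
    my ($ref) = @_;
    my @fields = split /[ :]/, $ref;
    unshift @fields, 99999 unless $ref =~ /^\d/;    # placeholder book number
    return sprintf "%05d %-20s %04d %04d", @fields;
}

my $key_numbered   = sort_key('1 Corinthians 13:23');
my $key_unnumbered = sort_key('John 3:16');

# Brackets make the padding visible.
print "[$key_numbered]\n";
print "[$key_unnumbered]\n";
```

Every key comes out the same length, so a plain cmp lines the fields up column by column, and the unnumbered books land after all the numbered ones.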

If you had a huge set of data to work with, you might want to do it this way instead. I think it might be faster, because the sort compares plain strings instead of dereferencing arrays (it might be the GR, i.e. Guttman-Rosler, transform; I'm not sure):

# ...
my @ordered =
    map  { (split /\t/)[1] }
    sort { $a cmp $b }
    map  { sprintf("%05d %-20s %04d %04d", /^\d/?():(99999), split(/[ :]/)) ."\t$_" }
    @unordered;
# ...

Make sense?
--
Mike

(Edit: removed debugging prints from sort block)

And lexicographically means...
by RMGir (Prior) on Mar 16, 2002 at 16:36 UTC
    ...make sure you sprintf numbers with %0xxd, where the xx is enough space for the widest number in that column, and make sure you format strings with %-yys where yy is enough space for the widest string in that column.

    That way, the fields will "cmp" correctly.
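A quick sketch of why the zero padding matters, using a made-up list of numbers: without padding, cmp puts "10" before "9"; with %04d the string order matches the numeric order.

```perl
use strict;
use warnings;

my @numbers = (9, 10, 100, 2);

# Plain string sort: compared character by character, so "10" sorts before "2".
my @as_strings = sort { $a cmp $b } @numbers;

# Zero-padded to a fixed width, string order matches numeric order.
my @padded = sort { $a cmp $b } map { sprintf "%04d", $_ } @numbers;

print "unpadded: @as_strings\n";
print "padded:   @padded\n";
```

The width (4 here) is an assumption about the data; it has to cover the widest number in the column.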

    If you need keys sorted in a different order, just sprintf them differently. For descending numeric keys, sprintf 9999999-$fld. For descending string keys... Hmmm, don't have an easy answer to that one. Of course, as long as you don't need some string keys desc and others asc, you can just reverse the order of the cmp call in the sort block.
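Here is a minimal sketch of the 9999999-$fld trick, mixing an ascending string key with a descending numeric key in one packed string. The "name score" records are invented for illustration, and the field widths assume names of at most 10 characters and scores below 9999999.

```perl
use strict;
use warnings;

# Hypothetical records in "name score" form.
my @records = ('bob 7', 'alice 42', 'bob 19', 'alice 3');

# Name ascending, score descending: larger scores produce smaller keys.
sub record_key {
    my ($record) = @_;
    my ($name, $score) = split ' ', $record;
    return sprintf "%-10s %07d", $name, 9999999 - $score;
}

my @sorted =
    map  { (split /\t/)[1] }
    sort { $a cmp $b }
    map  { record_key($_) . "\t$_" }
    @records;

print "$_\n" for @sorted;
```

Within each name the higher score comes first, while the names themselves stay in ascending order, which is the mixed asc/desc case that a simple swap of $a and $b can't handle.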

    Hope this helps!
    --
    Mike

      If you need keys sorted in a different order, just sprintf them differently. For descending numeric keys, sprintf 9999999-$fld. For descending string keys... Hmmm, don't have an easy answer to that one.

      # Ascending
      sort { $a cmp $b } LIST
      sort { $a <=> $b } LIST

      # Descending
      sort { $b cmp $a } LIST
      sort { $b <=> $a } LIST
      I think using 9999999-$foo is an ugly way to do a descending sort, and I think sprintfing to sort numbers using cmp is evil, because there's already the <=> operator and you can use any routine you want for sorting. Just reverse it if you want a descending sort.
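For comparison, a sketch of what that approach could look like on the verse data from the root post: keep the parsed fields in the Schwartzian Transform's array and compare them one by one with <=> and cmp. The list is a shortened copy of the original data, and the 99999 placeholder for unnumbered books is carried over from the root post.

```perl
use strict;
use warnings;

my @unordered = (
    'Hebrews 11:15',
    '2 Corinthians 1:3',
    '1 Corinthians 13:23',
    'John 3:16',
    '1 Corinthians 2:10',
);

my @ordered =
    map  { $_->[0] }
    sort {
           $a->[1] <=> $b->[1]    # book number
        or $a->[2] cmp $b->[2]    # book name
        or $a->[3] <=> $b->[3]    # chapter
        or $a->[4] <=> $b->[4]    # verse
    }
    map  { [ $_, (/^\d/ ? () : (99999)), split /[ :]/ ] }
    @unordered;

print "$_\n" for @ordered;
```

To make any single field descending, swap $a and $b on that line only.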

      U28geW91IGNhbiBhbGwgcm90MTMgY
      W5kIHBhY2soKS4gQnV0IGRvIHlvdS
      ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
      geW91IHNlZSBpdD8gIC0tIEp1ZXJk
      

        I'm building up the string key using sprintf just to avoid perl code in the comparison function.

        Evil yes. But also EFFICIENT.

        I'll bet that for a long sort, it's probably faster to do a single cmp than several cmps and <=>s.

        But it's the weekend; I'm too lazy to fire up Benchmark. :)
        --
        Mike
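For anyone who does want to fire up Benchmark, here is a rough sketch of how the two approaches could be compared with the core Benchmark module's cmpthese. The data is synthetic (random references generated just for this test), the sub names are made up, and no result is claimed here; the numbers will depend on the perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data (an assumption for illustration): random "N Book C:V" references.
my @books = qw(Corinthians Hebrews John Kings Peter);
my @unordered;
for (1 .. 2000) {
    push @unordered, sprintf "%d %s %d:%d",
        1 + int(rand(3)), $books[ int(rand(@books)) ],
        1 + int(rand(50)), 1 + int(rand(50));
}

# Packed string key, compared with a single cmp.
sub packed_key_sort {
    return map  { (split /\t/)[1] }
           sort { $a cmp $b }
           map  { sprintf("%05d %-20s %04d %04d", split /[ :]/) . "\t$_" }
           @_;
}

# Parsed fields, compared one at a time with <=> and cmp.
sub multi_key_sort {
    return map  { $_->[0] }
           sort {
                  $a->[1] <=> $b->[1] or $a->[2] cmp $b->[2]
               or $a->[3] <=> $b->[3] or $a->[4] <=> $b->[4]
           }
           map  { [ $_, split /[ :]/ ] }
           @_;
}

# Sanity check: both approaches should agree on the order.
my @by_key     = packed_key_sort(@unordered);
my @by_multi   = multi_key_sort(@unordered);
my $same_order = "@by_key" eq "@by_multi" ? 1 : 0;
print "same order: ", ($same_order ? "yes" : "no"), "\n";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    packed_key => sub { my @s = packed_key_sort(@unordered) },
    multi_key  => sub { my @s = multi_key_sort(@unordered) },
});
```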

      For descending string keys sprintf ~$fld
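A minimal sketch of the ~ idea, under a few assumptions: the strings are plain ASCII, the "bitwise" feature is not enabled (so ~ on a string complements it byte by byte), and the field is padded to its fixed width before it is complemented rather than after. As far as I can tell, padding first is what keeps a short string like 'ab' on the correct side of 'abc'. The names and the width of 5 are made up for the example.

```perl
use strict;
use warnings;

my @names = ('ab', 'abc', 'b', 'a');

# Pad to a fixed width first, then complement every byte.
# desc_key is a hypothetical helper name.
sub desc_key {
    my ($field) = @_;
    return ~sprintf("%-5s", $field);
}

my @descending =
    map  { (split /\t/)[1] }
    sort { $a cmp $b }
    map  { desc_key($_) . "\t$_" }
    @names;

print "$_\n" for @descending;
```

The sort itself is still an ascending cmp; only the key is inverted, so this can sit next to ascending fields in the same packed key.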