in reply to Sort problem

What you've shown isn't an Array of Arrays. But let's assume that's a mistranscription. Something like this might do what you need.
use strict;
use Data::Dumper;

my @in = (
    [qw(M N)],
    [qw(M N P O)],
    [qw(C B A)],
    [qw(U S)],
    [qw(V T)],
    [qw(C G F E D)],
    [qw(C G F H)],
    [qw(C G F H I)],
    [qw(C G F H J)],
    [qw(M K)]
);

sub arrayCompare {
    my @a = @$a;
    my @b = @$b;
    while ( 1 ) {
        return  0 if @a == 0 && @b == 0;
        return -1 if @a == 0;
        return  1 if @b == 0;
        my $cmp = $a[0] cmp $b[0];
        return $cmp if $cmp != 0;
        shift @a;
        shift @b;
    }
}

my @out = sort arrayCompare @in;
print Dumper @out;

(Side comment: I'm surprised at how many people missed the significance of "once the elements can be strings that could contain embedded spaces" and tried approaches based on concatenation.)

Re: Re: Sort problem
by BrowserUk (Patriarch) on Feb 26, 2003 at 20:46 UTC

    Thanks dws. First, for seeing past my clumsiness :)

    Second, for catching the significance of embedded spaces.

    Third for showing me where I was going wrong. The nearest I had got was

    print "@$_" for sort {
        my ($o, $i) = (0,0);
        $i++ until ($i < @$a or $i < @$b)
            and $o = ($a->[$i] cmp $b->[$i])
             or $o = @$a <=> @$b;
        $o;
    } @deps;

    Which gave me

    C B A
    C G F H
    C G F E D
    C G F H I
    C G F H J
    M K
    M N
    M N Q P O
    U S
    V T

    Close, but no cigar. qw[C G F H] was sorting above qw[C G F E D] but for the life of me I couldn't see why.

    Your code made me realise what was wrong and led to this

    print "@$_" for sort {
        my ($o, $i) = (0,0);
        $i++ until ( $i < @$a or $i < @$b ) and $o=( $a->[$i] cmp $b->[$i] );
        $o || @$a <=> @$b;
    } @deps;

    Which I realise won't win any prizes in the clarity-at-all-costs stakes, but I find readable.


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.

      Out of interest, why doesn't this work? Internal spaces are retained, but the concatenation does not add spaces between elements.

      { local $" = ''; @out = sort { "@$a" cmp "@$b" } @in; }

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        Okay. I got caught by this a few weeks ago.

        If you have ['A','BC'] being compared against ['AB','C'] at some point within the sort, then once concatenated, they compare as equal rather than the former being earlier lexically than the latter.
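The collision described above can be sketched as follows. This is only an illustration: the helper name `cmp_elementwise` is made up for the example and is not code from the thread.

```perl
use strict;
use warnings;

# Element-wise comparison of two array refs (hypothetical helper name).
sub cmp_elementwise {
    my ($x, $y) = @_;
    my $n = @$x < @$y ? @$x : @$y;       # length of the shorter array
    for my $i (0 .. $n - 1) {
        my $c = $x->[$i] cmp $y->[$i];
        return $c if $c;                 # first differing element decides
    }
    return @$x <=> @$y;                  # common prefix: shorter sorts first
}

my ($p, $q) = ( ['A', 'BC'], ['AB', 'C'] );

# Joined with no separator, both rows become "ABC" and compare equal.
my $joined = join('', @$p) cmp join('', @$q);

# Compared element by element, 'A' sorts before 'AB'.
my $by_element = cmp_elementwise($p, $q);

print "joined: $joined, element-wise: $by_element\n";
# prints "joined: 0, element-wise: -1"
```

Because the joined keys are identical, a sort on them leaves the relative order of the two rows to chance (or to input order), which is the bug being described.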

        Equally, I have used various separators in the past, control characters (ord(0-31)), del (ord(127)) etc., but the advent of utf8 means that individual bytes of a multi-byte char can legitimately hold these chars, so using them as a separator is no longer viable. (Some would say it never was :).

        The only alternative I have found is using the byte pair 0xBF 0xBE as a separator. This sequence can never legitimately appear in UTF-8 (I believe), but I am not yet confident I have understood the Unicode stuff well enough to be certain.
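The belief about 0xBF 0xBE can be checked with the core Encode module. A minimal sketch, assuming U+1FFE (GREEK DASIA) as the test character, which is my choice and not something from the thread: its UTF-8 encoding is three bytes, and the last two are exactly that pair, so the sequence can turn up inside valid UTF-8 text.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode one assigned character, U+1FFE, as UTF-8 bytes.
my $bytes = encode('UTF-8', "\x{1FFE}");

# Hex dump of the encoded bytes, for inspection.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

# Does the candidate separator occur inside this single character?
my $found = index($bytes, "\xBF\xBE") >= 0 ? 1 : 0;

print "$hex contains BF BE: $found\n";
# prints "E1 BF BE contains BF BE: 1"
```

Standard UTF-8 never uses the bytes 0xFE or 0xFF, so those may be safer separator candidates, though I have not tried that against real data.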


        ..and remember there are a lot of things monks are supposed to be but lazy is not one of them

        Because now you are making ['thecat'] the same as ['the', 'cat'] (where the second should sort before the first). If we know we're dealing with plain text, then joining with a null character would be ok. If it's Unicode, I'm not sure...

        Another map sort map solution would be to use sprintf to concatenate the elements using fixed lengths, but that would require first knowing what the maximum length of any field could be, and making sure your "%Ns" format is at least that large.
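The fixed-width idea might look something like this. It is a sketch under a few assumptions of mine: the sample rows are invented, the helper name `fixed_key` is hypothetical, and the format is left-justified (`%-*s` rather than `%Ns`) so that the padding lands on the right and shorter strings still sort first.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Invented sample rows, including the pair that collides under plain joining.
my @in = ( ['AB', 'C'], ['A', 'BC'], ['A'] );

# The widest element across all rows sets the field width.
my $width = max map { length $_ } map { @$_ } @in;

# Build a sort key: every element left-justified and space-padded to $width.
sub fixed_key {
    my ($row) = @_;
    return join '', map { sprintf '%-*s', $width, $_ } @$row;
}

# map / sort / map: compute each key once, sort on it, then strip it off.
my @out = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [ $_, fixed_key($_) ] } @in;

print join(',', @$_), "\n" for @out;
# prints:
# A
# A,BC
# AB,C
```

Since the padding character is a space, this still goes wrong if an element ends in spaces or contains characters that sort below a space, so it only suits data where that cannot happen.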

      I think the new code you have above gives 'use of uninitialized value' warnings because the array-length check is not quite right. I think it needs instead:

      $i++ until $i >= @$a or $i >= @$b or $o = ( $a->[$i] cmp $b->[$i] );

      $i == @$a would be just as good as $i >= @$a; I can't decide which is clearer.

      Hugo
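For anyone wanting to try the corrected loop, here it is dropped into a complete comparator. The sub name `by_elements` and the sample `@deps` rows are assumptions for the example. Warnings are made fatal so that any 'use of uninitialized value' would stop the script.

```perl
use strict;
use warnings FATAL => 'all';   # an "uninitialized" warning would die here

# Comparator using the corrected bounds check (hypothetical sub name).
sub by_elements {
    my ($o, $i) = (0, 0);
    # Advance while both arrays still have an element at $i and they match.
    $i++ until $i >= @$a or $i >= @$b or $o = ( $a->[$i] cmp $b->[$i] );
    # If no element differed, the shorter array sorts first.
    return $o || @$a <=> @$b;
}

# Invented sample rows.
my @deps = ( [qw(C G F H)], [qw(C G F E D)], [qw(M N)], [qw(C B A)], [qw(M)] );

my @sorted = sort by_elements @deps;
print "@$_\n" for @sorted;
# prints:
# C B A
# C G F E D
# C G F H
# M
# M N
```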

        You're right. I had it coded as a while loop

        $i++ while ( $i < @$a and $i < @$b ) and not $o=( $a->[$i] cmp $b->[$i] );

        but decided as I was posting it, that it was better done as an until loop as the conditional was slightly clearer.

        Unfortunately, for the second time yesterday in the same thread, I screwed up the conversion.


Re: Re: Sort problem
by tachyon (Chancellor) on Feb 26, 2003 at 19:47 UTC

    Just concatenate and remove the spaces and throw in a Schwartzian Transform for efficiency... like this

    cheers

    tachyon


      Just concatenate and remove the spaces and ...

      You and I have apparently come to quite different understandings of the problem BrowserUK posed. He left a few details out, but I assume that he's illustrating the shape of the data, and is providing a valuable clue to the actual content of the data when he writes "once the elements can be strings that could contain embedded spaces".

      If my assumption is true, then approaches that use concatenation will fail for some inputs, notably some inputs that contain embedded spaces.

      Perhaps this is a good opportunity for BrowserUK to clarify his intent.
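The failure mode for concatenation with the spaces removed can be sketched like this, assuming a key builder that joins the elements and strips all whitespace. The name `stripped_key` and the three rows are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical key builder: join the elements, then drop all whitespace.
sub stripped_key {
    my ($row) = @_;
    (my $key = join '', @$row) =~ s/\s+//g;
    return $key;
}

# Three different rows that differ only in where the boundaries and spaces fall.
my @rows = ( ['the cat'], ['the', 'cat'], ['thecat'] );

# Count how many rows land on each key.
my %seen;
$seen{ stripped_key($_) }++ for @rows;

print join(', ', map { "$_ => $seen{$_}" } sort keys %seen), "\n";
# prints "thecat => 3"
```

All three rows share one key, so a sort on that key cannot put them in any defined order relative to each other. Whether that matters depends on what ordering is actually wanted for such rows.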

        The example I posted uses concatenation but removes the spaces \040 and thus the problem.

        @in = (
            ['the cat', 'sat', 'on', 'the', 'mat'],
            ['the cat sat', 'wherever if felt like'],
            ['the cat'],
        );

        my @out = map  { $_->[0] }
                  sort { $a->[1] cmp $b->[1] }
                  map  { [$_, concat($_)] } @in;

        sub concat {
            my $ary = shift;
            $ary = join '', @$ary;
            $ary =~ s/\s//g;
            return $ary;
        }

        use Data::Dumper;
        $Data::Dumper::Indent = 0;
        print map { s/\[/\n \[/g; $_ } Dumper \@out;

        __DATA__
        $VAR1 = [
         ['the cat'],
         ['the cat','sat','on','the','mat'],
         ['the cat sat','wherever if felt like']];

        Please explain to me why this is not appropriately sorted and how it is failing. It seemed clear enough to me that he did not want 'the cat' to sort before ('the', 'cat') simply because char 4 of 'the cat' is \040. If I misread the problem and he wants to sort including embedded spaces, but without getting a space when you concatenate with "@ary", then all that is required is a local $" = ''; before the sort.

        cheers

        tachyon
