in reply to Re: Sort problem
in thread Sort problem

Thanks, dws. First, for seeing past my clumsiness :)

Second, for catching the significance of embedded spaces.

Third, for showing me where I was going wrong. The nearest I had got was:

print "@$_" for sort{ my ($o, $i) = (0,0); $i++ until ($i < @$a or $i < @$b) and $o =($a->[$i] cmp $b->[$i]) or $o = @$a <=> @$b; $o; }@deps;
That gave me:

C B A C G F H C G F E D C G F H I C G F H J M K M N M N Q P O U S V T
Close, but no cigar. qw[C G F H] was sorting above qw[C G F E D], but for the life of me I couldn't see why.
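The reason lies in operator precedence. In Perl, and binds more tightly than or, so the until condition in the first attempt parses as ((bounds check) and ($o = element cmp)) or ($o = length comparison). As soon as two elements compare equal, cmp returns 0, the or branch runs, and the loop stops on a length comparison instead of moving on to the next element. The sketch below lifts that comparator, unchanged apart from taking its two array refs as arguments (first_attempt is a hypothetical name), and feeds it the two lists in question.

```perl
use strict;
use warnings;

# The comparator from the first attempt, unchanged apart from taking its
# two array refs as arguments instead of using $a and $b.
sub first_attempt {
    my ($x, $y) = @_;
    my ($o, $i) = (0, 0);
    # Parses as: ((bounds check) and ($o = cmp)) or ($o = length comparison)
    $i++ until ($i < @$x or $i < @$y) and $o = ($x->[$i] cmp $y->[$i])
        or $o = @$x <=> @$y;
    return $o;
}

# 'C' cmp 'C' is 0 at index 0, so the loop stops on 4 <=> 5.
my $result = first_attempt([qw(C G F H)], [qw(C G F E D)]);
print "$result\n";    # prints -1: C G F H sorts before C G F E D
```

Because both lists start with C, the length comparison decides the result and the four-element list comes out first.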

Your code made me realise what was wrong, and led me to this:

print "@$_" for sort{ my ($o, $i) = (0,0); $i++ until ( $i < @$a or $i < @$b ) and $o=( $a->[$i] cmp $b->[$i] ); $o || @$a <=> @$b; }@deps;

It won't win any prizes in the clarity-at-all-costs stakes, I realise, but I find it readable.
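For anyone who finds the one-liner dense, here is a longhand sketch of the ordering it is after: compare the lists element by element with cmp, stop at the first difference, and if one list is a prefix of the other, sort the shorter one first. The sub name by_elements and the sample @deps (loosely modelled on the output above) are hypothetical. The loop only runs to the length of the shorter list, so it never reads past the end of either array.

```perl
use strict;
use warnings;

# Element-wise comparison of two array refs; a list that is a prefix
# of the other sorts first.
sub by_elements {
    my ($x, $y) = @_;
    my $n = @$x < @$y ? scalar @$x : scalar @$y;    # length of the shorter list
    for my $i (0 .. $n - 1) {
        my $o = $x->[$i] cmp $y->[$i];
        return $o if $o;                             # first difference decides
    }
    return @$x <=> @$y;                              # equal prefix: shorter first
}

# Hypothetical sample data.
my @deps = ([qw(C G F H)], [qw(C B A)], [qw(C G F H J)],
            [qw(C G F E D)], [qw(C G F H I)]);

my @sorted = sort { by_elements($a, $b) } @deps;
print "@$_\n" for @sorted;
```

With that data it prints C B A, C G F E D, C G F H, C G F H I and C G F H J, one per line.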


Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

Re: Re: Re: Sort problem
by tachyon (Chancellor) on Feb 26, 2003 at 20:55 UTC

    Out of interest, why doesn't this work? Internal spaces are retained, but the concatenation adds no spaces of its own....

    { local $" = ''; @out = sort { "@$a" cmp "@$b" } @in; }

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Okay. I got caught by this a few weeks ago.

      If you have ['A','BC'] being compared against ['AB','C'] at some point within the sort, then once concatenated they both become 'ABC' and compare as equal, rather than the former sorting lexically before the latter.
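The collision is easy to reproduce. With $" set to the empty string, interpolating an array into a string simply concatenates its elements, so both lists below turn into the same key ABC and cmp reports them equal, whereas comparing the first elements on their own already puts ['A','BC'] first. The variable names are only for illustration.

```perl
use strict;
use warnings;

my @first  = ('A', 'BC');
my @second = ('AB', 'C');

my $joined_cmp;
{
    local $" = '';                           # separator used when "@array" interpolates
    $joined_cmp = "@first" cmp "@second";    # both sides are "ABC"
}

# Element-wise: 'A' cmp 'AB' is -1 because 'A' is a prefix of 'AB'.
my $element_cmp = $first[0] cmp $second[0];

print "joined: $joined_cmp, element-wise: $element_cmp\n";    # joined: 0, element-wise: -1
```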

      Equally, I have used various separators in the past, such as control characters (ord 0 to 31) and DEL (ord 127), but the advent of UTF-8 means that individual bytes of a multi-byte character can legitimately hold these values, so using them as a separator is no longer viable. (Some would say it never was :)

      The only alternative I have found is to use the byte pair 0xBF 0xBE as a separator. As far as I can tell, this sequence can never legitimately appear in UTF-8, but I am not yet confident I understand the Unicode side of things well enough to be certain.
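The UTF-8 encoding rules (RFC 3629) make the byte-level picture checkable. Every byte of a multi-byte character is 0x80 or above, so bytes 0x00 to 0x7F only ever appear as the ASCII characters themselves, and the bytes 0xFE and 0xFF never appear at all. The pair 0xBF 0xBE, however, can occur inside a three-byte character: U+0FFE, for example, encodes as E0 BF BE. The sketch below uses only the core utf8::encode function (the helper utf8_bytes is a hypothetical name) to scan the multi-byte part of the Basic Multilingual Plane, skipping the surrogates and everything from U+FDD0 up, and counts each case.

```perl
use strict;
use warnings;

# Return the UTF-8 bytes of a single code point (core utf8:: functions only).
sub utf8_bytes {
    my $s = chr(shift);
    utf8::encode($s);    # $s now holds the raw octets
    return $s;
}

my ($ascii_bytes, $fe_ff_bytes, @has_bf_be) = (0, 0);

# Multi-byte characters only; surrogates and U+FDD0 upwards are skipped.
for my $cp (0x80 .. 0xD7FF, 0xE000 .. 0xFDCF) {
    my $bytes = utf8_bytes($cp);
    $ascii_bytes++ if $bytes =~ /[\x00-\x7F]/;     # any byte below 0x80?
    $fe_ff_bytes++ if $bytes =~ /[\xFE\xFF]/;      # any 0xFE or 0xFF byte?
    push @has_bf_be, $cp if index($bytes, "\xBF\xBE") >= 0;
}

printf "chars containing a byte below 0x80: %d\n", $ascii_bytes;
printf "chars containing 0xFE or 0xFF:      %d\n", $fe_ff_bytes;
printf "chars containing 0xBF 0xBE:         %d (first is U+%04X)\n",
    scalar @has_bf_be, $has_bf_be[0];
```

In this range the pair turns up in every character of the form U+xFFE, so on its own it would not be a safe separator for UTF-8 data.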


      ..and remember there are a lot of things monks are supposed to be but lazy is not one of them

      Because now you are making ['thecat'] the same as ['the', 'cat'] (where the second should sort before the first). If we know we're dealing with plain text, then joining with a null character would be OK. If it's Unicode, I'm not sure...
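A sketch of the null-separator idea: join each list with "\0" and sort on the joined strings. Provided no element contains "\0", an element that is a prefix of its counterpart always loses at the separator, which gives the same order as the element-wise comparison. It is written as a map-sort-map (Schwartzian transform) so each key is built only once; the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @deps = ([qw(the cat)], ['thecat'], [qw(A BC)], [qw(AB C)],
            [qw(C G F H)], [qw(C G F E D)]);

# Assumes no element contains "\0", so the separator sorts below every real character.
my @sorted = map  { $_->[1] }                    # 3. recover the original array refs
             sort { $a->[0] cmp $b->[0] }        # 2. plain string sort on the keys
             map  { [ join("\0", @$_), $_ ] }    # 1. build "\0"-joined sort keys
             @deps;

print join(' | ', map { "@$_" } @sorted), "\n";
```

This prints A BC | AB C | C G F E D | C G F H | the cat | thecat, so both of the collisions mentioned in this thread come out in element-wise order.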

      Another map-sort-map solution would be to use sprintf to concatenate the elements at a fixed width, but that would require first knowing the maximum length of any field, and making sure your "%Ns" format is at least that wide.
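A sketch of the fixed-width variant: find the longest element first, then left-justify every element to that width with sprintf's %-*s (the * takes the width from the argument list) before concatenating. Shorter elements are padded with spaces, so this assumes no element has trailing spaces or contains characters that sort below a space, such as tabs or other control characters. The sample data is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sample data.
my @deps = ([qw(the cat)], ['thecat'], [qw(C G F H)], [qw(C G F E D)], [qw(C G F H I)]);

# Width of the longest single element across all the lists.
my $width = max map { length($_) } map { @$_ } @deps;

# Pad each element to $width, concatenate, sort on that key, then unwrap.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ join('', map { sprintf '%-*s', $width, $_ } @$_), $_ ] }
             @deps;

print join(' | ', map { "@$_" } @sorted), "\n";
```

Here the width works out at 6 (from thecat) and the result is C G F E D | C G F H | C G F H I | the cat | thecat.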

Re^3: Sort problem
by hv (Prior) on Feb 27, 2003 at 00:17 UTC

    I think the new code above gives 'Use of uninitialized value' warnings, because the array-length check is not quite right. I think it needs to be:

    $i++ until $i >= @$a or $i >= @$b or $o = ( $a->[$i] cmp $b->[$i] );

    $i == @$a would be just as good as $i >= @$a; I can't decide which is clearer.

    Hugo
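Putting hv's until condition together with the $o || @$a <=> @$b tie-break from the corrected version earlier in the thread gives the complete sort below. Warnings are made fatal, so any cmp against an element past the end of either array would abort the script. The sample data is hypothetical and includes a duplicate entry, so the case where both lists run out together is exercised too.

```perl
use strict;
use warnings FATAL => 'all';    # an undef in cmp would now die instead of warn

# Hypothetical sample data, including a duplicate [C G F H].
my @deps = ([qw(C G F H)], [qw(C B A)], [qw(C G F H J)], [qw(C G F E D)],
            [qw(C G F H I)], [qw(C G F H)]);

my @sorted = sort {
    my ($o, $i) = (0, 0);
    # Stop at the end of either list, or at the first pair that differs.
    $i++ until $i >= @$a or $i >= @$b or $o = ( $a->[$i] cmp $b->[$i] );
    # Equal up to the shorter length: the shorter list sorts first.
    $o || @$a <=> @$b;
} @deps;

print "@$_\n" for @sorted;
```

It prints C B A, C G F E D, C G F H (twice), C G F H I and C G F H J, one per line, without any warnings.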

      You're right. I had it coded as a while loop:

      $i++ while ( $i < @$a and $i < @$b ) and not $o=( $a->[$i] cmp $b->[$i] );

      but decided, as I was posting it, that it was better done as an until loop because the condition was slightly clearer.
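The two loop forms are equivalent by De Morgan's laws: until A or B or C keeps looping exactly while not A and not B and not C holds, and $i < @$a is just not ($i >= @$a). A brute-force check over a handful of made-up lists confirms that both loops stop at the same index with the same result; until_form and while_form are hypothetical wrapper names.

```perl
use strict;
use warnings;

# hv's until form, wrapped in a sub that reports where the loop stopped.
sub until_form {
    my ($x, $y) = @_;
    my ($o, $i) = (0, 0);
    $i++ until $i >= @$x or $i >= @$y or $o = ( $x->[$i] cmp $y->[$i] );
    return ($i, $o || @$x <=> @$y);
}

# The while form of the same loop.
sub while_form {
    my ($x, $y) = @_;
    my ($o, $i) = (0, 0);
    $i++ while ( $i < @$x and $i < @$y ) and not $o = ( $x->[$i] cmp $y->[$i] );
    return ($i, $o || @$x <=> @$y);
}

# Hypothetical lists, including an empty one; every ordered pair is compared.
my @lists = ([qw(C B A)], [qw(C G F H)], [qw(C G F E D)], [qw(C G F H I)], [], [qw(C)]);

my $mismatches = 0;
for my $x (@lists) {
    for my $y (@lists) {
        my @u = until_form($x, $y);
        my @w = while_form($x, $y);
        $mismatches++ unless "@u" eq "@w";
    }
}
print "mismatches: $mismatches\n";    # prints mismatches: 0
```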

      Unfortunately, for the second time yesterday in the same thread, I screwed up the conversion.

