Ever wonder what the original ASCII committee was thinking when they decided that "+" sorts before "-"?
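
For reference, here are the code points in question as Perl sees them: "+" is 43 (0x2B) and "-" is 45 (0x2D). Perl's default sort is a plain string comparison, character by character, so "+" comes first. This is a minimal check, nothing more.

```perl
use strict;
use warnings;

# Print the ASCII code of each character, then the default (string) sort order.
my @chars = ('-', '+');
printf "%s => %d\n", $_, ord($_) for @chars;
my @sorted = sort @chars;
print "sorted: @sorted\n";    # sorted: + -
```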

-QM
--
Quantum Mechanics: The dreams stuff is made of

Re: ASCII Woe
by wazzuteke (Hermit) on Mar 02, 2006 at 21:53 UTC
    Personally, I don't know if I wouldn't have done the same thing. Not to say it makes any sense, but I'd simply look at a keyboard and say...

    Yup... '@' before '#' before '$' ...

    It all makes sense after you do this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    no warnings 'qw';    # the '#' and ',' below are meant literally, not as comments or separators

    print "$_\n" for sort qw(
        a b c d e f g h i j k l m n o p q r s t u v w x y z
        A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
        1 2 3 4 5 6 7 8 9 0
        ~ ! @ # $ % ^ & * ( ) _ + ` - = [ ] \ ; ' , . / < > ? : " { } |
    );
    And then look immediately at a keyboard. You'll be saying:

    Yup...

    ---hA||ta----
    print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );

      Which came first, the keyboard layout chicken or the ASCII code order egg?

      Makeshifts last the longest.

        Typewriters predate digital computers

        emc

        " When in doubt, use brute force." — Ken Thompson
Re: ASCII Woe
by swampyankee (Parson) on Mar 02, 2006 at 23:24 UTC

    EBCDIC could have won the character code wars, in which case a regexp such as /[A-Z]/ would include all sorts of unprintable characters...
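
    The gaps are easy to see from Perl. This sketch assumes Perl's core Encode module and its cp37 table (EBCDIC code page 037, one common variant). A through I occupy 0xC1 to 0xC9, J starts at 0xD1, and a naive byte range from A to Z covers 41 code points for 26 letters.

    ```perl
    use strict;
    use warnings;
    use Encode qw(encode);

    # Encode A..Z in EBCDIC code page 037 (Encode's "cp37") and inspect the byte values.
    my @codes = map { ord($_) } split //, encode('cp37', join '', 'A' .. 'Z');
    my $span  = $codes[-1] - $codes[0] + 1;    # code points covered by a naive A..Z byte range
    printf "A=0x%02X I=0x%02X J=0x%02X Z=0x%02X\n", @codes[0, 8, 9, 25];
    print "naive range spans $span code points for 26 letters\n";
    ```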

    emc

    " When in doubt, use brute force." — Ken Thompson
      Um, perl runs on several EBCDIC systems. And IIRC /[A-Z]/ is special-cased to only cover the letters.

      If you feel like having your mind blown, take a look at how UTF-EBCDIC works (yes, perl uses it).

Re: ASCII Woe
by CountZero (Bishop) on Mar 02, 2006 at 20:03 UTC
    What makes you think they were thinking at all? 'ZULU' sorts before 'ape'! Try print sort qw /ZULU ape/;
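
    If you want 'ape' first, the usual fix is to compare case-folded copies. Here is a minimal sketch using lc. The fc function (Perl 5.16+ with use feature 'fc') is the more thorough choice for Unicode text.

    ```perl
    use strict;
    use warnings;

    my @words    = qw(ZULU ape);
    my @ascii    = sort @words;                          # code-point order: uppercase first
    my @caseless = sort { lc($a) cmp lc($b) } @words;    # compare lowercased copies
    print "@ascii\n";       # ZULU ape
    print "@caseless\n";    # ape ZULU
    ```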

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Could have ordered the letters AaBbCc… I guess… then you could also write [A-z] in a regex. And expressing “all uppercase letters” would have been so painful to type out that everyone would be using [[:upper:]], which would make most code one step closer to internationalizable in trivial fashion at the cost of weird side effects in a minority of scripts…
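
      For the record, this is why [A-z] is a trap under the real ASCII order. Six punctuation characters sit between "Z" (90) and "a" (97), and the class matches every one of them:

      ```perl
      use strict;
      use warnings;

      # Characters with codes strictly between 'Z' (90) and 'a' (97).
      my @between = map { chr($_) } ord('Z') + 1 .. ord('a') - 1;
      my @matched = grep { /^[A-z]$/ } @between;    # [A-z] spans codes 65..122
      print "@matched\n";    # [ \ ] ^ _ `
      ```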

      Makeshifts last the longest.

Re: ASCII Woe
by ambrus (Abbot) on Mar 02, 2006 at 20:02 UTC
Re: ASCII Woe
by Jenda (Abbot) on Mar 03, 2006 at 12:59 UTC

    Well... why do you think it matters? If you mean to sort numbers, you ought to sort them numerically, not lexically. Besides, you usually don't include the + when writing positive numbers.
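
    Here is a minimal illustration of the difference. The spaceship operator <=> compares numerically, and Perl numifies a leading "+" without complaint. The original strings are kept in the output.

    ```perl
    use strict;
    use warnings;

    my @values  = qw(+25 -25 10 5 -10);
    my @lexical = sort @values;                  # string comparison, character by character
    my @numeric = sort { $a <=> $b } @values;    # numeric comparison; "+25" numifies to 25
    print "@lexical\n";    # +25 -10 -25 10 5
    print "@numeric\n";    # -25 -10 5 10 +25
    ```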

      Ah, someone took the bait :)

      I'm not concerned for Perl, but for the next process in the pipeline. It's a data analysis/plotting program I need to feed data to. Some of the plots have a string of numbers in various places, like "25,3,18". This means something if you know that the first field is temperature (25C), the second field is attempt 3, and 18 is the day of the month.

      I thought of feeding strings instead of numbers, which this system happily accepts. For instance, temperature could be "T=25", which makes the whole label more informative. If the label looked like "T=25,A=3,D=18", those with mathaphobia wouldn't waste precious time working out which is which.

      But some of the data have negative values, so "T=-25" sorts after "T=+25". Some fields also have one-, two-, and three-digit values, so "T=10" sorts before "T=5". Zero-padding ("T=05") solves that, but negative values still don't work because of "+" vs. "-".
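
      Both problems are easy to reproduce with Perl's default string sort. This minimal demo shows that zero-padding fixes the digit-count problem but still puts every negative value after every positive one:

      ```perl
      use strict;
      use warnings;

      # After the common "T=", characters compare by code:
      # '+' (43) < '-' (45) < '0' (48) < '1' (49) < '5' (53)
      my @plain  = sort qw(T=5 T=10 T=-25 T=+25);
      my @padded = sort qw(T=+05 T=+10 T=-05 T=-10);
      print "@plain\n";     # T=+25 T=-25 T=10 T=5
      print "@padded\n";    # T=+05 T=+10 T=-05 T=-10
      ```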

      So I thought of using "N" for negative and "P" for positive. "T=N5" sorts before "T=P5". But "T=N05" sorts before "T=N10", even though -5 is greater than -10!
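
      One way to rescue the N/P idea is to store each negative value as the nine's complement of its magnitude, so that larger magnitudes sort first. This technique is swapped in here and was not proposed in the thread. The sketch assumes integers that fit in a fixed number of digits (WIDTH, here 3), and sort_label is a hypothetical helper. The catch is readability: -10 is written as N989, which undoes some of the point of human-friendly labels.

      ```perl
      use strict;
      use warnings;

      use constant WIDTH => 3;    # assumed maximum number of digits

      # Hypothetical encoder: "P" + zero-padded value for n >= 0,
      # "N" + nine's complement of |n| for n < 0, so larger magnitudes sort first.
      sub sort_label {
          my ($name, $n) = @_;
          my $max    = 10**WIDTH - 1;                   # 999 for WIDTH = 3
          my $digits = $n < 0 ? $max - abs($n) : $n;    # -10 -> 989, -5 -> 994
          return sprintf '%s=%s%0*d', $name, ($n < 0 ? 'N' : 'P'), WIDTH, $digits;
      }

      my @labels = map { sort_label('T', $_) } (25, -5, 0, -10, 5);
      my @sorted = sort @labels;
      print "@sorted\n";    # T=N989 T=N994 T=P000 T=P005 T=P025
      ```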

      My latest thought is to prefix a sequence number that orders the values, since there aren't that many in each field: "0T=-10", "1T=-5", "2T=0", "3T=+5", etc. Since all values are known ahead of time, this encoding gives the proper ordering and keeps the field-name hints.
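
      The sequence-number idea is easy to automate when every value is known up front. This is a sketch under that assumption; ranked_labels is a hypothetical helper, and the values are assumed to be distinct. It ranks the values numerically and zero-pads the rank, so the scheme keeps working past ten values, where an unpadded "10T=..." would sort before "2T=...".

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: rank the known values numerically and prefix each
      # label with its zero-padded rank.
      sub ranked_labels {
          my ($name, @values) = @_;
          my @ordered = sort { $a <=> $b } @values;
          my $width   = length $#ordered;    # digits needed for the largest rank
          my %rank;
          @rank{@ordered} = (0 .. $#ordered);
          return map { sprintf '%0*d%s=%s', $width, $rank{$_}, $name, $_ } @values;
      }

      my @labels = ranked_labels('T', qw(+5 -10 0 -5));
      my @sorted = sort @labels;
      print "@sorted\n";    # 0T=-10 1T=-5 2T=0 3T=+5
      ```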

      Maybe in version 72, this software package will allow a better mechanism for labelling data...

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        Prefix positive numbers with a “0” instead of a “+”.

        Won’t fix the sorting within negative numbers, of course.
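
        A quick check of both halves of that claim. "-" (45) sorts before "0" (48), so the negatives now come first. Within the negatives, however, the order is still lexical:

        ```perl
        use strict;
        use warnings;

        # Positive values written with a leading "0" instead of "+".
        my @labels = qw(T=010 T=-05 T=005 T=-10);
        my @sorted = sort @labels;
        print "@sorted\n";    # T=-05 T=-10 T=005 T=010
        ```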

        Makeshifts last the longest.

        If you follow your "T=25,A=3,D=18" form, you can sort numerically easily enough. Let's say you wanted the data in order from lowest to highest temperature, and your data is in @data, with each element being a string of the above form:

        my @sorted_by_temp_asc = sort { tricky_sort($a, $b, 'T') } @data;

        sub tricky_sort {
            my ($A, $B, $field) = @_;
            # "T=25,A=3,D=18" -> (T => 25, A => 3, D => 18)
            my %a_dat = map { split('=', $_, 2) } split(',', $A);
            my %b_dat = map { split('=', $_, 2) } split(',', $B);
            # numeric comparison, so "-25" < "5" < "+25"
            return ( $a_dat{$field} <=> $b_dat{$field} );
        }

        The tricky_sort subroutine performs a numerical comparison on the identified field, thus handling all the +/- sorting the way you want, and without too much twisted logic. Of course, you'll probably want to add more data validation and such before you put this into production -- after all <grail>it's only a model</grail> ;-)

        And, it probably goes without saying, but to do a descending sort, just swap the order in which you pass $a and $b to tricky_sort.
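
        tricky_sort splits both labels again on every comparison. For a large @data, a Schwartzian transform parses each label once, sorts on the extracted field, and then returns the original strings. This is a common Perl idiom swapped in here, not taken from the post. parse_label and sort_by_field are hypothetical names, and the labels are assumed to be well-formed.

        ```perl
        use strict;
        use warnings;

        # Turn "T=25,A=3,D=18" into (T => 25, A => 3, D => 18).
        sub parse_label {
            my ($label) = @_;
            return map { split /=/, $_, 2 } split /,/, $label;
        }

        # Schwartzian transform: parse each label once, sort numerically on the
        # chosen field, then return the original strings.
        sub sort_by_field {
            my ($field, @data) = @_;
            return map  { $_->[1] }
                   sort { $a->[0] <=> $b->[0] }
                   map  { my %f = parse_label($_); [ $f{$field}, $_ ] }
                   @data;
        }

        my @data    = ('T=+25,A=3,D=18', 'T=-25,A=1,D=2', 'T=5,A=2,D=9');
        my @by_temp = sort_by_field('T', @data);
        print "$_\n" for @by_temp;
        ```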

        <-radiant.matrix->
        A collection of thoughts and links from the minds of geeks
        The Code that can be seen is not the true Code
        I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: ASCII Woe
by radiantmatrix (Parson) on Mar 03, 2006 at 18:46 UTC

    I have a little utility module I use when I get annoyed by the default ASCII sort. Here it is:

    This allows you to specify a subset of the default sorting order to "fix" for your own purposes. This particular annoyance (+ sorting before -) could be "repaired" by:

    SortCustom::set_order('+', '-');
    @set = sort { csort($a, $b) } @set;

    The side effects when making modifications can be truly odd if you don't think carefully, so take care if using this module in production code. Here's a script that illustrates some of the ways I commonly use this utility:
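
    The SortCustom code itself did not survive in this copy of the thread, so the following is not that module. It is only a minimal sketch of the same idea for this one case: swap "+" and "-" in copies of the strings before comparing. plus_after_minus is a hypothetical name. Note that it only moves the negative group ahead of the positive one; inside that group, T=-05 still sorts before T=-25.

    ```perl
    use strict;
    use warnings;

    # Hypothetical comparator (not the SortCustom module): swap '+' and '-' in
    # copies of both strings, then compare the copies with plain cmp.
    sub plus_after_minus {
        my ($x, $y) = @_;
        (my $kx = $x) =~ tr/+-/-+/;
        (my $ky = $y) =~ tr/+-/-+/;
        return $kx cmp $ky;
    }

    my @sorted = sort { plus_after_minus($a, $b) } qw(T=+25 T=-25 T=+05 T=-05);
    print "@sorted\n";    # T=-05 T=-25 T=+05 T=+25
    ```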

    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
      See my reply at Re^4: ASCII Woe...

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: ASCII Woe
by Anonymous Monk on Mar 03, 2006 at 20:04 UTC
    The ASCII codes are in the correct order. If you're doing a sort of text that represents negative numbers, you have to do a reverse sort to make the numbers come out in the right order, which also makes "-" come out before "+". ;)

      That’s a nice theory, except then the positive numbers will be missorted largest-to-smallest.

      Makeshifts last the longest.

        Would you believe "convert to two's complement and sort"?

        I was just pointing out (perhaps too indirectly) that it's more complicated than merely getting "+" and "-" in the right order. The direction of sort is different for +ve and -ve numbers.
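
        The two's-complement quip can be made to work with one adjustment. Under a bytewise sort, raw two's-complement bytes put every negative number after every positive one, because negatives have the high bit set. Flipping that sign bit (for a 32-bit value, the same as adding 2**31) yields "offset binary" bytes that sort in numeric order, which also takes care of the reversed direction for negatives. This sketch assumes integers within the signed 32-bit range; sortable_bytes is a hypothetical name.

        ```perl
        use strict;
        use warnings;

        # Map a signed 32-bit integer to 4 big-endian bytes whose string order
        # matches numeric order: adding 2**31 flips the two's-complement sign bit.
        sub sortable_bytes {
            my ($n) = @_;
            return pack 'N', $n + 2**31;
        }

        my @values = (5, -10, 0, -5, 25);
        my @sorted = sort { sortable_bytes($a) cmp sortable_bytes($b) } @values;
        print "@sorted\n";    # -10 -5 0 5 25
        ```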