in reply to Re: noncase-sensitive sorting
in thread noncase-sensitive sorting

Thanks for the correction. I have to admit that I am ignorent of the GRT sort. I stared at the GRT sort for a while, and finally worked out what it's doing after sticking a print inside the map. I am writing down my finding on what is happenning inside a GRT sort here for the possible benefit to other monks.

@grt_sorted = map {substr $_ => $max_l} sort map {uc () . ("\0" x ($max_l - length)) . $_} @array;
The first map expands the original string with a 0 terminated string of width $max_l, concatinated with the original string itself. Eliminates the use of a slice in the Schwartzian sort.

The sort follows only sorts the first part of the string terminated at \0.

The second map then retrieves the original string with substr.

Very clever indeed. :-D

The tricky bit is the first map. I modified it to the following to inspect what's happenning inside the map -
map { my $c = uc () . ("\0" x ($max_l - length)) . $_; print "$_ => $c\n"; $c } @array;

Replies are listed 'Best First'.
Re: noncase-sensitive sorting
by Abigail-II (Bishop) on Dec 02, 2003 at 12:46 UTC
    The sort follows only sorts the first part of the string terminated at \0.
    Eh, no, it doesn't (unless there's a bug ;-)). The sort compares the entire strings (well, up to the point they are different). The trick is that the original strings are prefixed (with a fixed length string) in such away that the prefix itself is enough to determine how to sort, unless the strings are equal (*). Strings shorter than the maximum length string are padded with NUL characters because NULs sort before any other character, and normally 'a' is sorted before 'ab'.

    In fact, a GRT and a ST are doing to same trick, but where ST uses a 2 element array, one element the original value, the other the computed value, GRT combines both values into a single string.

    (*) Well, not quite. If a string would end in a NUL (which doesn't happen in the benchmark), it would give the same prefix as a similar string without the trailing NUL. Say for instance the max length is 4, and we have two strings, "ab" and "ab\0". The both will have a "ab\0\0" prefix, but the strings used in the sort will differ: "ab\0\0ab" vs. "ab\0\0ab\0". In this case, the cmp will reach the end of the entire string, even while the strings differ. It'll go beyond the prefix.

    Abigail