I’ll have to think about how to get this major point across more effectively, because I don’t seem to have done so yet.

For my part, you have got across the message that when (if) I need to sort text for human lookup, your module is the way to go.

The salient part for me is the italicised part of that sentence.

My criticism of your advertising blurb is a matter of emphasis. It is that you (way) over-emphasise how often dictionary ordering is an important part of the use of sort. The blurb pretty much completely ignores the many algorithmic uses of sorting.
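To illustrate what "algorithmic uses" means here, the following is a minimal sketch, not taken from any module under discussion. The data and the helper name bsearch are hypothetical. It shows two tasks, de-duplication and binary search, that only need some consistent total ordering, so Perl's default code-point comparison is sufficient and dictionary ordering buys nothing.

```perl
use strict;
use warnings;

# Hypothetical data: the keys are identifiers, not text for human lookup.
my @ids = qw( z9 a10 Z1 a2 a10 z9 );

# Perl's default sort compares strings code point by code point.
# That is enough to bring equal keys together for de-duplication.
my @sorted = sort @ids;
my @unique;
for my $id (@sorted) {
    push @unique, $id unless @unique && $unique[-1] eq $id;
}

# Binary search works with any consistent ordering, as long as the
# same comparison (cmp here) was used to sort the list.
sub bsearch {
    my ( $aref, $key ) = @_;
    my ( $lo, $hi ) = ( 0, $#$aref );
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        my $cmp = $aref->[$mid] cmp $key;
        return $mid if $cmp == 0;
        if   ( $cmp < 0 ) { $lo = $mid + 1 }
        else              { $hi = $mid - 1 }
    }
    return -1;    # not found
}

print "@unique\n";                        # prints: Z1 a10 a2 z9
print bsearch( \@unique, 'a2' ), "\n";    # prints: 2
```

The resulting order (Z1 before a10, a10 before a2) would look wrong in a printed index, but neither task cares how the order looks to a human reader.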


As for my (lack of) use of Unicode: for the most part, I do not have any need of it.

Further, I find that the emphasis to embrace Unicode is misbegotten. The lingua franca of computing (and science in general) is English. If you are a biologist, then you need to have a working knowledge of Latin in order to be able to understand and interpret (pronounce and remember:) the biological classification system. If you are a musician, you pretty much need to be able to read music to be able to communicate with other musicians.

And if you are a computer scientist, you need to be able to read and write in English in order to be able to use resources like the IETF RFCs. They manage to express a whole heap of very complex ideas using nothing more than 7-bit ASCII. Even if you translated them into all the world's languages, the task of verifying that they were all in technical accordance would be impossible.

Whilst, with the advent of the consumerist WWW and global markets, programs need to be able to deal with the full range of the world's writing systems, programs should, for the most part, treat non-ASCII text -- names, addresses etc. -- as opaque binary packets to be received from the user and presented back without analysis or translation.

In contrast to your assertions above, IMO, Unicode is not text. (How could you possibly sort Chinese, Japanese, Thai, Russian, Arabic and Farsi names into your schema?) It is a set of (incompatible) binary standards. And very bad ones at that.

The absence of any mechanism to determine if a file of data is Unicode, and if it is, which of the many forms of Unicode it might be, is frankly ludicrous. It is like taking all the image file formats and stripping out their type headers.
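The nearest thing to a type header is the optional byte-order mark (BOM). The sketch below is an illustration only and the function name guess_bom is hypothetical. It checks the leading bytes of a buffer against the standard BOM byte sequences and returns nothing when no BOM is present, which is the normal case for most files.

```perl
use strict;
use warnings;

# Guess an encoding from a leading byte-order mark, if there is one.
# The 4-byte marks are tested first because the UTF-32LE mark begins
# with the same two bytes as the UTF-16LE mark.
sub guess_bom {
    my ($bytes) = @_;
    return 'UTF-32LE' if $bytes =~ /\A\xFF\xFE\x00\x00/;
    return 'UTF-32BE' if $bytes =~ /\A\x00\x00\xFE\xFF/;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return;    # no BOM: the bytes alone do not say what they are
}

for my $sample ( "\xEF\xBB\xBFhello", "hello" ) {
    my $guess = guess_bom($sample);
    print defined $guess ? $guess : 'unknown', "\n";
}
# prints: UTF-8
# prints: unknown
```

Even when a mark is found the answer is a guess: the bytes FF FE 00 00 could equally be a UTF-16LE mark followed by a NUL character, and data without a mark gives no indication at all.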

Unicode is a mess. A (set of) kludged-together interim solutions that have been promoted to a (set of) standards that should never have been. Far worse (IMO) than the code-page mechanism.

Whilst I am in awe of your efforts to make sense of the whole mess and to render it vaguely usable from Perl, in the long term I think that such efforts (across the industry rather than yours specifically) may be counter-productive. The problem is that with usability -- even as limited as it is -- comes longevity. Which means that better solutions will not be sought, much less adopted.

Do a search in your favourite search engine for "unicode wrong" to see the mess that is Unicode. Anything that is that easy to get wrong should have been allowed to die a natural death.

As with many previous bad fads -- pet rocks, glassless glasses, bell-bottom trousers and Y2K hysteria -- my apparently lone voice will be seen as swimming against the flow, but don't forget which direction the survivors in The Poseidon Adventure went in :)

History will tell.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^6: best sort by BrowserUk
in thread best sort by ag4ve
