The problem is that the legacy set uses an extended ASCII character set (8-bit chars), which doesn't convert to Unicode (easily).
Hmm... why not?
My take when asked about it was: don't! Keep two lists for lookup and don't mix them, because they cannot logically be sorted together. They countered by sorting two small subsets together (using Java) and saying that it was easier for their people to do lookups in a single list.
Well, that doesn't look too difficult. Why not decode their legacy set (as in map Encode::decode( 'LEGACY_SET', $_ ), @set) and sort the result? And if their set happens to be ISO-8859-1 (aka Latin-1), then decoding isn't even necessary. That's the deal with utf8-off strings in Perl: they're treated as being in THAT encoding, although some people say it just looks like it :)
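A minimal sketch of that decode-then-sort idea. The sample words are made up, and cp1252 is only an assumption standing in for whatever 'LEGACY_SET' really is:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical legacy data: raw bytes in an 8-bit encoding (cp1252 assumed).
my @legacy_bytes = ("\xE9clair", "zebra", "apple");

# Hypothetical data that is already Unicode (decoded character strings).
my @unicode = ("\x{151}z", "banana");

# Decode each byte string into a Perl character string.
my @decoded = map { decode('cp1252', $_) } @legacy_bytes;

# Both lists now hold character strings, so they can be sorted together.
# Plain sort compares by code point; it is not a linguistic collation.
my @sorted = sort(@decoded, @unicode);

# For ISO-8859-1 the decode step changes nothing as far as eq/sort can tell:
my $latin1_bytes   = "\xE9clair";
my $same_as_before = decode('ISO-8859-1', $latin1_bytes) eq $latin1_bytes;

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;
```

The code-point order puts "zebra" before the accented words, which is usually not what people doing lookups expect; a collation module addresses that.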
The result of this thread is so depressing that I'm going to turn the work down and let them find someone else. (Shame. Could have been a nice in.)
Shame indeed, because Perl is actually very good for Unicode stuff... Unicode::Collate::Locale, for example... but yeah, Perl's strings are a source of much confusion.
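For instance, a small sketch of what the collation modules give you, assuming a Perl recent enough to ship Unicode::Collate::Locale in core. The word list and the choice of the Swedish locale are only for illustration:

```perl
use strict;
use warnings;
use Unicode::Collate;
use Unicode::Collate::Locale;

# Hypothetical sample: "z", a-umlaut (U+00E4), and "a".
my @words = ("z", "\x{E4}", "a");

# Default Unicode collation: a-umlaut sorts next to "a".
my @default = Unicode::Collate->new->sort(@words);

# Swedish tailoring: a-umlaut is a separate letter that sorts after "z".
my @swedish = Unicode::Collate::Locale->new(locale => 'sv')->sort(@words);

binmode STDOUT, ':encoding(UTF-8)';
print "default: @default\n";
print "swedish: @swedish\n";
```

Both collators expect decoded character strings, so any legacy byte data has to go through Encode::decode first.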

In reply to Re^5: Mixed Unicode and ANSI string comparisons? by Anonymous Monk
in thread Mixed Unicode and ANSI string comparisons? by BrowserUk
