I was hesitant to post my solution because I'm thinking someone will come along with a POSIX module solution that normalizes diacritic symbols to their base character automatically.

The same thought crossed my mind, in terms of using Unicode character classes. I checked the perlfaqs in 5.8.1 and the perlunicode man page and didn't find anything relevant, though I think there was a discussion about this sort of thing (removing accents) on the perl-unicode mailing list within the last couple of weeks.

In any case, I would hesitate to look for that sort of solution -- there's a reasonably good chance that operations using Unicode character classes will end up being slower than just doing plain old "tr///" on plain old Latin-1 bytes. That, combined with the fact that AM would need to convert everything to utf8 first, makes this unlikely to succeed as an "optimization".
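
For comparison, here is roughly what the Unicode route would involve. This is only a sketch, assuming the input is ISO-8859-1 bytes; strip_marks is a made-up name, and I haven't benchmarked it against "tr///". It decodes to characters, decomposes them with Unicode::Normalize, and deletes the combining marks:

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: assumes $bytes holds ISO-8859-1 (Latin-1) data.
sub strip_marks {
    my ($bytes) = @_;
    my $chars = decode('iso-8859-1', $bytes);  # bytes -> Perl characters
    my $nfd   = NFD($chars);                   # e-acute becomes "e" + combining acute
    $nfd =~ s/\p{Mn}+//g;                      # drop the nonspacing combining marks
    return $nfd;
}
```

Note that this only helps for characters that have a canonical decomposition; something like o-slash (0xF8) would pass through unchanged.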

As for a POSIX (as opposed to unicode) module, I would guess that if someone decided to do this in "pure perl", it would end up as just a "tr///" statement...
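
To make that concrete, here is a minimal sketch of the "tr///" approach, assuming the data is Latin-1 bytes. The name fold_key and the exact mapping are my own choices for illustration, not something from the earlier posts; characters with no obvious single-letter base (AE ligature, eth, thorn, sharp s) are left alone. The table uses \x escapes so the script does not depend on the encoding of the source file:

```perl
use strict;
use warnings;

# Hypothetical helper: build a case- and diacritic-insensitive sort key
# from a Latin-1 byte string.
sub fold_key {
    my ($s) = @_;
    # Map accented Latin-1 bytes (0xC0-0xFF) to their unaccented ASCII base letters.
    $s =~ tr/\xC0-\xC5\xC7\xC8-\xCB\xCC-\xCF\xD1\xD2-\xD6\xD8\xD9-\xDC\xDD\xE0-\xE5\xE7\xE8-\xEB\xEC-\xEF\xF1\xF2-\xF6\xF8\xF9-\xFC\xFD\xFF/AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy/;
    return lc $s;   # everything mapped is now ASCII, so plain lc suffices
}

# Sort on the folded key, falling back to the raw string to break ties.
sub sort_folded {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ fold_key($_), $_ ] } @_;
}

my @sorted = sort_folded("\xE9t\xE9", "Zebra", "etude", "\xC9cole", "apple");
```

The key is computed once per string and the sort compares the keys, so the "tr///" cost is linear in the input rather than being paid on every comparison.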


In reply to Re: Re: Diacritic-Insensitive and Case-Insensitve Sorting by graff
in thread Diacritic-Insensitive and Case-Insensitve Sorting by Anonymous Monk
