in reply to Re: Diacritic-Insensitive and Case-Insensitve Sorting

I was hesitant to post my solution because I'm thinking someone will come along with a POSIX module solution that normalizes diacritic symbols to their base character automatically.

The same thought crossed my mind, though I was thinking of Unicode character classes. I checked the perlfaqs in 5.8.1 and the perlunicode man page and didn't find anything relevant, though I think there was a discussion about this sort of thing (removing accents) on the perl-unicode mailing list within the last couple of weeks.

In any case, I would hesitate to look for that sort of solution: there's a reasonably good chance that operations using Unicode character classes will end up being slower than just doing plain old "tr///" on plain old Latin-1 bytes. That, combined with the fact that AM would need to convert everything to UTF-8 first, makes this unlikely to succeed as an "optimization".
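For comparison, here is a rough sketch of what the Unicode route might look like, assuming the input is Latin-1 bytes. It decodes to characters, decomposes with NFD from Unicode::Normalize, and strips the combining marks with a character class. The name unicode_sort_key is made up for illustration, and I haven't benchmarked it, so treat the speed concern above as a guess rather than a measurement.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: build a sort key from a Latin-1 byte string
# by going through Unicode decomposition.
sub unicode_sort_key {
    my ($bytes) = @_;
    my $s = decode('ISO-8859-1', $bytes);  # bytes -> characters (the extra conversion step)
    $s = NFD($s);                          # split accented letters into base + combining mark
    $s =~ s/\p{Mn}//g;                     # drop the nonspacing combining marks
    return lc $s;                          # fold case
}
```

Note the decode step and the regex over a Unicode property: those are the two costs being weighed against a single byte-level "tr///".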

As for a POSIX (as opposed to Unicode) module, I would guess that if someone decided to do this in "pure Perl", it would end up as just a "tr///" statement...
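To make the "tr///" idea concrete, here is a minimal sketch, assuming the strings are Latin-1 (ISO-8859-1) bytes. The sub name sort_key and the sample word list are invented for the example, and the mapping is not a complete table: characters such as \xDF (sharp s) or \xC6 (AE ligature) are left untouched because they have no single-letter equivalent.

```perl
use strict;
use warnings;

# Hypothetical helper: fold a Latin-1 byte string to an unaccented,
# lowercase key. When the replacement list is shorter than the search
# list, tr/// repeats its last character, so each line maps a whole
# group of accented letters onto one base letter.
sub sort_key {
    my ($s) = @_;
    $s =~ tr/A-Z/a-z/;                        # ASCII case folding
    $s =~ tr/\xC0-\xC5\xE0-\xE5/a/;           # A/a with grave, acute, circumflex, tilde, diaeresis, ring
    $s =~ tr/\xC7\xE7/c/;                     # C/c cedilla
    $s =~ tr/\xC8-\xCB\xE8-\xEB/e/;
    $s =~ tr/\xCC-\xCF\xEC-\xEF/i/;
    $s =~ tr/\xD1\xF1/n/;                     # N/n tilde
    $s =~ tr/\xD2-\xD6\xD8\xF2-\xF6\xF8/o/;   # includes O/o with stroke
    $s =~ tr/\xD9-\xDC\xF9-\xFC/u/;
    $s =~ tr/\xDD\xFD\xFF/y/;
    return $s;
}

# Sample data (made up): compute each key once, sort on it,
# and fall back to the raw string to break ties.
my @words  = ("\xC9clair", "zebra", "eclipse", "Apple", "\xE0 la carte");
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
             map  { [ sort_key($_), $_ ] } @words;
```

Computing the key once per string, rather than inside the sort comparison, keeps the "tr///" work linear in the number of strings.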
