Re^4: Normalizing diacritics in (regex) search

Thanks, very similar to jo37's solution (and not really using Unicode::Collate like you suggested° ;-)

But jo37's approach with NFD is IMHO better because of the "dangers of pathological characters" I mentioned...

Consider U+3374 ㍴ (SQUARE BAR): NFKD decomposes it to "bar", NFD leaves it untouched. That means the single symbol "㍴" could match inside "Barbra Streisand" in a case-insensitive search. So if eliminating diacritics is the goal, NFD is preferable.
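To make the difference concrete, here is a minimal sketch of the two normalizations side by side. The helper name strip_marks and the choice of \p{Mn} for "diacritics" are my own assumptions for illustration, not code from jo37's or anyone else's post.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFKD);

# Hypothetical helper: decompose with the given normalizer,
# then drop all nonspacing combining marks (\p{Mn}).
sub strip_marks {
    my ($normalize, $text) = @_;
    my $decomposed = $normalize->($text);
    $decomposed =~ s/\p{Mn}+//g;
    return $decomposed;
}

my $square_bar = "\x{3374}";            # the single symbol U+3374
my $name       = "Barbra Streisand";

my $via_nfd  = strip_marks(\&NFD,  $square_bar);   # still U+3374
my $via_nfkd = strip_marks(\&NFKD, $square_bar);   # three plain letters

# Case-insensitive search of the normalized symbol inside the name.
my $nfd_hit  = $name =~ /\Q$via_nfd\E/i  ? 1 : 0;
my $nfkd_hit = $name =~ /\Q$via_nfkd\E/i ? 1 : 0;

# Both forms handle an ordinary diacritic the same way.
my $plain = strip_marks(\&NFD, "M\x{FC}ller");

print "NFD hit: $nfd_hit, NFKD hit: $nfkd_hit, stripped: $plain\n";
```

Only the compatibility decomposition turns the symbol into letters, so only the NFKD variant produces the false hit, while plain diacritics are stripped either way.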

°) For completeness: There is an example in Unicode::Collate, demonstrating normalized search with a (broken°) German phrase

e.g. when the content of $str is "Ich muß studieren Perl.", you say the following where $sub is "MüSS",
    my $Collator = Unicode::Collate->new( normalization => undef, level => 1 );
                                       # (normalization => undef) is REQUIRED.
    my $match;
    if (my($pos,$len) = $Collator->index($str, $sub)) {
        $match = substr($str, $pos, $len);
    }
and get "muß" in $match, since "muß" is primary equal to "MüSS".

Alas, I haven't "studied" this module sufficiently to tell whether it exactly matches my requirement of ignoring only diacritics.
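For anyone who wants to probe that question, a small experiment along these lines might help. It is only a sketch: the variable names are mine, and it checks just a handful of pairs with the collator's eq and index methods, so it says nothing about the full behaviour of the module.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Two collators that differ only in comparison level.
my $primary   = Unicode::Collate->new(normalization => undef, level => 1);
my $secondary = Unicode::Collate->new(normalization => undef, level => 2);

my $e_acute = "\x{E9}";                  # precomposed e with acute

# Does each level treat an accent or a case difference as equal?
my $l1_accent = $primary->eq("e", $e_acute)   ? 1 : 0;
my $l1_case   = $primary->eq("e", "E")        ? 1 : 0;
my $l2_accent = $secondary->eq("e", $e_acute) ? 1 : 0;
my $l2_case   = $secondary->eq("e", "E")      ? 1 : 0;

# The documentation example, with the non-ASCII letters written as escapes.
my $str = "Ich mu\x{DF} studieren Perl.";
my $sub = "M\x{FC}SS";
my $match;
if (my ($pos, $len) = $primary->index($str, $sub)) {
    $match = substr($str, $pos, $len);
}

print "level 1: accent equal=$l1_accent, case equal=$l1_case\n";
print "level 2: accent equal=$l2_accent, case equal=$l2_case\n";
print "match length: ", (defined $match ? length $match : 0), "\n";
```

In these few pairs, level 1 treats case differences (and the sharp s versus "SS") as equal along with accents, so it ignores more than diacritics alone.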

Cheers Rolf
(addicted to the Perl Programming Language :)

see Wikisyntax for the Monastery

°) Ha :) ... you can almost hear an English accent with this word order, OTOH I suppose it's easier to decipher for English speakers than "Ich muss Perl studieren". (Which is still slightly off, "lernen" would be better in this case)
