in reply to Re: Normalizing diacritics in (regex) search
in thread Normalizing diacritics in (regex) search

As Corion said, it does a lot more. Probably too much for my use case.

It's also implemented via many translation tables, which are (manually?) maintained by the author. The last release is from 2016.

I'd rather use Unicode properties directly, so that I always stay up to date.

Last but not least, it doesn't give me equivalence classes for specific Latin characters, just a single function, unidecode, to "flatten" all input to Latin characters where possible.

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

Re^3: Normalizing diacritics in (regex) search
by hippo (Archbishop) on Nov 25, 2025 at 10:41 UTC
    Last but not least, it doesn't give me equivalence classes for specific Latin characters, just a single function, unidecode, to "flatten" all input to Latin characters where possible.

    Sorry, in that case I have misunderstood your requirements: I took it that this "flattening" was what you were after when you said "Of course I could do the normalization manually and map à á ä å ... -> a and so on." Never mind.


    🦛

      No! No need to apologize, I was asking for input.

      You just asked if I had tried that module, and I wanted to share my insights.*

      The unidecode mapping à á ä å ... -> a would force me to normalize all search data.
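
      A minimal sketch of that first direction, assuming the searched text can be preprocessed. Instead of hand-maintained tables it uses the core Unicode::Normalize module to decompose each character and then drops the combining marks. The helper name strip_marks is made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: flatten accented letters to their base letter.
# NFD splits a precomposed character into base + combining marks,
# e.g. U+00E0 becomes "a" followed by U+0300; \p{Mn} then matches
# the nonspacing marks so they can be removed.
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}+//g;
    return $decomposed;
}
```

      Characters without a canonical decomposition, such as ø or ß, pass through unchanged, so this is narrower than a full transliteration.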

      The reverse mapping a -> à á ä å lets me fix up the search term instead, by replacing every a with a character class [àáäå], etc.
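
      A rough sketch of this reverse direction, under the assumption that the classes are derived from canonical decompositions rather than typed by hand. The scanned code point range (U+00C0 to U+024F), the %variants hash and the expand_term helper are all illustrative choices, not anything from a module.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# For every ASCII letter, collect the precomposed characters whose
# canonical decomposition is that letter plus one or more combining marks.
my %variants;
for my $cp (0x00C0 .. 0x024F) {    # assumed range: Latin-1 Supplement to Latin Extended-B
    my $char = chr $cp;
    next unless NFD($char) =~ /\A([A-Za-z])\p{Mn}+\z/;
    $variants{$1} .= $char;
}

# Hypothetical helper: turn a plain search term into a regex in which
# each letter is replaced by a class of itself and its accented forms.
sub expand_term {
    my ($term) = @_;
    my $pattern = join '', map {
        exists $variants{$_} ? "[$_$variants{$_}]" : quotemeta $_
    } split //, $term;
    return qr/$pattern/;
}
```

      With this, expand_term('deja') matches both "deja" and "déjà" while the search data stays untouched. Letters like ø that have no decomposition are not picked up and would need to be added manually.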

      Both approaches have their pros and cons; I prefer to have the choice. :)

      Cheers Rolf

      *) reworded