Last but not least, it doesn't provide me with equivalence classes for specific Latin characters, just one function, unidecode, to "flatten" all input to Latin characters where possible.

Sorry, in that case I misunderstood your requirements. I took it that this "flattening" was what you were after when you said "Of course I could do the normalization manually and map à á ä å ... -> a and so on." Never mind.


🦛

Re^4: Normalizing diacritics in (regex) search
by LanX (Saint) on Nov 25, 2025 at 14:00 UTC
    No! No need to apologize, I was asking for input.

    You just asked if I tried that module and I wanted to share my insights.*

    The unidecode mapping à á ä å ... -> a would force me to normalize all search data.
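
A minimal sketch of that first approach, normalizing the data before searching. To keep it dependency-free it uses the core module Unicode::Normalize rather than unidecode: decompose with NFD, then drop the combining marks. The helper name strip_marks and the sample data are made up for illustration, and unlike unidecode this only handles letters that decompose into base letter plus mark (ø or ß stay as they are).

```perl
use strict;
use warnings;
use utf8;                          # the source contains literal accented characters
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose each character, then delete the
# nonspacing combining marks, so "à" (a + U+0300) becomes "a".
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}+//g;
    return $decomposed;
}

# Every record has to be normalized before an unaccented term can match it.
my @data       = ('déjà vu', 'Ångström', 'plain');
my @normalized = map { strip_marks($_) } @data;
my @hits       = grep { /deja/ } @normalized;
```

The cost is visible in the map line: the whole data set goes through strip_marks, not just the search term.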

    The reverse, a -> à á ä å, lets me fix the search term instead, by replacing every a with a character class [àáäå] etc.
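
A rough sketch of that second approach, leaving the data alone and expanding the search term. The %variants table is a tiny hand-written sample, not a complete equivalence table, and term_to_regex is a hypothetical helper name. The sketch also puts the base letter itself into each class (assuming unaccented text should still match) and ignores upper case.

```perl
use strict;
use warnings;
use utf8;                          # the source contains literal accented characters

# Illustrative and deliberately incomplete: base letter => accented variants.
my %variants = (
    a => 'àáäå',
    e => 'èéêë',
    o => 'òóôö',
);

# Hypothetical helper: replace each known base letter with a character
# class, and quote every other character so it matches literally.
sub term_to_regex {
    my ($term) = @_;
    my $pattern = '';
    for my $char (split //, $term) {
        if (exists $variants{$char}) {
            $pattern .= "[$char$variants{$char}]";
        }
        else {
            $pattern .= quotemeta($char);
        }
    }
    return qr/$pattern/;
}

# Only the search term is rewritten; the data is searched as stored.
my $re   = term_to_regex('deja');
my @hits = grep { $_ =~ $re } ('déjà vu', 'deja vu', 'dejo');
```

Here the work is done once per search term, at the price of maintaining the table.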

    Both approaches have their pros and cons; I prefer to have the choice. :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

    *) reworded