in reply to Character class for French chars with accents in regex?

If what you actually want to check is that each character is a word character, rather than specifically a letter of the French alphabet, then you can convert your text to UTF-8 (that is, decode it into a Perl character string) and use \w to match.

From perlunicode:

Character classes in regular expressions match characters instead of bytes and match against the character properties specified in the Unicode properties database. \w can be used to match a Japanese ideograph, for instance.

(However, and as a limitation of the current implementation, using \w or \W inside a [...] character class will still match with byte semantics.)

This helps when a user's name contains, e.g., Ñ: the name is still accepted even though Ñ is not a French letter.
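A minimal sketch of that approach, assuming the input arrives as UTF-8 encoded bytes (for example from a web form). The helper name is_word_name is hypothetical, and the unicode_strings feature is used so \w applies Unicode rules regardless of the string's internal flag:

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode rules for \w even on non-UTF8-flagged strings
use Encode qw(decode);

# Hypothetical helper: true if $bytes (UTF-8 encoded, e.g. from a form)
# consists entirely of word characters.
sub is_word_name {
    my ($bytes) = @_;
    # Decode to a character string so \w sees 'ñ' as one character,
    # not as the two bytes C3 B1.
    my $name = decode('UTF-8', $bytes);
    return $name =~ /\A\w+\z/ ? 1 : 0;
}

# "Muñoz" as raw UTF-8 bytes: ñ is encoded as C3 B1
print is_word_name("Mu\xc3\xb1oz") ? "accepted\n" : "rejected\n";   # accepted
print is_word_name("Jean-Luc")     ? "accepted\n" : "rejected\n";   # rejected: '-' is not \w
```

Note that a bare \w+ also rejects spaces, hyphens and apostrophes, so names like Jean-Luc or O'Brien need a slightly wider pattern.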

Clint

Re^2: Character class for French chars with accents in regex?
by ikegami (Patriarch) on Aug 09, 2007 at 18:59 UTC

    There are at least two downsides to that method worth mentioning.

    First, it allows lookalike characters to be used. For example, the Cyrillic letter а (U+0430) looks almost identical to the Latin 'a'. If the regexp is used to limit valid user names, it won't stop one user from impersonating another by registering a similar-looking user name.

    Second, it may allow characters that users have no easy way of typing into a form, as well as characters that some (or many) users' systems are unable to render.

    The severity of these downsides depends on the purpose of the regexp.
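    One possible mitigation for the first downside, sketched here as an assumption rather than anything proposed in this thread: restrict names to a single script with Perl's \p{Latin} property, so a Cyrillic а cannot slip in. This gives up exactly the flexibility \w offered (a Cyrillic or Greek name is rejected outright) and does nothing about same-script lookalikes. The helper is_latin_name and the allowed separators are hypothetical choices:

```perl
use strict;
use warnings;
use feature 'unicode_strings';
use Unicode::Normalize qw(NFC);

# Hypothetical policy: Latin-script letters only, with single spaces,
# hyphens or apostrophes allowed between runs of letters.
# Expects an already-decoded character string.
sub is_latin_name {
    my ($name) = @_;
    # Compose sequences such as "e" + U+0301 into the single letter
    # "é" (U+00E9), which is unambiguously in the Latin script.
    $name = NFC($name);
    return $name =~ /\A\p{Latin}+(?:[ '\-]\p{Latin}+)*\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $name ("Fran\x{e7}oise", "Ame\x{301}lie", "Jean-Luc", "Fr\x{430}n\x{e7}oise") {
    printf "%s: %s\n", $name, is_latin_name($name) ? 'accepted' : 'rejected';
}
```

    The first three names pass; the last, whose "a" is Cyrillic U+0430, is rejected.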

    Update: Here are some similar-looking strings; each one is different:

    • French Braid
    • Frenсh Braid
    • French Вraid
    • French Brаid
    • French Braіd
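    To see how the strings above differ, you can list each non-ASCII character with its code point and Unicode name. A minimal sketch; the escapes in the string literals are my reading of the list above (Cyrillic с U+0441, В U+0412, а U+0430 and і U+0456), and non_ascii_chars is a hypothetical helper:

```perl
use strict;
use warnings;
use charnames ();    # makes charnames::viacode available without importing anything

# Hypothetical helper: describe each non-ASCII character in a string
# as "U+XXXX UNICODE NAME".
sub non_ascii_chars {
    my ($s) = @_;
    return map  { sprintf 'U+%04X %s', ord($_), charnames::viacode(ord($_)) }
           grep { ord($_) > 0x7F } split //, $s;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $s ("French Braid", "Fren\x{441}h Braid", "French \x{412}raid",
           "French Br\x{430}id", "French Bra\x{456}d") {
    my @odd = non_ascii_chars($s);
    print "$s: ", (@odd ? join(', ', @odd) : 'all ASCII'), "\n";
}
```

    Each lookalike reports a single Cyrillic letter, while the plain "French Braid" reports none.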
      Fair points, both, and well mentioned. Depending on the application for this filter, these downsides may count for less than making your customers irate because they can't enter their names.

      Clint

        <grumble>Apparently it doesn't matter if customers' names are screwed up. My beautiful code here at work which will cope with big-endian, little-endian, even middle-endian names and people with only one name - it just got vetoed by $boss in favour of first name and surname. Thankfully it won't be me who has to explain to Chow Yun Fat why the software calls him Mr. Yun.</grumble>