in reply to Re^6: Special & Accented chars in nodes titles ==> [à la française] (design)
in thread Special & Accented chars in nodes titles ==> [à la française]

Actually, it does matter in a lot more places and you are correct:

select 'é' = 'É'

returns "1". Thanks for the info.

Anyway, my idea was to update 'startform' so that all of our forms contain a hidden field, something like enc="éñÇ", so we could tell whether UTF-8 is coming in from a form. Your %u... stuff will catch some other cases. Checking the charset on Content-Type headers coming *in* might catch more. Probably still not 100% coverage, but pretty good.
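A rough sketch of how the hidden-field check could work, in Python rather than the site's own code. The function name `guess_form_encoding` and the two-charset candidate list are assumptions for illustration, not anything that exists in 'startform':

```python
# Known marker text that every form would carry in a hidden "enc" field.
# Written with escapes so the source file's own encoding cannot alter it.
SENTINEL = "\u00e9\u00f1\u00c7"  # the marker text: e-acute, n-tilde, C-cedilla

# Assumption: browsers submit either UTF-8 or Latin-1; extend as needed.
CANDIDATE_CHARSETS = ("utf-8", "iso-8859-1")


def guess_form_encoding(raw_enc: bytes) -> str:
    """Guess the submitting charset from the raw bytes of the 'enc' field.

    The browser re-encodes the marker along with the rest of the form,
    so the bytes that come back reveal which charset it used.
    """
    for charset in CANDIDATE_CHARSETS:
        if raw_enc == SENTINEL.encode(charset):
            return charset
    return "unknown"  # field missing, mangled, or in some other charset
```

The comparison has to be made on the raw, still-undecoded bytes of the field; once the value has been decoded with a guessed charset, the information is gone.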

- tye        


Re^8: Special & Accented chars in nodes titles ==> [à la française] (detecting)
by theorbtwo (Prior) on Jun 30, 2004 at 08:20 UTC

    Oooh, that's a much better idea (well, several) than what I'd thought of -- I was thinking of checking whether the input is valid UTF-8 and, if it is, assuming that it was, indeed, UTF-8. (This is not as bad as it may appear -- in particular, plain ole ASCII text is valid UTF-8 and valid Latin-1, with exactly the same meaning, so it doesn't matter which one we guess. Latin-1 that uses high-half characters is unlikely, from a linguistic standpoint, AFAIK, to be valid UTF-8.)

    I really like the enc="éñÇ" idea, though.
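
The "if it's valid UTF-8, assume it is UTF-8" heuristic can be sketched as follows. This is a minimal Python illustration, not the site's code; the name `decode_guessing` and the choice of Latin-1 as the only fallback are assumptions:

```python
def decode_guessing(raw: bytes):
    """Decode form input, preferring UTF-8 and falling back to Latin-1.

    Returns a (text, charset) pair. Pure ASCII decodes identically
    either way, so reporting it as UTF-8 is harmless.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is defined in ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"
```

The heuristic works because a high-half byte in UTF-8 must be followed by continuation bytes in the range 0x80-0xBF, which in Latin-1 are control codes and symbols that rarely follow an accented letter in real text. It can still misfire: the Latin-1 bytes for "Ã©" are exactly the UTF-8 bytes for "é", so that input would be read as UTF-8.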