in reply to Unicode to HTML code &#....;

I'm guessing that the player who wanted to use this name ran afoul of a length limit that was apparently imposed by byte count rather than character count -- e.g. the maximum name length was 31 bytes, and the cut happened to fall in the middle of a two-byte UTF-8 character, leaving a lone lead byte that cannot be interpreted as UTF-8. (Whoever is responsible for imposing the length limit should revisit the issue.)
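To illustrate the guess above, here is a minimal Python sketch of how a byte-count cut can split a character, and one way to truncate without doing so. The function name `truncate_utf8` and the 31-byte default are assumptions taken from my guess, not from the actual game server.

```python
def truncate_utf8(name: str, max_bytes: int = 31) -> str:
    """Shorten name so its UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    # The byte cut may leave an incomplete sequence at the end;
    # errors="ignore" drops that partial character instead of keeping garbage.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Hypothetical name: 30 ASCII letters followed by a two-byte character (32 bytes total).
name = "a" * 30 + "\u00e9"

# A naive cut at 31 bytes leaves a lone lead byte, which decodes as U+FFFD.
naive = name.encode("utf-8")[:31].decode("utf-8", errors="replace")

# The character-aware cut simply drops the character that does not fit.
safe = truncate_utf8(name)
```

With this input, `naive` ends in the replacement character while `safe` is just the 30 ASCII letters.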

I think your method (in your later reply) of replacing each "\x{FFFD}" with a space is as good as any (U+FFFD is the Unicode replacement character, which a decoder inserts wherever it meets an "uninterpretable" byte sequence), though maybe an "ellipsis" character ("\x{2026}" or "\x{22ef}") would be more appropriate.
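As a rough sketch of that substitution (in Python rather than the script from the thread, and with the hypothetical helper name `clean_name`), decoding with replacement and then swapping U+FFFD for an ellipsis might look like this:

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the Unicode replacement character


def clean_name(raw: bytes, substitute: str = "\u2026") -> str:
    """Decode raw as UTF-8 and swap each replacement character for substitute."""
    # errors="replace" inserts U+FFFD wherever the bytes are not valid UTF-8.
    text = raw.decode("utf-8", errors="replace")
    return text.replace(REPLACEMENT, substitute)


# Hypothetical truncated name: "abc" followed by a lone lead byte.
with_ellipsis = clean_name(b"abc\xc3")
with_space = clean_name(b"abc\xc3", " ")
```

Passing `" "` as `substitute` reproduces the space-based approach; the default uses the horizontal ellipsis.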

IMHO, anyone who goes to the trouble of creating a "name" that contains both Latin-based (left-to-right) and Arabic-based (right-to-left) characters in a single word token is most likely trying to make trouble, and should expect (presumably wants) to see things go wrong.

Re^2: Unicode to HTML code &#....;
by Forlix (Novice) on Nov 16, 2008 at 01:49 UTC
    You're right, it is a limit, presumably because the names are stored in a 32-byte null-terminated string.

    But you can't seriously suggest that those people are looking for trouble, as most are merely kids trying to appear "cool" with a fancy name, and many don't even know what Unicode or UTF-8 is. They simply gather some nice-looking characters from a character map and assemble a name as if they were playing with LEGO bricks.

    For anyone who wants to see the script in action, you can find it at http://forlix.org/ (the table on the bottom right). I have also added some whitespace treatment, so multiple spaces won't be collapsed (the CSS solution white-space:pre isn't yet supported well enough).
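For readers who want a feel for the conversion and the whitespace treatment, here is a minimal Python sketch. It is not the script running on the site; the function name `name_to_html` and the choice of `&nbsp;` for repeated spaces are assumptions made for illustration.

```python
import html
import re


def name_to_html(name: str) -> str:
    """Escape name for HTML, using &#NNN; references for non-ASCII characters."""
    # Escape &, <, > and quotes first, then turn every non-ASCII character
    # into a decimal numeric character reference.
    text = html.escape(name).encode("ascii", errors="xmlcharrefreplace").decode("ascii")
    # A leading space, or a space that follows another space, would be collapsed
    # by the browser, so emit it as a non-breaking space instead.
    return re.sub(r"^ |(?<= ) ", "&nbsp;", text)


# Hypothetical names used only to exercise the function.
spaced = name_to_html("a  b")
accented = name_to_html("\u00e9<")
```

Only the second and later spaces in a run become `&nbsp;`, so ordinary single spaces still allow line wrapping.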