in reply to Strange unicode behaviour with UTF8 encoding and format

I can't explain the behaviour you are observing, but I want to comment on a few things nonetheless:

I usually discourage people from using the encoding pragma, because it has many known problems, one of which is related to formats. To quote its documentation:

"At any rate, the very use of format is questionable when it comes to unicode characters since you have to consider such things as character width (i.e. double-width for ideographs) and directions (i.e. BIDI for Arabic and Hebrew)."

Instead, I save my scripts as UTF-8 and use the utf8 and open pragmas.
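A minimal sketch of that setup, assuming the script file itself is saved as UTF-8. `use utf8` makes string literals in the source character strings, and `use open qw(:std :encoding(UTF-8))` puts UTF-8 layers on STDIN/STDOUT/STDERR and on handles opened in that lexical scope. The second half touches the width concern from the quote above: Perl counts characters, not screen columns, so printf pads by character count. `display_width` is a hypothetical helper written for illustration (not part of any module) and only approximates terminal columns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                            # source is UTF-8: literals become character strings
use open qw(:std :encoding(UTF-8));  # UTF-8 layers on STD handles and on open() in this scope

my $word = 'naïve';
print length($word), "\n";   # 5 characters (without "use utf8" it would be 6 bytes)
print "$word\n";             # encoded back to UTF-8 by the :std output layer

# Hypothetical helper, illustration only: approximates terminal columns by
# counting East Asian Wide/Fullwidth characters as two columns each.
# It ignores combining marks, zero-width characters and BIDI entirely.
sub display_width {
    my ($s) = @_;
    my $wide = () = $s =~ /[\p{Ea=W}\p{Ea=F}]/g;
    return length($s) + $wide;
}

for my $name ('abc', '日本語') {
    # printf pads to 8 characters, so the 日本語 row comes out wider on screen
    printf "%-8s|\n", $name;
    # padding by approximate columns keeps the bars aligned
    print $name, ' ' x (8 - display_width($name)), "|\n";
}
```

The same character-versus-column mismatch is what makes format awkward with ideographs; a real solution would need proper width tables, and BIDI text is not addressed here at all.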


Re^2: Strange unicode behaviour with UTF8 encoding and format
by mje (Curate) on Feb 18, 2010 at 11:12 UTC

    Thanks. The character-width issue with format is fairly obvious, but I hadn't noticed the problematic example in the documentation. Personally, I've never used encoding, and so far I haven't needed utf8 in my scripts, although I mostly deal with Unicode data. I was just a little mystified as to what was going on here.