in reply to What Voodoo Encoding does RTF use for > ASCII Chars?

You will need to find out the encoding of your input data. The \u suggests to me that the characters are likely UTF-8-hex-encoded Unicode code points. You will also need to find out which encoding / codepage RTF actually uses, and encode to that target. See perluniintro and whichever RTF spec applies.

Also, it would be interesting to hear from you how Unicode::Escape fails for you and where it misses the RTF specs (and where the RTF specs are to be found). I don't find the Unicode::Escape documentation talking about RTF at all, so maybe there is some finer point I'm missing.

Re^2: What Voodoo Encoding does RTF use for > ASCII Chars?
by tosh (Scribe) on Mar 20, 2012 at 22:16 UTC
    \u is used for Unicode documents according to Wikipedia:
    http://en.wikipedia.org/wiki/Rich_Text_Format#Character_encoding

    \' is used for Windows-1256 encoded text, and no mention is made of Mac encoding even tho' Word for Mac uses it.

    Unicode::Escape fails because it seems to encode higher-order characters differently than RTF editors do.

    I don't care who is right, it's just not the same and so doesn't work, and of course this presents problems when I'm filling my templates with UTF-8 data and trying to filter it. :(
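
For anyone landing here with the same problem, here is a minimal sketch of the two escape styles mentioned above. It is not taken from Unicode::Escape or any RTF module; the helper names rtf_unicode_escape and rtf_codepage_escape are made up for illustration. It assumes the input is already a decoded Perl character string (e.g. via Encode::decode('UTF-8', $bytes)), that \u takes a signed 16-bit decimal number followed by one fallback character (the default when no \uc is set), and that code points above U+FFFF are written as a UTF-16 surrogate pair. Please check those assumptions against the RTF spec for your target editor.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: escape a decoded character string using \uN escapes.
sub rtf_unicode_escape {
    my ($text) = @_;
    my $out = '';
    for my $cp (map { ord } split //, $text) {
        if ($cp == 0x5C || $cp == 0x7B || $cp == 0x7D) {
            $out .= '\\' . chr $cp;      # backslash and braces are RTF syntax
        }
        elsif ($cp < 0x80) {
            $out .= chr $cp;             # plain ASCII passes through unchanged
        }
        else {
            # Assumption: code points above U+FFFF become a surrogate pair.
            my @units = $cp > 0xFFFF
                ? (0xD800 + (($cp - 0x10000) >> 10),
                   0xDC00 + (($cp - 0x10000) & 0x3FF))
                : ($cp);
            for my $u (@units) {
                $u -= 0x10000 if $u > 0x7FFF;   # \u expects a signed 16-bit value
                $out .= "\\u$u?";               # "?" is the fallback character
            }
        }
    }
    return $out;
}

# Hypothetical helper: escape non-ASCII characters as \'hh in a given code page.
# Dies if a character has no mapping in that code page.
# Note: this one does not escape backslash or braces.
sub rtf_codepage_escape {
    my ($text, $codepage) = @_;
    my $bytes = encode($codepage, $text, Encode::FB_CROAK);
    $bytes =~ s/([\x80-\xFF])/sprintf "\\'%02x", ord $1/ge;
    return $bytes;
}

print rtf_unicode_escape("caf\x{E9}"), "\n";            # caf\u233?
print rtf_codepage_escape("caf\x{E9}", 'cp1252'), "\n"; # caf\'e9
```

Running the template data through something like rtf_unicode_escape before substitution keeps the RTF file pure ASCII, which sidesteps the question of which code page the editor assumes.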