Re: How to Use Pack to Convert UTF-16 Surrogate Pairs to UTF-8?


by NERDVANA (Deacon)

on Jun 09, 2022 at 01:01 UTC (id://11144533)

in reply to How to Use Pack to Convert UTF-16 Surrogate Pairs to UTF-8?

If you don't know the encoding of your input, a cheap hack to "fix it" is utf8::decode($string). Call it multiple times if you think the input might have been UTF-8 encoded more than once. Strictly speaking, this is wrong, and it could damage real Unicode strings that happen to look like UTF-8 sequences. Practically speaking, it just "fixes things" and you can get on with the rest of your work.
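Here is a rough sketch of that hack. The helper name decode_repeatedly and the cap of three rounds are made up for illustration; the only real API used is utf8::decode, which works in place and returns false when the string is not well-formed UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: keep undoing UTF-8 encoding until it stops working.
sub decode_repeatedly {
    my ($string, $max_rounds) = @_;
    $max_rounds //= 3;    # arbitrary cap, an assumption for this sketch
    for (1 .. $max_rounds) {
        # Pure ASCII decodes to itself, so there is nothing left to undo.
        last unless $string =~ /[^\x00-\x7F]/;
        # False means the content is not well-formed UTF-8; stop there.
        last unless utf8::decode($string);
    }
    return $string;
}

my $once  = decode_repeatedly("caf\xC3\xA9");          # encoded once
my $twice = decode_repeatedly("caf\xC3\x83\xC2\xA9");  # encoded twice
# Both come back as the 4-character string "caf\x{E9}".
```

Note the catch mentioned above: a genuine two-character string "\x{C3}\x{A9}" gets collapsed to "\x{E9}" as well, because the helper cannot tell it apart from encoded input.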

The only *correct* way to decode things is to know the encoding of the data that was given to your program, then use the Encode module. BTW, the Encode module is a core Perl module, and not something you should try to avoid.
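For example, assuming (and this is only an assumption about your data) that the input is UTF-16LE bytes, something along these lines decodes it to characters and re-encodes it as UTF-8. Encode deals with the surrogate pairs itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: U+1F600 as UTF-16LE bytes (surrogate pair D83D DE00).
my $utf16le_bytes = "\x3D\xD8\x00\xDE";

# Bytes -> characters. FB_CROAK dies on malformed input instead of guessing;
# LEAVE_SRC keeps decode() from modifying $utf16le_bytes in place.
my $chars = decode('UTF-16LE', $utf16le_bytes,
                   Encode::FB_CROAK | Encode::LEAVE_SRC);

# Characters -> UTF-8 bytes, ready to write to a file or socket.
my $utf8_bytes = encode('UTF-8', $chars);
```

If your data is big-endian or carries a BOM, the encoding name would be 'UTF-16BE' or 'UTF-16' instead.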

As a side note, I would use chr(hex $1) instead of pack("U", hex($1)).
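A sketch of what that could look like, assuming the input carries \uXXXX escapes with non-BMP characters written as a high/low surrogate pair. The input format and the sub name unescape_u are guesses for illustration, not taken from your code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn \uXXXX escapes into Perl characters.
sub unescape_u {
    my ($text) = @_;
    # Surrogate pairs first: combine both halves into one code point.
    $text =~ s{\\u([Dd][89ABab][0-9A-Fa-f]{2})\\u([Dd][C-Fc-f][0-9A-Fa-f]{2})}
              {chr(0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))}ge;
    # Then any remaining single escapes.
    $text =~ s{\\u([0-9A-Fa-f]{4})}{chr(hex $1)}ge;
    return $text;
}

my $smiley = unescape_u('\uD83D\uDE00');   # one character, U+1F600
```

The result is a character string; encode it (for example with Encode's encode('UTF-8', ...)) when you write it out.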