WingedKnight has asked for the wisdom of the Perl Monks concerning the following question:
My input strings contain text in which some characters are written as UTF-16 code units escaped with '\u'. I am trying to convert all the strings to UTF-8. For example, the string 'Alice & Bob & Carol' might appear in the input as:
'Alice \u0026 Bob \u0026 Carol'
To do this conversion, I was using:
$str =~ s/\\u([A-Fa-f0-9]{4})/pack("U", hex($1))/eg;
...which worked fine until I got to input strings that contained UTF-16 surrogate pairs like:
'Alice \ud83d\ude06 Bob'
How do I modify the above code that uses pack to work with UTF-16 surrogate pairs? I would really like a solution that just uses pack without having to use any additional libraries (JSON::XS, Encode, etc.).
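One possible approach, sketched here as an illustration and not taken from any of the replies, is to match a high surrogate (D800..DBFF) immediately followed by a low surrogate (DC00..DFFF) in a single substitution and combine the two with the standard UTF-16 arithmetic before handing the result to pack. The helper name `decode_u_escapes` is hypothetical, and the sketch assumes that a lone, unpaired surrogate should simply fall through to the original single-unit rule.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the original post).
# Decodes \uXXXX escapes, merging each high+low surrogate pair
# into the single code point it represents.
sub decode_u_escapes {
    my ($str) = @_;
    $str =~ s{
        \\u([Dd][89ABab][0-9A-Fa-f]{2})   # high surrogate, D800..DBFF
        \\u([Dd][C-Fc-f][0-9A-Fa-f]{2})   # low surrogate,  DC00..DFFF
      |
        \\u([0-9A-Fa-f]{4})               # any other single code unit
    }{
        defined $1
            # Pair: 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
            ? pack("U", 0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))
            # Single unit: same as the original one-liner
            : pack("U", hex($3))
    }egx;
    return $str;
}

my $plain = decode_u_escapes('Alice \u0026 Bob \u0026 Carol');
my $emoji = decode_u_escapes('Alice \ud83d\ude06 Bob');   # contains U+1F606
```

Note that pack("U", ...) yields a Perl character string, not encoded bytes. To get UTF-8 bytes on output, the string still needs an output layer such as `binmode(STDOUT, ':encoding(UTF-8)')` or a call to the built-in `utf8::encode`, neither of which requires loading a module explicitly.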
Re: How to Use Pack to Convert UTF-16 Surrogate Pairs to UTF-8?
by haukex (Archbishop) on Jun 09, 2022 at 08:02 UTC

Re: How to Use Pack to Convert UTF-16 Surrogate Pairs to UTF-8?
by graff (Chancellor) on Jun 09, 2022 at 03:30 UTC
by haukex (Archbishop) on Jun 09, 2022 at 12:45 UTC
by ikegami (Patriarch) on Jun 10, 2022 at 04:40 UTC
by WingedKnight (Novice) on Jun 14, 2022 at 02:44 UTC

Re: How to Use Pack to Convert UTF-16 Surrogate Pairs to UTF-8?
by NERDVANA (Priest) on Jun 09, 2022 at 01:01 UTC