PerlMonks  

Re^2: How to Use Pack to Convert UTF-16 Surrogate Pairs to UTF-8? (updated)

by haukex (Archbishop)
on Jun 09, 2022 at 12:45 UTC ( [id://11144552]=note )


in reply to Re: How to Use Pack to Convert UTF-16 Surrogate Pairs to UTF-8?
in thread How to Use Pack to Convert UTF-16 Surrogate Pairs to UTF-8?

I still think it's likely the OP is taking the wrong approach by trying to hand-roll a JSON decoder, so I did want to point out a few issues with your code - hopefully thereby also pointing out some of the pitfalls of hand-rolled approaches.

  • High surrogates range from U+D800 to U+DBFF, which your regex doesn't cover (e.g. unpack("H*", encode("UTF-16BE", "\N{VARIATION SELECTOR-256}")) eq "db40ddef").
  • Your regexes should probably also handle uppercase hex digits.
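  • You might want to pass Encode::FB_CROAK to decode, so that malformed input dies instead of being silently replaced with substitution characters.
  • You don't need to loop over the strings with one regex and then apply a second regex to each match; that's fairly inefficient, and it can all be done in a single regex.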
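To make the points above concrete, here is a minimal sketch of a single-pass version. The helper name decode_u_escapes and the assumption that the input contains JSON-style \uXXXX escapes are mine, not taken from the code being discussed. It also doesn't know about escaped backslashes (e.g. \\u0041), which is exactly the kind of pitfall a real JSON module such as JSON::PP already handles.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (name and interface are assumptions for illustration):
# replaces \uXXXX escapes, including surrogate pairs, in one substitution.
sub decode_u_escapes {
    my ($str) = @_;
    $str =~ s{
        \\u([Dd][89ABab][0-9A-Fa-f]{2})   # high surrogate, U+D800 to U+DBFF
        \\u([Dd][C-Fc-f][0-9A-Fa-f]{2})   # low surrogate,  U+DC00 to U+DFFF
      |
        \\u([0-9A-Fa-f]{4})               # any other \uXXXX escape
    }{
        defined($1)
            ? do {
                # Pack the two 16-bit units into 4 bytes and let Encode
                # combine them; FB_CROAK makes malformed input fatal.
                my $bytes = pack('H4 H4', $1, $2);
                decode('UTF-16BE', $bytes, Encode::FB_CROAK);
            }
            : do {
                my $cp = hex $3;
                # A surrogate that reaches this branch was not part of a valid pair.
                die sprintf("Lone surrogate \\u%04X\n", $cp)
                    if $cp >= 0xD800 && $cp <= 0xDFFF;
                chr $cp;
            }
    }xge;
    return $str;
}
```

With this, decode_u_escapes('\udb40\uddef') should give the single character U+E01EF from the first bullet, and a lone '\uD800' dies instead of slipping through.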
