PerlMonks  

Re: getting rid of UTF-8

by ikegami (Patriarch)
on Nov 25, 2022 at 14:09 UTC


in reply to getting rid of UTF-8

$text =~ s/^\xef\xbb\xbf//g does remove the offending bytes if your string contains the characters you described. (Well, it'll delete the leading sequence, and removing the ^ will have it delete the others too.)
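To make the difference between the anchored and unanchored forms concrete, here is a minimal sketch. The sample string is made up for illustration, and it assumes $text holds raw (undecoded) bytes.

```perl
use strict;
use warnings;

# Hypothetical sample data: raw bytes with EF BB BF at the start and in the middle.
my $bom  = "\xEF\xBB\xBF";
my $text = $bom . "one" . $bom . "two";

# Anchored form from the post: removes only the leading sequence.
(my $leading_only = $text) =~ s/^\xef\xbb\xbf//g;

# Without the ^ anchor, /g removes every occurrence.
(my $all_removed = $text) =~ s/\xef\xbb\xbf//g;

printf "%vX\n", $leading_only;   # 6F.6E.65.EF.BB.BF.74.77.6F
printf "%vX\n", $all_removed;    # 6F.6E.65.74.77.6F
```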

Since you repeatedly claim it doesn't, your data must differ from what you described, and we can't help you until you provide a more precise description of it (e.g. the output of sprintf "%vX", $string).
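As an illustration of why the "%vX" dump is useful, the sketch below shows one possible way the data could differ from the description: the string may already have been decoded, in which case it holds the single character U+FEFF rather than three bytes. This is only a guess at a common cause, not a diagnosis of the original poster's data, and the sample input is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xEF\xBB\xBF" . "abc";      # hypothetical undecoded input
my $chars = decode('UTF-8', $bytes);     # the same data after decoding

my $dump_bytes = sprintf "%vX", $bytes;
my $dump_chars = sprintf "%vX", $chars;
print "$dump_bytes\n";   # EF.BB.BF.61.62.63
print "$dump_chars\n";   # FEFF.61.62.63

# The byte pattern cannot match the decoded string; the character pattern can.
my $byte_pattern_matches = $chars =~ /^\xEF\xBB\xBF/ ? 1 : 0;   # 0
my $char_pattern_matches = $chars =~ /^\x{FEFF}/     ? 1 : 0;   # 1
```

If the dump starts with FEFF instead of EF.BB.BF, s/^\x{FEFF}// is the substitution to reach for.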

FYI, EF BB BF is the UTF-8 encoding of U+FEFF, which is the Byte Order Mark when it appears at the start of a file, and the Zero Width No-Break Space elsewhere.
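The relationship between U+FEFF and those three bytes can be checked with the core Encode module. A minimal sketch:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode the single character U+FEFF as UTF-8 and dump the resulting bytes.
my $encoded = encode('UTF-8', "\x{FEFF}");
printf "%vX\n", $encoded;   # EF.BB.BF
```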

Replies are listed 'Best First'.
Re^2: getting rid of UTF-8
by Anonymous Monk on Nov 25, 2022 at 19:54 UTC
    the Zero Width No-Break Space elsewhere.

    deprecated, use U+2060 instead

      Good to know.

      I don't think it was used as a word joiner. I think the presence of U+FEFF is explained by the concatenation of a BOM-prefixed string to another (the very kind of error that led to U+2060 WORD JOINER being the new ZWNBSP).
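      A rough sketch of how that kind of concatenation leaves U+FEFF in the middle of a string, and one way to avoid it. The sample strings are hypothetical, the strings are assumed to be already decoded, and the /r modifier needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical decoded strings, each still carrying its own BOM.
my @parts = ("\x{FEFF}" . "first\n", "\x{FEFF}" . "second\n");

# Naive concatenation leaves the second BOM in the middle of the text.
my $naive = join '', @parts;
my $stray = () = $naive =~ /\x{FEFF}/g;   # counts every U+FEFF in the result
print "U+FEFF occurrences: $stray\n";     # U+FEFF occurrences: 2

# Stripping the leading BOM from each part first avoids the stray characters.
my $clean = join '', map { s/^\x{FEFF}//r } @parts;
print $clean;
```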
