in reply to character encoding question
blah blah blah <E2><80><9C> blah blah blah <E2><80><9D> blah blah blah

This looks like a UTF-8 encoding of the Unicode code points U+201C and U+201D, which are the left (opening) and right (closing) double quotation marks (see the General Punctuation chart on the Unicode web site for more details).
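To see why those particular bytes mean U+201C and U+201D, the 3-byte UTF-8 form can be decoded by hand. This is only a sketch: the sub name `decode_utf8_3byte` is made up for illustration, and it handles only the 3-byte pattern `1110xxxx 10xxxxxx 10xxxxxx`, not 1-, 2- or 4-byte sequences.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a single 3-byte UTF-8 sequence by hand.
# Only the 1110xxxx 10xxxxxx 10xxxxxx form is handled in this sketch.
sub decode_utf8_3byte {
    my ($b1, $b2, $b3) = @_;
    die "not a 3-byte lead byte\n" unless ($b1 & 0xF0) == 0xE0;
    die "bad continuation byte\n"
        unless ($b2 & 0xC0) == 0x80 && ($b3 & 0xC0) == 0x80;
    # Keep the low 4 bits of the lead byte and the low 6 bits of each
    # continuation byte, then concatenate them into one code point.
    return (($b1 & 0x0F) << 12) | (($b2 & 0x3F) << 6) | ($b3 & 0x3F);
}

printf "U+%04X\n", decode_utf8_3byte(0xE2, 0x80, 0x9C);   # U+201C
printf "U+%04X\n", decode_utf8_3byte(0xE2, 0x80, 0x9D);   # U+201D
```

For E2 80 9C that works out to 0x2 << 12 | 0x00 << 6 | 0x1C = 0x201C.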
You can convert the 3 raw bytes into a single Unicode character as follows:

    use strict;
    use warnings;
    use Encode qw(decode_utf8);

    my $str = "\xE2\x80\x9C";      # the 3 raw UTF-8 bytes
    my $d   = decode_utf8($str);   # decode them into a Perl character string
    print "ok\n" if length($d) == 1 && $d eq "\x{201C}";

I don't know of any general way of converting obscure punctuation codes into simple near-equivalents.
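Short of a general solution, one workable approach for a known handful of characters is a hand-made lookup table applied after decoding. This is a sketch under that assumption: the table contents and the `ascii_punct` name are hypothetical choices, not a standard mapping, and any character not listed passes through unchanged.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical hand-picked table of "smart" punctuation -> ASCII look-alikes.
# This is NOT a general Unicode-to-ASCII mapping.
my %ascii_for = (
    "\x{2018}" => "'",     # LEFT SINGLE QUOTATION MARK
    "\x{2019}" => "'",     # RIGHT SINGLE QUOTATION MARK
    "\x{201C}" => '"',     # LEFT DOUBLE QUOTATION MARK
    "\x{201D}" => '"',     # RIGHT DOUBLE QUOTATION MARK
    "\x{2013}" => '-',     # EN DASH
    "\x{2014}" => '--',    # EM DASH
    "\x{2026}" => '...',   # HORIZONTAL ELLIPSIS
);
my $re = join '|', map { quotemeta } keys %ascii_for;

# Expects decoded characters (e.g. from decode_utf8), not raw bytes.
sub ascii_punct {
    my ($text) = @_;
    $text =~ s/($re)/$ascii_for{$1}/g;
    return $text;
}

my $raw = "blah \xE2\x80\x9C blah \xE2\x80\x9D blah";
print ascii_punct(decode_utf8($raw)), "\n";   # blah " blah " blah
```

Decoding first matters: the table is keyed on characters, so running it over the raw bytes would match nothing.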
Dave.
Re: Re: character encoding question
by MaskedMarauder (Acolyte) on Jun 02, 2004 at 05:15 UTC