Re^5: Strange behaviour ODBC/Unicode in perl

Printing shouldn't cause any conversion:

$ perl -MDevel::Peek -MEncode -we'$x=encode("UTF-8", chr(259)); Dump($x); print $x' | od -b
SV = PV(0x819f9e0) at 0x814cc6c
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x81698e0 "\304\203"\0   ---.
  CUR = 2                           \__ same
  LEN = 3                           /
0000000 304 203      --------------'
0000002

$ perl -MDevel::Peek -MEncode -we'$x=encode("UTF-8", chr(238)); Dump($x); print $x' | od -b
SV = PV(0x819f9e0) at 0x814cc6c
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x81698e0 "\303\256"\0   ---.
  CUR = 2                           \__ same
  LEN = 3                           /
0000000 303 256      --------------'
0000002

How do you know it's outputting xEE?

Are you using :encoding() on the STDOUT? You shouldn't with this data.

Are you using CGI's HTML generation methods (print h1(text))? They do some encoding too.
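
To illustrate why an `:encoding()` layer is the wrong tool for data that is already encoded, here is a minimal sketch. It assumes the string holds UTF-8 bytes produced by `encode`, as in the one-liners above, and it writes to in-memory handles (`$raw` and `$layered` are names made up for this example) instead of STDOUT so the result can be inspected.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Bytes that are already UTF-8 encoded, as in the one-liners above.
my $bytes = encode("UTF-8", chr(238));    # the two bytes C3 AE

# Write them to an in-memory handle with no encoding layer.
my $raw = '';
open(my $fh_raw, '>', \$raw) or die "open: $!";
binmode($fh_raw);
print {$fh_raw} $bytes;
close($fh_raw);

# Write the same bytes through an :encoding(UTF-8) layer.
# The layer treats each byte as a character and encodes it again.
my $layered = '';
open(my $fh_enc, '>:encoding(UTF-8)', \$layered) or die "open: $!";
print {$fh_enc} $bytes;
close($fh_enc);

printf "no layer:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $raw;
printf "with layer: %s\n", join ' ', map { sprintf '%02X', ord } split //, $layered;
```

Without the layer the two bytes pass through unchanged (`C3 AE`); with the layer they come out double-encoded as four bytes (`C3 83 C2 AE`).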


Re^6: Strange behaviour ODBC/Unicode in perl by jpvdv (Initiate) on Feb 05, 2008 at 08:11 UTC
Thanks so very much for your time and patience. I get GOOD output now; what did the trick is setting

`binmode(STDOUT, ':encoding(utf8)');`

That stopped the strange behaviour. I think the "îâ" being displayed was the exception rather than the right thing. It came from the ability of the Palatino font to interpret the wide character that was in the HTML text, not from any restriction on printing î. On closer inspection, I had gotten a Wide Character warning as well.

When I look at the source of the page as the browser received it, I see

`îă î`

rather than

`Ã®Äƒ î`

The "Ã®Äƒ" in the latter case `looked` right, but really wasn't.

Thanks again for helping me sort this out.
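
For readers following along, here is a hedged sketch of how a "Wide character" warning and a lone xEE byte can both come from unencoded character strings. It assumes the text is a decoded character string rather than encoded bytes, which is not confirmed for the code in this thread. `bytes_written` is a hypothetical helper written for this example, and it prints to an in-memory handle instead of STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory handle opened with $mode,
# then return the bytes written (as hex) and the number of warnings raised.
sub bytes_written {
    my ($mode, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $buf = '';
    open(my $fh, $mode, \$buf) or die "open: $!";
    print {$fh} $text;
    close($fh);
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $buf;
    return ($hex, scalar @warnings);
}

# chr(238) fits in one byte; chr(259) is a wide character.
for my $text (chr(238), chr(259)) {
    for my $mode ('>', '>:encoding(UTF-8)') {
        my ($hex, $warned) = bytes_written($mode, $text);
        printf "U+%04X via %-18s => %-5s (warnings: %d)\n",
            ord($text), $mode, $hex, $warned;
    }
}
```

Under these assumptions, `chr(238)` printed without a layer is written as the single byte `EE` with no warning, while `chr(259)` printed without a layer raises "Wide character in print" and is written as `C4 83`. With the `:encoding(UTF-8)` layer both characters are written as valid UTF-8 and no warning is raised.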
Re^7: Strange behaviour ODBC/Unicode in perl by ikegami (Patriarch) on Feb 05, 2008 at 08:44 UTC
> On close watch I had gotten a Wide Character-warning as well....

That doesn't jibe with what you said earlier. To get a wide character warning, you need to have a wide character, yet you said the output you got from Dump didn't have `[UTF8 "..."]`, so no wide characters.

> `binmode(STDOUT, ':encoding(utf8)');`

`binmode(STDOUT, ':encoding(utf8)');` is a speed hack for `binmode(STDOUT, ':encoding(utf-8)');`. The former skips some checks, but doing so opens up a security vulnerability. Don't use the former on untrusted text. In fact, don't use the former. (I mistakenly used utf8 in my earlier post, sorry.)

> When I look at the source of the page as the browser received it I see

I wouldn't use view source for this at all. Look at the actual bytes of the source. You should see two bytes for each of those chars if the page uses the UTF-8 charset.
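
As a small illustration of the `utf8` versus `utf-8` distinction, the sketch below asks the Encode module which encoding object each spelling resolves to. It only shows that the two names map to different encodings; it does not demonstrate the security issue itself.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Ask Encode which encoding each spelling actually selects.
for my $name ('utf8', 'utf-8', 'UTF-8') {
    printf "%-6s => %s\n", $name, find_encoding($name)->name;
}
```

Per the Encode documentation, `utf8` resolves to Perl's lax `utf8` encoding, while `utf-8` and `UTF-8` resolve to `utf-8-strict`.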