in reply to Re^4: Strange behaviour ODBC/Unicode in perl
in thread Strange behaviour ODBC/Unicode in perl

Printing shouldn't cause any conversion:

$ perl -MDevel::Peek -MEncode -we'$x=encode("UTF-8", chr(259)); Dump($x); print $x' | od -b
SV = PV(0x819f9e0) at 0x814cc6c
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x81698e0 "\304\203"\0  ---.
  CUR = 2                          \__ same
  LEN = 3                          /
0000000 304 203 ------------------'
0000002

$ perl -MDevel::Peek -MEncode -we'$x=encode("UTF-8", chr(238)); Dump($x); print $x' | od -b
SV = PV(0x819f9e0) at 0x814cc6c
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x81698e0 "\303\256"\0  ---.
  CUR = 2                          \__ same
  LEN = 3                          /
0000000 303 256 ------------------'
0000002

How do you know it's outputting xEE?

Are you using an :encoding() layer on STDOUT? You shouldn't with this data, since it is already encoded.
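To illustrate why an encoding layer is wrong for already-encoded data, here is a minimal sketch. It assumes the string holds UTF-8 bytes (as in the Dump output above) and simulates the layer with a second call to encode(); the variable names are mine, not from the original script.

```perl
use strict;
use warnings;
use Encode qw(encode);

# chr(259) is U+0103 (a with breve); encoded as UTF-8 it is the bytes C4 83.
my $bytes = encode("UTF-8", chr(259));

# An :encoding(UTF-8) layer treats each element of the string as a
# character and encodes it on output. Simulate that with a second encode().
my $double = encode("UTF-8", $bytes);

# Helper (hypothetical name) to show a string as space-separated hex.
sub hex_of { join " ", map { sprintf "%02X", ord } split //, $_[0] }

print "encoded once:  ", hex_of($bytes),  "\n";   # C4 83
print "encoded twice: ", hex_of($double), "\n";   # C3 84 C2 83
```

The second line is what a browser would receive if encoded bytes were printed through the layer: four bytes that decode to two junk characters instead of one ă.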

Are you using CGI's HTML generation methods (print h1(text))? They do some encoding too.

Re^6: Strange behaviour ODBC/Unicode in perl
by jpvdv (Initiate) on Feb 05, 2008 at 08:11 UTC
    Thanks so very much for your time and patience.

    I get GOOD output now; what did the trick is setting
    binmode(STDOUT, ':encoding(utf8)');
    That stopped the strange behaviour. I think the "îâ" being displayed was the exception rather than the right thing. It was down to the ability of the Palatino font to interpret the wide character that was in the HTML text, and not to any restriction on printing î. On close watch I had gotten a Wide Character-warning as well....
    When I look at the source of the page as the browser received it I see
    îă î
    rather than
    îă î
    The "îă" in the latter case looked right, but really wasn't. Thanks again for helping me sort this out.

      On close watch I had gotten a Wide Character-warning as well....

      That doesn't jibe with what you said earlier. To get a wide character warning, you need to have a wide character in the string, yet you said the output you got from Dump didn't have [UTF8 "..."], so there were no wide characters.

      binmode(STDOUT, ':encoding(utf8)');

      binmode(STDOUT, ':encoding(utf8)');
      is a speed hack for
      binmode(STDOUT, ':encoding(utf-8)');
      The former skips some checks, but doing so opens up a security vulnerability. Don't use the former on untrusted text. In fact, don't use the former.
      (I mistakenly used utf8 in my earlier post, sorry)
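A rough sketch of the difference, assuming a reasonably recent Encode. It asks Encode which implementation each name resolves to, then feeds both a byte sequence that strict UTF-8 must reject (an encoded surrogate). The accepts() helper and the sample bytes are my own illustration.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_CROAK);

# Which Encode implementation does each name resolve to?
my %resolved = map { $_ => find_encoding($_)->name } qw(utf8 utf-8 UTF-8);
print "$_ => $resolved{$_}\n" for sort keys %resolved;

# ED A0 80 is how U+D800 (a surrogate) would be written in UTF-8;
# the standard does not allow it.
my $suspect = "\xED\xA0\x80";

# Hypothetical helper: returns 1 if the named encoding decodes the
# sample without croaking, 0 otherwise.
sub accepts {
    my ($name) = @_;
    my $copy = $suspect;   # decode() may modify its input when CHECK is set
    return eval { decode($name, $copy, FB_CROAK); 1 } ? 1 : 0;
}

my $strict_ok = accepts('utf-8');
my $lax_ok    = accepts('utf8');
print "strict accepts it: $strict_ok\n";
print "lax accepts it:    $lax_ok\n";
```

The strict name should refuse the sample, while the lax one is expected to let it through. That leniency is the skipped check referred to above.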

      When I look at the source of the page as the browser received it I see

      I wouldn't use view source for this at all. Look at the actual bytes of the source. You should see two bytes for each of those chars if the page uses the UTF-8 charset.
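One way to look at the actual bytes from Perl itself, as a sketch only: it writes the two characters through an :encoding(UTF-8) layer to a temporary file (standing in for the page the browser receives, which is an assumption on my part), then reads the file back raw and prints each byte in hex.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write decoded characters through an encoding layer, as the fixed script does.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh, ':encoding(UTF-8)');
print {$fh} "\x{EE}\x{103}";   # U+00EE (i with circumflex), U+0103 (a with breve)
close $fh or die "close: $!";

# Read the file back with no layers, so we see bytes, not characters.
open my $in, '<:raw', $path or die "open: $!";
my $raw = do { local $/; <$in> };
close $in;

my $hex = join ' ', map { sprintf '%02X', ord } split //, $raw;
print "$hex\n";   # C3 AE C4 83
```

Two bytes per character, matching the od output earlier in the thread (303 256 octal is C3 AE). Piping the real page through od -b or a hex dump tool gives the same kind of view.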