in reply to Re^2: Reading in utf-8 txt file gives garbled data when printed as part of utf-8 html...
in thread Reading in utf-8 txt file gives garbled data when printed as part of utf-8 html...

f\303\266\303\266\n is UTF-8 encoded.
If it's a string of chars (the UTF-8 flag is set), you'll get UTF-8 when you print to a UTF-8 filehandle.
If it's a string of octets (the UTF-8 flag is clear), you'll get UTF-8 when you print to a raw filehandle.

f\x{f6}\x{f6}\n is iso-latin-1 encoded.
When you print to a UTF-8 filehandle, Perl will assume it's iso-latin-1 and convert it to UTF-8.
When you print to a raw filehandle, you'll get those exact octets.
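
One way to check these cases yourself is to print each string to an in-memory filehandle and look at the octets that come out. This is only a sketch: octets_written is a hypothetical helper written for this illustration, and it assumes that opening a handle on a scalar reference with a ":raw" or ":utf8" layer behaves the same way as a real file would.

```perl
use strict;
use warnings;

# Hypothetical helper: print $str to an in-memory filehandle opened
# with the given layer and return the octets written, as hex.
sub octets_written {
    my ($layer, $str) = @_;
    my $buf = '';
    open(my $fh, ">$layer", \$buf) or die "open: $!";
    print {$fh} $str;
    close($fh);
    return join ' ', map { sprintf '%02x', ord } split //, $buf;
}

my $latin1 = "f\x{f6}\x{f6}\n";      # 4 chars, iso-latin-1
my $octets = "f\303\266\303\266\n";  # 6 octets, already UTF-8 encoded

print octets_written(':raw',  $latin1), "\n";  # 66 f6 f6 0a       (exact octets)
print octets_written(':utf8', $latin1), "\n";  # 66 c3 b6 c3 b6 0a (converted to UTF-8)
print octets_written(':raw',  $octets), "\n";  # 66 c3 b6 c3 b6 0a (UTF-8 passed through)
```

Printing $octets to a ":utf8" handle is deliberately left out: Perl would treat each octet as an iso-latin-1 char and encode it again, which is the usual source of garbled output.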


Re^4: Reading in utf-8 txt file gives garbled data when printed as part of utf-8 html...
by isync (Hermit) on Aug 28, 2007 at 09:54 UTC
    That made everything a lot clearer and the $Data::Dumper::Useqq switch is EXTREMELY helpful! Thanks!
      You can also use "use utf8;". Then you don't have to binmode the handles, as all strings, input and output, will be treated under Perl's lax utf8 interpretation.
        use utf8; doesn't remove the need to binmode the handles.
        use utf8;

        print($fh chr(0x40));    # Happens to work
        print($fh chr(0xC9));    # Generates broken output
        print($fh chr(0x2660));  # Warns

        binmode($fh, ':utf8');

        print($fh chr(0x40));    # Ok
        print($fh chr(0xC9));    # Ok
        print($fh chr(0x2660));  # Ok

        All it does is let Perl know the source is encoded using UTF-8.
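
To see that effect in isolation, here is a small sketch that compares the length of the same literal with and without the pragma. It assumes the script file itself is saved as UTF-8; the variable names are only for illustration.

```perl
use strict;
use warnings;

my ($decoded, $undecoded);
{
    use utf8;                    # literals in this block are decoded from UTF-8
    $decoded = length("föö");    # counts characters
}
{
    no utf8;                     # literals in this block stay as raw octets
    $undecoded = length("föö");  # counts octets
}

# Only ASCII is printed here, so no layer is needed on STDOUT.
print "with use utf8: $decoded, without: $undecoded\n";  # with use utf8: 3, without: 5
```

Neither block changes how a filehandle encodes its output, which is why binmode (or an open layer) is still needed.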