Dear Monks,

When decoding some binary log files on Windows, I wanted to

So I thought: how hard can that be?

From the documentation I understood binmode() to behave the same as open() with regard to I/O layers, but my results are strangely different.
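A minimal sketch of that expectation, assuming in-memory filehandles to keep it self-contained (the variable names are illustrative and not part of the test program below). It writes the same character string once through a layer given to open() and once through a layer pushed later with binmode(), then dumps both results as hex:

```perl
use strict;
use warnings;

# Character string with three umlauts, written with escapes so the
# result does not depend on the encoding of the source file.
my $text = "\x{e4}\x{f6}\x{fc}";

# Variant A: the encoding layer is given directly to open().
my $buf_open = '';
open(my $fh_a, '>:encoding(UTF-8)', \$buf_open) or die "open failed: $!\n";
print {$fh_a} $text;
close $fh_a or die "close failed: $!\n";

# Variant B: plain open(), layer pushed afterwards with binmode().
my $buf_binmode = '';
open(my $fh_b, '>', \$buf_binmode) or die "open failed: $!\n";
binmode($fh_b, ':encoding(UTF-8)') or die "binmode failed: $!\n";
print {$fh_b} $text;
close $fh_b or die "close failed: $!\n";

# Hex dump of both buffers for comparison.
printf "open:    %s\n", join ' ', unpack '(H2)*', $buf_open;
printf "binmode: %s\n", join ' ', unpack '(H2)*', $buf_binmode;
```

Both buffers should hold the same UTF-8 bytes (c3 a4 c3 b6 c3 bc). Note that the string is passed as characters and is not additionally run through utf8::encode().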

I wrote this small test program to output some non-ASCII characters (German umlauts). It checks whether STDOUT is redirected and should adjust the encoding accordingly.
    use strict;
    use warnings;
    use Win32::Console::Ansi; # converts output for code page 850 to OEM code page

    my $isRedirected = ! -t STDOUT;

    my $Str = 'äöüÄÖÜß' . "\n"; # this string has default encoding iso-8859-1

    # if STDOUT is redirected to a file, use unicode encoding, otherwise use default
    if ($isRedirected) {
        if (!defined binmode STDOUT, ':encoding(UTF-8)') {
            warn "binmode failed: $!\n";
        }
        utf8::encode($Str);
    }

    # output string with a hexdump
    if (!print 'string: ', hd($Str), $Str){
        warn "print failed\n";
    }
    # With a redirected file I end up with a UTF-16 LE BOM encoded file with wrong content :-(

    # for comparison, this works as expected (produces utf8 content)
    open(my $fh, '>:encoding(UTF-8)', 'utf8_2') or die "can't open file for writing:$!\n";
    print $fh $Str or warn "print to file failed\n";
    close $fh or warn "close file failed\n";

    # hexdump
    sub hd {
        my $input = shift;
        return join(' ', unpack('(H2)*', $input)), "\n";
    }

The written file 'utf8_2' contains the expected output

äöüÄÖÜß

which is

000000 c3 a4 c3 b6 c3 bc c3 84 c3 96 c3 9c c3 9f 0d 0a

while the redirected STDOUT output looks very different.

string: e4 f6 fc c4 d6 dc df 0a
├ñ├Â├╝├ä├û├£├ƒ

The second line has this hexdump:

000000 1c 25 f1 00 1c 25 c2 00 1c 25 5d 25 1c 25 e4 00
000010 1c 25 fb 00 1c 25 a3 00 1c 25 92 01 0d 00 0a 00

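As a cross-check on this dump, here is a minimal sketch that decodes it step by step with the core Encode module. The second step is an assumption, not a diagnosis: it supposes that the characters in the file are code page 850 renderings of raw bytes. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes copied from the hexdump of the second output line.
my $hex = '1c 25 f1 00 1c 25 c2 00 1c 25 5d 25 1c 25 e4 00 '
        . '1c 25 fb 00 1c 25 a3 00 1c 25 92 01 0d 00 0a 00';
my $bytes = pack '(H2)*', split ' ', $hex;

# Step 1: read the bytes as UTF-16LE; this yields the characters
# that appear as the garbled second line.
my $chars = decode('UTF-16LE', $bytes);

# Step 2 (assumption): treat those characters as code page 850
# renderings and map them back to the raw bytes behind them.
my $raw = encode('cp850', $chars);

# Step 3: interpret the raw bytes as UTF-8.
my $text = decode('UTF-8', $raw);

binmode STDOUT, ':encoding(UTF-8)';
print join(' ', unpack '(H2)*', $raw), "\n";
print $text;
```

Under that assumption the recovered bytes are c3 a4 c3 b6 c3 bc c3 84 c3 96 c3 9c c3 9f 0d 0a, the same sequence as in the 'utf8_2' file. That would suggest the UTF-8 bytes were produced correctly and were re-interpreted and converted to UTF-16 somewhere after Perl wrote them.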
What happened here? How did I end up with a UTF-16 LE file with a BOM? What would you suggest to obtain UTF-8 encoding for the redirected file?

Thanks for any enlightenment!

In reply to binmode(':encoding(UTF-8)') did not produce utf8 for me by hexcoder
