Re: binmode(':encoding(UTF-8)') did not produce utf8 for me

First, you claim your source code contains

my $Str = 'äöüÄÖÜß' . "\n";

Perl expects source encoded using UTF-8 when `use utf8;` is in effect.
Perl expects source encoded using ASCII when `use utf8;` isn't.

Seeing as you didn't use `use utf8;`, and seeing as those characters aren't ASCII characters, your script couldn't possibly contain those characters.

Based on the hd output, it appears that your source code is encoded using ISO-8859-1 as you say, or more likely Windows-1252 since you're on a Windows machine.
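To check a hex dump against the candidate encodings, it can help to print the bytes each one would produce for the same string. The following is a minimal sketch, not something from the original post: `hex_bytes` is a hypothetical helper, and the string is written with escapes so the result does not depend on how the file itself is saved.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Return the bytes of $text in the given encoding as space-separated hex.
# hex_bytes() is a hypothetical helper, not part of any module.
sub hex_bytes {
    my ($encoding, $text) = @_;
    return join ' ', unpack '(H2)*', encode($encoding, $text);
}

# The seven characters from the question, written as code point escapes.
my $text = "\x{E4}\x{F6}\x{FC}\x{C4}\x{D6}\x{DC}\x{DF}";

for my $encoding (qw( cp1252 iso-8859-1 UTF-8 )) {
    printf "%-10s %s\n", $encoding, hex_bytes($encoding, $text);
}
```

For these particular characters, cp1252 and iso-8859-1 give the same seven bytes (e4 f6 fc c4 d6 dc df), so a dump of this string alone cannot distinguish the two. UTF-8 gives two bytes per character, each pair starting with c3.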

Secondly, the Win32 console is extremely unlikely to want ISO-8859-1 or Windows-1252. `"cp" . Win32::GetConsoleOutputCP()` will produce the correct encoding.

If your script is encoded using Windows-1252 or ISO-8859-1 (as it currently is), you want this:

use v5.14;
use warnings;

use Encode qw( decode );
use Win32  qw( );

my $enc = -t STDOUT ? "cp" . Win32::GetConsoleOutputCP() : "UTF-8";

binmode( STDOUT, ":encoding($enc)" );

my $str = decode( "cp1252", "äöüÄÖÜß" );
#warn sprintf "%vX", $str;  # E4.F6.FC.C4.D6.DC.DF

say $str;

If your script is encoded using UTF-8 (the modern standard), you want this:

use v5.14;
use warnings;

use utf8;

use Win32 qw( );

my $enc = -t STDOUT ? "cp" . Win32::GetConsoleOutputCP() : "UTF-8";

binmode( STDOUT, ":encoding($enc)" );

my $str = "äöüÄÖÜß";
#warn sprintf "%vX", $str;  # E4.F6.FC.C4.D6.DC.DF

say $str;

Either way, you get this:

>chcp
Active code page: 65001   # My machine uses UTF-8 for the console.

>perl a.pl
äöüÄÖÜß

>chcp 850
Active code page: 850     # It used to default to this.

>perl a.pl
äöüÄÖÜß

>chcp 437
Active code page: 437     # Common in the US.

>perl a.pl
äöüÄÖÜß

>perl a.pl >a

>perl -Mv5.14 -ne"say sprintf '%vX', $_" a
C3.A4.C3.B6.C3.BC.C3.84.C3.96.C3.9C.C3.9F.A   # UTF-8 of äöüÄÖÜß

See Re: [OT] ASCII, cmd.exe, linux console, charset, code pages, fonts and other amenities.

Re^2: binmode(':encoding(UTF-8)') did not produce utf8 for me by hexcoder (Curate) on Jul 04, 2023 at 17:37 UTC
Hello ikegami, thanks very much for the fast response!

I ran the first of your scripts (the one without `use utf8;`) in a PowerShell console with code page 850, using Strawberry Perl 5.32.1 32-bit. Without redirection I get the same output as you. With redirection, however, I get this file content:

000000 ff fe 1c 25 f1 00 1c 25 c2 00 1c 25 5d 25 1c 25
000010 e4 00 1c 25 fb 00 1c 25 a3 00 1c 25 92 01 0d 00
000020 0a 00

This is Windows 10 version 21H2 (Build 19044.3086).

That made me curious, so I repeated the run in a plain CMD.EXE shell. And surprise! There I get the same output as you did. So it seems something in PowerShell alters the output during redirection. That nasty behavior took me by surprise. It looks like that is where the unwanted conversion to UTF-16LE with a BOM comes from...

Another thing I noticed with PowerShell was that

perl "-Mv5.14" -ne"say sprintf '%vX', $_" a

produced only two empty lines.

Thanks very much again for helping me out!
Re^3: binmode(':encoding(UTF-8)') did not produce utf8 for me by ikegami (Patriarch) on Jul 05, 2023 at 03:29 UTC
That has nothing to do with Perl. That's PowerShell doing that. You can control the output as follows:

script | Out-File -Encoding UTF8 file.out

The above outputs a BOM. The following doesn't, but requires PS6+:

script | Out-File -Encoding UTF8NoBOM file.out

That said, `-Encoding UTF8NoBOM` is the default with PS6+, so all you need is the following if you're using PS6+:

script >file.out

So the issue is really that you are using an outdated PowerShell.

References:
Changing PowerShell's default output encoding to UTF-8.
Using PowerShell to write a file in UTF-8 without the BOM.
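For what it's worth, the bytes in the dump posted above are consistent with the following sequence: the script emits UTF-8 (because STDOUT is not a terminal), the shell decodes those bytes using the console code page 850, and the result is written as UTF-16LE with a BOM. The sketch below reproduces that under exactly those assumptions. The thread does not confirm that this is what PowerShell does internally, and `simulate_redirect` is a hypothetical helper, not a real API.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Assumption (not confirmed in the thread): the shell decodes the script's
# UTF-8 output using the console code page, then writes UTF-16LE with a BOM.
# simulate_redirect() is a hypothetical helper for illustration only.
sub simulate_redirect {
    my ($utf8_bytes, $console_cp) = @_;
    my $misread = decode($console_cp, $utf8_bytes);    # wrong decoding step
    return "\xFF\xFE" . encode("UTF-16LE", $misread);  # BOM + UTF-16LE
}

# The seven characters from the question plus a CRLF line ending,
# written as escapes so the file encoding does not matter.
my $text       = "\x{E4}\x{F6}\x{FC}\x{C4}\x{D6}\x{DC}\x{DF}\r\n";
my $utf8_bytes = encode("UTF-8", $text);
my $file_bytes = simulate_redirect($utf8_bytes, "cp850");

# Print the simulated file content as space-separated hex bytes.
print join(' ', unpack '(H2)*', $file_bytes), "\n";
```

Each character shows up as a `1c 25` pair (U+251C, which is what byte C3 maps to in cp850) followed by the cp850 reading of the second UTF-8 byte, which matches the pattern in the dump.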
Re^2: binmode(':encoding(UTF-8)') did not produce utf8 for me by Anonymous Monk on Jul 06, 2023 at 19:43 UTC
> Perl expects source encoded using UTF-8 when `use utf8;` is in effect.
> Perl expects source encoded using ASCII when `use utf8;` isn't.

I don't think that's a very helpful way of looking at it. I'd say "Perl upgrades any literal strings in (utf-8) source code to character semantics when `use utf8;` is in effect" would be more helpful.

In the end, it's about whether the strings you are working with are using byte semantics or character semantics. Because `binmode ":encoding()"` only works with strings with character semantics and does nothing with byte semantics.

It takes some work to get used to this byte/character semantics distinction.
Re^3: binmode(':encoding(UTF-8)') did not produce utf8 for me by ikegami (Patriarch) on Jul 09, 2023 at 10:57 UTC
That wouldn't be more useful, since that's not what it does. Perl decodes the source from UTF-8 with `use utf8;`, and it decodes from ASCII (with 8-bit clean literals) without it. And it does that for the entire source code, not just literals. And the literals don't necessarily use the upgraded format, even with `use utf8;`. Your explanation is simply completely wrong.

> In the end, it's about whether the strings you are working with are using byte semantics or character semantics.

No. It very much isn't. It affects the encoding used to decode the entire code, not the internal storage format of literals.

$ perl -Mv5.14 -e'use utf8; sub fée { }'
$ perl -Mv5.14 -e'no utf8; sub fée { }'
Illegal declaration of subroutine main::f at -e line 1.

> Because binmode ":encoding()" only works with strings with character semantics and does nothing with byte semantics.

That's not true either. It works for both.

$ perl -Mv5.14 -e'
   binmode STDOUT, ":encoding(UTF-8)";
   $_ = "\xE9";
   utf8::upgrade($_);
   say;
' | od -t x1
0000000 c3 a9 0a
0000003

$ perl -Mv5.14 -e'
   binmode STDOUT, ":encoding(UTF-8)";
   $_ = "\xE9";
   utf8::downgrade($_);
   say;
' | od -t x1
0000000 c3 a9 0a
0000003

"Byte semantics" and "Unicode semantics" are (confusing and misleading) terms used to describe code suffering from The Unicode Bug. `:encoding` does not suffer from The Unicode Bug. `:encoding` is not even being discussed!