Re: binmode STDOUT, ":utf8"; and umlauts

Either `find` is not producing iso-latin-1 (Perl assumes everything is iso-latin-1 unless you tell it otherwise), or whatever is interpreting the output of your program isn't expecting UTF-8.

Using binmode on FIND would address the first problem, and using the appropriate encoding on STDOUT would address the second.
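A minimal sketch of both fixes together, assuming the file names on disk really are UTF-8 and that the terminal expects UTF-8. The helper name `read_utf8_lines` and the search pattern `men*` are made up for illustration and are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its output lines,
# decoded from UTF-8 bytes into Perl character strings.
sub read_utf8_lines {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    binmode($fh, ':encoding(UTF-8)');    # first problem: decode the input
    chomp(my @lines = <$fh>);
    close($fh);
    return @lines;
}

# Second problem: encode characters as UTF-8 on the way out.
binmode(STDOUT, ':encoding(UTF-8)');

# Assumption: find is on PATH; the pattern is only an example.
print "$_\n" for read_utf8_lines('find', '.', '-name', 'men*');
```

If the file names are in some other encoding, the layer on the pipe handle is the only thing that needs to change; the layer on STDOUT depends solely on what the consumer of the output expects.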


Re^2: binmode STDOUT, ":utf8"; and umlauts by massa (Hermit) on Jun 22, 2008 at 20:42 UTC
If this is a typical, reasonably new Linux distro, it's UTF-8 by default, i.e., you are right. He should have done `open FIND, '-|:encoding(utf8)', 'find /dev/sda2 -name menü'`
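Rather than guessing what the distro defaults to, the script could ask the locale. This is only a sketch: it assumes the locale environment variables describe what the terminal expects, and it falls back to UTF-8 when Encode does not recognise the codeset name that the locale reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Adopt the character type settings from the environment (LANG, LC_*).
setlocale(LC_CTYPE, '');

# Name of the character encoding the current locale uses.
my $codeset = langinfo(CODESET);

# Map the locale's name to an Encode encoding; fall back to UTF-8
# (an assumption) if Encode does not know the name.
my $enc   = find_encoding($codeset);
my $layer = $enc ? ':encoding(' . $enc->name . ')' : ':encoding(UTF-8)';

binmode(STDOUT, $layer);
print "locale codeset: $codeset, STDOUT layer: $layer\n";
```

The same `$layer` string could be used on the pipe from `find`, but only if the file names were created under the same locale, which nothing guarantees.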
Re^2: binmode STDOUT, ":utf8"; and umlauts by pc88mxer (Vicar) on Jun 22, 2008 at 22:19 UTC
`find` is simply producing whatever was put there. On Linux, file names are just byte strings, which may be interpreted however the user wants. Thus, it is incumbent on the user to decide on a convention for encoding file names and then to stick to that convention. This demonstrates what's going on:

```perl
#!/usr/bin/perl

system("/bin/rm abc*");
system("/bin/ls"); # no files begin with "abc"

my $name = "abc".chr(128);
open(FOO, ">", $name);
close(FOO);

# Appending chr(256) switches the string to Perl's internal UTF-8 form;
# chop removes the character again, but the internal form stays.
my $name2 = $name.chr(256);
chop $name2;

if ($name eq $name2) {
  print "\$name and \$name2 are ", ($name ne $name2 ? "not " : ""), "equal as perl strings\n";
}

open(BAR, ">", $name2);
close(BAR);

system("/bin/ls"); # shows two files beginning with "abc"
```

Perl is evidently passing its internal representation of `$name` and `$name2` to the operating system's `open()` routine, and the OS is simply using that sequence of bytes as the file name.
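One way to stick to a convention is to encode every name explicitly before it reaches the OS and to decode every name that comes back. The sketch below assumes the chosen convention is UTF-8 and works in a temporary directory; the file name is just an example. File systems that normalise names (some macOS setups, for instance) may hand back a different byte sequence than the one written.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);      # scratch directory, removed at exit
my $name = "men\x{fc}";                # "menü" as a character string

# Convention (an assumption): names on disk are UTF-8 encoded bytes.
my $bytes = encode('UTF-8', $name);

open(my $fh, '>', "$dir/$bytes") or die "cannot create file: $!";
close($fh);

# Names read back from the OS are bytes; decode them by the same convention.
opendir(my $dh, $dir) or die "cannot read $dir: $!";
my @found = map { decode('UTF-8', $_) } grep { !/^\.\.?$/ } readdir($dh);
closedir($dh);

print "round trip ok\n" if @found == 1 && $found[0] eq $name;
```

Because the bytes are produced by `encode` rather than left to Perl's internal representation, the name on disk no longer depends on how the string happened to be built.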