in reply to binmode STDOUT, ":utf8"; and umlauts

Either find is not producing iso-latin-1 — Perl assumes everything is iso-latin-1 unless you tell it otherwise — or whatever is interpreting the output of your program isn't expecting UTF-8.

Using binmode on FIND would address the first problem, and using the appropriate encoding on STDOUT would address the second.
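As a rough sketch of those two fixes, the snippet below decodes incoming bytes with one layer and encodes outgoing characters with another. It assumes find emits UTF-8. To stay self-contained it uses in-memory filehandles as hypothetical stand-ins for FIND and STDOUT, and the names `$raw`, `$in`, and `$sink` are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for find's output: the UTF-8 bytes for "menü\n".
my $raw = "men\xC3\xBC\n";

# Input side: decode bytes into characters, as binmode(FIND, ...) would.
open(my $in, '<', \$raw) or die "cannot open in-memory input: $!";
binmode($in, ':encoding(UTF-8)');
my $line = <$in>;
close($in);
chomp $line;
my $chars = length $line;    # counts characters, not bytes

# Output side: encode characters back into bytes, as binmode(STDOUT, ...) would.
my $out = '';
open(my $sink, '>', \$out) or die "cannot open in-memory output: $!";
binmode($sink, ':encoding(UTF-8)');
print {$sink} $line, "\n";
close($sink);

printf "decoded %d characters from %d bytes\n", $chars, length($raw);
printf "encoded output is %d bytes\n", length($out);
```

With a real pipe, the same layers would go on the FIND handle and on STDOUT. Which encoding is "appropriate" for STDOUT depends on what the terminal or consuming program expects.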

Re^2: binmode STDOUT, ":utf8"; and umlauts
by massa (Hermit) on Jun 22, 2008 at 20:42 UTC
    If this is a typical, reasonably new Linux distro, it's UTF-8 by default, i.e., you are right. He should have done:
    open FIND, '-|:encoding(utf8)', 'find /dev/sda2 -name menü'
Re^2: binmode STDOUT, ":utf8"; and umlauts
by pc88mxer (Vicar) on Jun 22, 2008 at 22:19 UTC
    find is simply producing whatever was put there. In Linux, file names are just byte strings which may be interpreted however the user wants. Thus, it is incumbent on the user to decide on a convention for encoding file names and then to stick to that convention.

    This demonstrates what's going on:

    #!/usr/bin/perl
    system("/bin/rm abc*");
    system("/bin/ls");        # no files begin with "abc"

    my $name = "abc".chr(128);
    open(FOO, ">", $name);
    close(FOO);

    my $name2 = $name.chr(256);
    chop $name2;

    if ($name eq $name2) {
        print "\$name and \$name2 are ", ($name ne $name2 ? "not " : ""), "equal as perl strings\n";
    }

    open(BAR, ">", $name2);
    close(BAR);
    system("/bin/ls");        # shows two files beginning with "abc"
    perl is evidently passing its internal representation of $name and $name2 to the operating system's open() routine, and the OS is simply using that sequence of bytes as the file name.
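The difference between the two strings can be inspected without touching the file system. The sketch below is a debugging aid rather than a definitive account of perl's internals: it uses `utf8::is_utf8` and `bytes::length` to look at the internal storage, and then shows one possible convention (an assumption here, not something the thread prescribes) of encoding every name to UTF-8 explicitly before handing it to open().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();             # loaded only for bytes::length, a debugging aid
use Encode qw(encode);

my $name  = "abc" . chr(128);
my $name2 = $name . chr(256);
chop $name2;              # same characters as $name, but stored differently

# Equal as Perl strings...
my $equal = $name eq $name2 ? 1 : 0;

# ...yet the internal storage differs.
my $flag1 = utf8::is_utf8($name)  ? 1 : 0;
my $flag2 = utf8::is_utf8($name2) ? 1 : 0;
my $len1  = bytes::length($name);
my $len2  = bytes::length($name2);

# Assumed convention: always encode to UTF-8 bytes before calling open().
# Both strings then yield the same byte sequence.
my $bytes1 = encode('UTF-8', $name);
my $bytes2 = encode('UTF-8', $name2);

printf "equal as strings: %d\n", $equal;
printf "internal flag:    %d vs %d\n", $flag1, $flag2;
printf "internal bytes:   %d vs %d\n", $len1, $len2;
printf "encoded names match: %s\n", ($bytes1 eq $bytes2 ? "yes" : "no");
```

Passing `$bytes1` or `$bytes2` to open() would create the same file either way, which is the point of fixing a convention and sticking to it.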