lagle has asked for the wisdom of the Perl Monks concerning the following question:

I solved the question, thank you all very much! The solution was

# I added
use Encode qw(encode decode);
# and in sub foo I decoded the names that come from File::Find, e.g.
sub foo {
    $file_name = decode("UTF-8", $_);
}
# cheers!

here is the question and old code

# basically i want to go through a directory with File::Find
# and print the (non-ASCII) filenames to a file.
# let's go through it incrementally, basic assumptions
use utf;
use File::Find;
open (OUT, '>:encoding(UTF-8)', '/tmp/tmp.sql');
open (IN, '<:encoding(ISO-8859-15)',$file_name);
sub foo {
    print "$_\n";
    print OUT "$_\n";
    # the IN file is used, but none of its contents are
    # written to the out file, only the filename $_ is
    # needed in the OUT file
}
# when i now execute
find(\&foo, $mydir);
# it will print fine to the terminal (i use GNU/Debian)
# but mojibake to the file
# if i prepend this
binmode( STDOUT, ':encoding(UTF-8)' ) or die $!;
binmode( STDIN, ':encoding(UTF-8)' ) or die $!;
# to the script
# then it prints mojibake both to terminal and to file
# it would be more logical if the other way around

What am I failing to encode or decode?

I haven't had problems with UTF-8 files before, but this is the first time I have used File::Find to interact with the file system.

Re: File::Find and UTF-8 problems
by moritz (Cardinal) on Sep 26, 2010 at 17:59 UTC

    I guess you need to decode $_ in sub foo, because file names are binary data on unixish systems.

    And I'd try use utf8; instead of use utf;. Posting runnable code is always a good idea.

    Perl 6 - links to (nearly) everything that is Perl 6.

      How do I decode it, and into what? As I understand it, the name has been encoded as utf8 (or UTF-8), and now I should decode it into X so that it can be encoded as UTF-8 again when it is printed.

      Did I understand that correctly, and if so, what is X in the previous statement?

        Decoding always happens into Perl's internal string format.

        Please read this article, it tries to explain the encoding/decoding process, and what effects it has.
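
To make the "decode into what?" point concrete, here is a minimal sketch of one round trip. It assumes the file name is "ä.txt" and that the file system stores names as UTF-8 bytes; the variable names are only illustrative and are not taken from the original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as readdir might return them for a file named "ä.txt"
# (assumption: the file system stores names as UTF-8).
my $bytes = "\xC3\xA4.txt";

# decode: bytes -> Perl's internal character string (this is the "X")
my $chars = decode('UTF-8', $bytes);

# encode: character string -> bytes again, which is what an
# :encoding(UTF-8) output layer does on print
my $again = encode('UTF-8', $chars);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 6, characters: 5"
print "round trip ok\n" if $again eq $bytes;
```

The two lengths differ because "ä" is two bytes in UTF-8 but a single character once decoded.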

Re: File::Find and UTF-8 problems
by ikegami (Patriarch) on Sep 26, 2010 at 18:01 UTC
    Perl treats file names as opaque strings of bytes. readdir doesn't decode the names it returns, and File::Find doesn't decode the names returned by readdir. That means you're encoding a string that's already encoded. Either decode the file name from find before printing it, or don't encode the file name on print.
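
The first option (decode the name, then let the output layer encode it once) could look roughly like the sketch below. It is not the original poster's script: the scratch directory, the file name "ä.txt", and the in-memory output buffer are all assumptions made so the example is self-contained, and it presumes a Linux-like file system that stores names as UTF-8 bytes without normalizing them.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch directory with one file whose name is
# "ä.txt", written as raw UTF-8 bytes.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/\xC3\xA4.txt" or die "cannot create test file: $!";
close $fh;

# Output handle with an encoding layer, as in the question. An in-memory
# buffer stands in for /tmp/tmp.sql so the result is easy to inspect.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "cannot open buffer: $!";

my @names;
find(sub {
    return unless -f $_;
    # $_ holds undecoded bytes; decode once so the layer encodes once.
    my $file_name = decode('UTF-8', $_);
    push @names, $file_name;
    print {$out} "$file_name\n";
}, $dir);
close $out;

# $buffer now holds the name as valid UTF-8 bytes, not double-encoded.
```

The second option is the mirror image: leave `$_` undecoded and print it to a handle opened without an `:encoding` layer, so the bytes pass through unchanged.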