in reply to Tk and Non-ASCII File Names

You should try adding
use utf8;
and see if that works.

But I had that problem before, and graff showed me how to decode those pesky filenames, like the following.

# this decode utf8 routine is used so filenames with extended
# ascii characters (unicode) in filenames, will work properly
use Encode;
opendir my $dh, $path or warn "Error: $!";
my @files = grep !/^\.\.?$/, readdir $dh;
closedir $dh;
# @files = map{ "$path/".$_ } sort @files;
#$_ = decode( 'utf8', $_ ) for ( @files );
@files = map { decode( 'utf8', "$path/".$_ ) } sort @files;
or for a single file
use Encode;
my $file = decode('utf8', $file);
From his explanation, that tells Perl to interpret the raw bytes returned by the filesystem as utf8 and turn them into a character string.
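As a minimal sketch of what that decode step does, the snippet below uses a made-up file name written out as raw UTF-8 bytes (an assumption standing in for what readdir might return) and compares the byte string with the decoded character string. It only uses Encode's decode and encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "tête.xml" as raw UTF-8 bytes (hypothetical name, not from the thread).
my $bytes = "t\xC3\xAAte.xml";

# decode() turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

# The two-byte sequence C3 AA collapses into one character.
printf "bytes: %d, characters: %d\n", length $bytes, length $chars;

# encode() goes the other way, e.g. before handing the name back to the OS.
my $round_trip = encode('UTF-8', $chars);
print "round trip ok\n" if $round_trip eq $bytes;
```

The decoded form is what a Tk widget wants for display; whether the OS calls want the encoded bytes back depends on the platform and locale.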

I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku ................... flash japh

Re^2: Tk and Non-ASCII File Names
by eff_i_g (Curate) on Sep 28, 2010 at 15:43 UTC
    zentara,

    use utf8; works with the code given; however, when I incorporate it into the larger program it does not work. I'm mimicking that in the posted script by replacing

    my $file = '06_Protection_de_la_tête.xml';
    with
    use File::Find::Rule;
    my @files = File::Find::Rule->file->name('*.xml')->in('.');
    my $file = shift @files;

    If I follow this with

    use Encode;
    my $file = decode('utf8', $file);
    then all of the non-Tk lines break:
    06_Protection_de_la_t�te.xml: No such file or directory
    Argh!

    Do you suspect this to be Tk-related since the other commands work fine and there is an open bug related to this matter? Tk::ExecuteCommand builds the command by appending to the command I pass:

    $self->{-command} . ' 2>&1 |'

    Could this concatenation be changing the internal encoding of the string no matter what I send to the module? As I noted in my other reply, I can get all of this working if I modify a copy of the module and use utf8::downgrade, but I don't know if it's wise to change the one in production.
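One way to look at the concatenation question is to check Perl's internal string flag before and after appending. The sketch below is not taken from Tk::ExecuteCommand; it only assumes a Latin-1 file name and the same ' 2>&1 |' suffix, and uses the core utf8::is_utf8, utf8::upgrade and utf8::downgrade functions.

```perl
use strict;
use warnings;

# A Latin-1 byte string, as a single-byte locale would hand it to Perl.
my $name = "t\xEAte.xml";
print "before: ", (utf8::is_utf8($name) ? "upgraded" : "bytes"), "\n";

# Concatenating two byte strings keeps the result a byte string.
my $cmd = "ls -l $name" . ' 2>&1 |';
print "byte concat: ", (utf8::is_utf8($cmd) ? "upgraded" : "bytes"), "\n";

# If either operand is already upgraded, the result is upgraded too.
my $upgraded = $name;
utf8::upgrade($upgraded);
my $cmd2 = "ls -l $upgraded" . ' 2>&1 |';
print "mixed concat: ", (utf8::is_utf8($cmd2) ? "upgraded" : "bytes"), "\n";

# utf8::downgrade restores the single-byte form when every character fits.
utf8::downgrade($cmd2);
print "after downgrade: ", (utf8::is_utf8($cmd2) ? "upgraded" : "bytes"), "\n";
print "same text\n" if $cmd eq $cmd2;
```

Appending plain ASCII does not upgrade a byte string by itself, so if the command ends up upgraded, something earlier in the chain (a decode call, or Tk handing back a character string) most likely did it. That is a guess about this case, not a confirmed diagnosis.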

      Do you suspect this to be Tk-related since the other commands work fine and there is an open bug related to this matter?

      That sounds plausible, but hopefully a Unicode expert like graff will weigh in. ( Maybe private msg graff and ask him to look at it? ) I'm a provincial American, who seldom deals with non-ascii filenames. :-)

      I would first try reading the directories and printing the list to a Tk text box, and see if there are any name changes.


      I'm not really a human, but I play one on earth.
      Old Perl Programmer Haiku ................... flash japh
      I don't have any way to test in an environment that matches yours, but based on what you've posted so far, it would appear that your locale settings and non-ascii file names are "consistent" -- both involve a single-byte-per-character encoding for "vanilla" European (8859-1, i.e. Latin-1).

      So your Encode::decode call should specify that the string being passed to it needs to be decoded from that encoding:

      my $file = decode( 'iso-8859-1', $file );
      Try that and see if it helps. The return value should be a valid utf8 string with the accented "e" rendered as intended, because the value being passed in $file is a valid 8859-1 string.

      When you passed 'utf8' as the first arg to decode(), perl was being told to expect utf8 data in $file, but the single non-ascii byte there was not parsable as utf8, so what you got in its place was the unicode "REPLACEMENT CHARACTER" (U+FFFD). Rendered as utf8 data, that character is the three-byte sequence "0xef 0xbf 0xbd", and that sequence, when played through a Latin-1 display window, yields the three goofy characters that you got.
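The difference between the two decode calls can be made visible by dumping code points and bytes. This is only a sketch: the string is a stand-in for the real file name, and code_points and hex_bytes are hypothetical helpers written for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers: show a string as code points or as hex bytes.
sub code_points { join ' ', map { sprintf('U+%04X', ord $_) } split //, $_[0] }
sub hex_bytes   { join ' ', map { sprintf('%02x',   ord $_) } split //, $_[0] }

# "tête" as Latin-1 bytes: the e-circumflex is the single byte 0xEA.
my $latin1_bytes = "t\xEAte";

# Decoding as utf8 cannot parse 0xEA and substitutes U+FFFD.
my $wrong = decode('utf8', $latin1_bytes);
print "decoded as utf8:   ", code_points($wrong), "\n";

# Encoding that result as UTF-8 exposes the bytes ef bf bd.
my $wrong_bytes = encode('UTF-8', $wrong);
print "re-encoded bytes:  ", hex_bytes($wrong_bytes), "\n";

# Decoding with the matching encoding keeps the character (U+00EA).
my $right = decode('iso-8859-1', $latin1_bytes);
print "decoded as latin1: ", code_points($right), "\n";
```

The same two helpers can be pointed at any string in the larger program to see what is in it at each step.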

        graff,

        Thanks for your input. I tried decode but I'm still error-ridden; I've included the updates below. Any other ideas or encoding tricks where I can see what's going on under the hood?

        Command Line Output:
        06_Protection_de_la_tête.xml: No such file or directory
        No such file or directory
        Assuming 'require Tk::ExecuteCommand;' at ./tmp.pl line 24
        Tk Output:
        06_Protection_de_la_tête.xml: No such file or directory
        Code:
        #!/usr/local/bin/perl
        use warnings;
        use strict;
        use Tk;
        use File::Find::Rule;
        use Encode qw(decode);

        #my $file = '06_Protection_de_la_tête.xml';
        my @files = File::Find::Rule->file->name('*.xml')->in('.');
        my $file = shift @files;
        $file = decode('iso-8859-1', $file);
        my $cmd = "ls -l $file";

        ### Try ls.
        print qx($cmd);

        ### Try reading the first line.
        open my $F, '<', $file;
        print $! ? "$!\n" : scalar <$F> ;

        ### Try ls via Tk.
        my $mw = MainWindow->new;
        my $exec = $mw->ExecuteCommand(
            -command => $cmd,
        )->pack;
        $exec->execute_command;
        $exec->update;
        MainLoop;