in reply to Re: Unicode (ä, ö, ü in German) Problem with File::Find under Windows2000
in thread Unicode (ä, ö, ü in German) Problem with File::Find under Windows2000

Addendum: I found out how to get UTF8 results from readdir, if you need them (for German, you don't). You use the "perl -C" flag or set ${^WIDE_SYSTEM_CALLS}.

Right now it's slightly broken because it returns utf8 strings, but doesn't set the utf8 flag on the strings. There is a workaround:

#!perl -w
#
use File::Find;
use strict;
use Encode qw(decode_utf8 is_utf8);

my $start = "/home/Hirschk/pmonks/utftest";
{
    local ${^WIDE_SYSTEM_CALLS} = 1;
    finddepth( \&showme, $start );
}

sub fixutf8 {
    for (@_) {
        if (${^WIDE_SYSTEM_CALLS} && !is_utf8($_)) {
            $_ = decode_utf8($_);
        }
    }
}

sub showme {
    fixutf8($File::Find::dir,$File::Find::name,$_);
    print "\$_ = $_\n";
}
The fixutf8 function should, well, fix it.
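
To see what the missing flag means in practice, here is a small sketch that does not touch the file system. The byte string stands in for a name as readdir might return it under ${^WIDE_SYSTEM_CALLS}; the name "öa" and the variable names are only illustrations, not something taken from a real directory.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 is_utf8);

# Hypothetical file name "öa" as raw UTF-8 bytes (0xC3 0xB6 0x61).
# Perl treats this as three separate bytes because the utf8 flag is not set.
my $bytes = "\xC3\xB6a";
printf "before: length=%d flagged=%d\n", length($bytes), is_utf8($bytes) ? 1 : 0;

# decode_utf8 interprets the bytes as UTF-8 and returns a character string.
my $chars = decode_utf8($bytes);
printf "after:  length=%d flagged=%d\n", length($chars), is_utf8($chars) ? 1 : 0;
```

Before decoding, the two bytes of "ö" count as two characters; after decoding, Perl sees one character, which is what fixutf8 is meant to achieve for each name.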

Summary and...
by TeddyC (Novice) on Sep 04, 2003 at 14:49 UTC

    Thank you BrowserUk! Thank you Thelonius!
    I think I got more than I had hoped for.

    Because of some other trouble, I have only been able to experiment further since this morning.
    Thelonius, I tried your code, and I think something goes wrong inside File::Find::finddepth before fixutf8 gets a chance to fix the strings.
    I used XML to get a UTF-8 string (similar to my old program).
    config.xml

    <?xml version="1.0" encoding="UTF-8" ?>
    <config>
      <srcdir>d:\temp\source\test2</srcdir>
      <dstdir>d:\temp\source\test5</dstdir>
    </config>

    newcopy6.pl
    #!d:\perl\bin\perl.exe -w
    use File::Find;
    use strict;
    use Encode qw(encode_utf8 decode_utf8 is_utf8);
    use XML::Simple;

    my $configfile=".\\config.xml";
    my $config=XMLin($configfile);
    my $srcdir="d:\\temp\\source\\test2";
    print "\$srcdir: $srcdir\n";
    if(is_utf8($srcdir)){
        print "is utf8\n";
    }else{
        print "is NOT utf8\n";
        $srcdir=decode_utf8($srcdir); # ???
    }
    # line "!!!" gets srcdir from the XML file;
    # comment it out to test
    # whether line "???" has any effect or not
    $srcdir=$$config{'srcdir'}; # !!!
    if(is_utf8($srcdir)){
        print "is utf8\n";
    }else{
        print "is NOT utf8\n";
    }
    {
        local ${^WIDE_SYSTEM_CALLS} = 1;
        finddepth( \&showme, $srcdir );
    }
    sub fixutf8 {
        for (@_) {
            if (${^WIDE_SYSTEM_CALLS} && !is_utf8($_)) {
                $_ = decode_utf8($_);
            }
        }
    }
    sub showme {
        print "\$_ = $_\n";
        fixutf8($File::Find::dir,$File::Find::name,$_);
        print "\$_ = $_\n";
    }
    I got results in a DOS window, but the traversal is NOT depth first!
    D:\temp\source>newcopy6.pl
    $srcdir: d:\temp\source\test2
    is NOT utf8
    is utf8
    Can't cd to (d:\temp\source\test2/) &#9500;â&#9516;&#9570;a: No such file or directory at D:\temp\source\newcopy6.pl line 28
    $_ = &#9500;&#9570;a
    $_ = &#9500;&#9570;a
    $_ = .
    $_ = .
    and in Komodo
    $srcdir: d:\temp\source\test2
    is NOT utf8
    is utf8
    $_ = öa
    $_ = öa
    $_ = .
    $_ = .
    If I comment out the line marked "!!!", I get the following in Komodo.
    The line marked "???" has NO effect, but the traversal is depth first:
    $srcdir: d:\temp\source\test2
    is NOT utf8
    is NOT utf8
    $_ = ü.txt
    $_ = &#52212;xt
    $_ = öa
    $_ = &#30797;
    $_ = .
    $_ = .
    (I have set UTF-8 as the editor encoding in Komodo's preferences. Some characters can't be posted here correctly; see the note from BrowserUk.)
    Then I tested fixutf8 itself, with some extra print statements:
    sub fixutf8 {
        for (@_) {
            print "\$_=$_";
            if (${^WIDE_SYSTEM_CALLS} && !is_utf8($_)) {
                $_ = decode_utf8($_);
            }
            if(is_utf8($_)){
                print "#\$_=$_ is utf8\n";
            }else{
                print "#\$_=$_ is NOT utf8\n";
            }
        }
    }
    and then I get:
    $srcdir: d:\temp\source\test2
    is NOT utf8
    is NOT utf8
    $_ = ü.txt
    $_=d:\temp\source\test2/öa#$_=d:\temp\source\test2/&#30816;is utf8
    $_=d:\temp\source\test2/öa/ü.txt#$_=d:\temp\source\test2/&#30831;&#52212;xt is utf8
    $_=ü.txt#$_=&#52212;xt is utf8
    $_ = &#52212;xt
    $_ = öa
    $_=d:\temp\source\test2#$_=d:\temp\source\test2 is NOT utf8
    $_=d:\temp\source\test2/öa#$_=d:\temp\source\test2/&#30816;is utf8
    $_=öa#$_=&#30816;is utf8
    $_ = &#30797;
    $_ = .
    $_=d:\temp\source\test2#$_=d:\temp\source\test2 is NOT utf8
    $_=d:\temp\source\test2#$_=d:\temp\source\test2 is NOT utf8
    $_=.#$_=. is NOT utf8
    $_ = .

    So my guess is:
    If I give finddepth a UTF-8 directory name, it gets the names of the child nodes as ASCII strings but cannot handle them correctly, as in the first two results (DOS and Komodo).

    If I give finddepth an ordinary string, in the same encoding as the program source, it handles the names without any problem, as in the last result.
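
If that guess is right, one thing that might be worth trying is to convert the flagged string from the XML file back into plain bytes before it reaches finddepth. The sketch below only shows the conversion. The helper name to_ansi_bytes and the code page cp1252 are assumptions for illustration (cp1252 is a guess at the ANSI code page of a Western European Windows system), and the sample path with "öa" appended just stands in for a value read from config.xml.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# Hypothetical helper: turn a character string (for example one returned by
# XML::Simple) into bytes in the assumed ANSI code page before it is used
# as a path.
sub to_ansi_bytes {
    my ($path, $codepage) = @_;
    $codepage ||= 'cp1252';    # assumption: Western European Windows
    return encode($codepage, $path);
}

# Stand-in for the value read from the XML file: a flagged character string.
my $from_xml = "d:\\temp\\source\\test2/\x{F6}a";
utf8::upgrade($from_xml);

my $bytes = to_ansi_bytes($from_xml);
printf "flag before: %d, flag after: %d\n",
    (is_utf8($from_xml) ? 1 : 0), (is_utf8($bytes) ? 1 : 0);
```

The resulting byte string would then be the one handed to finddepth, in place of the flagged value from the XML file. Whether that actually cures the "Can't cd to" error is untested here.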

    In the end I used a plain text file as the config file...
    I am somewhat disappointed.
    But there are two things I still can't understand:
    --Why does the line marked "???" have no effect?
    --According to Thelonius' post there is no function like getEncoding, but then what is the encoding inside the program?
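
One possible reading of the "???" question, offered as a sketch and not as a definitive answer: the path d:\temp\source\test2 contains only ASCII characters, and for ASCII the raw bytes and the UTF-8 encoding are identical, so decoding cannot change the content of the string. Whether is_utf8 reports the flag as set afterwards is, as far as I know, an internal detail that has differed between Encode versions, so the sketch prints it instead of assuming a value.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 is_utf8);

# The same ASCII-only path that the script above assigns to $srcdir.
my $ascii   = "d:\\temp\\source\\test2";
my $decoded = decode_utf8($ascii);

# The characters are identical whether or not the internal flag is set.
print "same string\n" if $decoded eq $ascii;

# Report the flag without assuming what this Encode version does.
print "flag on decoded copy: ", (is_utf8($decoded) ? "on" : "off"), "\n";
```

Under that reading, the flag only starts to matter once a string contains characters outside ASCII, such as ö or ü.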

    By the way, if you visit www.perl-community.de (where I also posted), you can see some other German-in-Win32 problems; for German in DOS there is a solution from Crian.