Re: opening accented file names
by zentara (Cardinal) on Nov 11, 2010 at 16:42 UTC
|
I've seen that before. Use decode from the Encode module to make sure the filenames are decoded correctly.
# Decode the names as UTF-8 so that filenames containing
# non-ASCII (Unicode) characters work properly
use Encode;
opendir my $dh, $path or warn "Error: $!";
my @files = grep !/^\.\.?$/, readdir $dh;
closedir $dh;
# @files = map{ "$path/".$_ } sort @files;
#$_ = decode( 'utf8', $_ ) for ( @files );
@files = map { decode( 'utf8', "$path/".$_ ) } sort @files;
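The same idea can be sketched as a reusable helper. This is only a sketch, assuming readdir hands back UTF-8 bytes (typical on Linux, not necessarily on Windows). The names `decoded_entries` and `decode_or_keep` are hypothetical. A name that is not well-formed UTF-8 is returned as raw bytes rather than being silently mangled.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list a directory, skipping . and .., and return
# "$path/$name" entries with valid UTF-8 names decoded to characters.
sub decoded_entries {
    my ($path) = @_;
    opendir my $dh, $path or die "Cannot open directory '$path': $!";
    my @names = sort grep { !/^\.\.?\z/ } readdir $dh;
    closedir $dh;
    return map { decode_or_keep("$path/$_") } @names;
}

# Decode a byte string as strict UTF-8; if the input is not well-formed
# UTF-8, return the original bytes unchanged.
sub decode_or_keep {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return defined $text ? $text : $bytes;
}
```

Calling `decoded_entries($path)` returns the same kind of list as the map above, except that a cp1252-encoded name such as "acent\xF3.dat" passes through untouched.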
Re: opening accented file names
by Jim (Curate) on Nov 11, 2010 at 21:39 UTC
|
D:\>chcp
Active code page: 1252
D:\>dir acentó.dat /b
acentó.dat
D:\>type test.pl
#!perl
use strict;
use warnings;
my $fn = 'acentó.dat';
if (open my $fh, '<', $fn) {
print "Successfully opened file '$fn'\n";
close $fh;
}
else {
print "Error opening file '$fn': $!\n";
}
D:\>perl test.pl
Successfully opened file 'acentó.dat'
D:\>perl -ne "print $1 if m/(acentó)/" test.pl
acentó
D:\>perl -ne "print $1 if m/(acentó)/" test.pl | od -h
0000000000 61 63 65 6E 74 F3
0000000006
D:\>perl -v | fmt -w 53
This is perl 5, version 12, subversion 2 (v5.12.2)
built for MSWin32-x86-multi-thread (with 8 registered
patches, see perl -V for more detail)
Copyright 1987-2010, Larry Wall
Binary build 1202 [293621] provided by ActiveState
http://www.ActiveState.com Built Sep 6 2010 23:36:03
Perl may be copied only under the terms of either the
Artistic License or the GNU General Public License,
which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists,
should be found on this system using "man perl" or
"perldoc perl". If you have access to the Internet,
point your browser at http://www.perl.org/, the Perl
Home Page.
D:\>
There's no use utf8 in the script because there's no Unicode text here. The text is in the Windows-1252 character encoding, also called CP1252, ANSI and Western European. The ó character is \xF3.
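The byte value is easy to check. A short sketch, not part of the original session, that encodes U+00F3 with the core Encode module in both Windows-1252 and UTF-8 (the helper name `hex_bytes` is hypothetical):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes of a string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $o_acute = "\x{F3}";    # U+00F3 LATIN SMALL LETTER O WITH ACUTE

print 'cp1252: ', hex_bytes(encode('cp1252', $o_acute)), "\n";    # cp1252: F3
print 'UTF-8:  ', hex_bytes(encode('UTF-8',  $o_acute)), "\n";    # UTF-8:  C3 B3
```

The single byte F3 matches the od output above. A UTF-8 encoded script would have contained C3 B3 instead.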
In general, you can't use Perl on Microsoft Windows to handle Unicode folder and file names. See "Is File::Find Unicode-(Conformant|Compliant|Enabled|Capable)?", "dos path accents" and "renaming subdirectories on MS Windows" for more details.
|
c:\test>dir ac*/b
File Not Found
c:\test>chcp 850 & echo this is acentó.dat >acentó.dat & perl -we"open I,$ARGV[0]; print while<I>" acentó.dat & del ac*
Active code page: 850
this is acentó.dat
c:\test>chcp 437 & echo this is acentó.dat >acentó.dat & perl -we"open I,$ARGV[0]; print while<I>" acentó.dat & del ac*
Active code page: 437
this is acentó.dat
c:\test>chcp 1250 & echo this is acentó.dat >acentó.dat & perl -we"open I,$ARGV[0]; print while<I>" acentó.dat & del ac*
Active code page: 1250
this is acent˘.dat
c:\test>chcp 1252 & echo this is acentó.dat >acentó.dat & perl -we"open I,$ARGV[0]; print while<I>" acentó.dat & del ac*
Active code page: 1252
this is acentó.dat
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
|
I included the CHCP command in my post because the active code page matters. It governs what is displayed and, in the case of code page 65001, even how commands work (or don't work).
Here's as faithful a representation as PerlMonks allows of exactly what I see on my screen as I type these commands at the Windows Command Prompt:
D:\>chcp 1252
Active code page: 1252
D:\>dir acentó.dat /b
acentó.dat
D:\>perl test.pl acentó.dat
Successfully opened file 'acentó.dat'
D:\>chcp 437
Active code page: 437
D:\>perl test.pl acentó.dat
Successfully opened file 'acent≤.dat'
D:\>chcp 850
Active code page: 850
D:\>perl test.pl acentó.dat
Successfully opened file 'acent¾.dat'
D:\>chcp 65001
Active code page: 65001
D:\>perl test.pl acentó.dat
Successfully opened file 'acent.dat'
D:\>type test.pl
#!perl
use strict;
use warnings;
my $fn = shift @ARGV;
if (open my $fh, '<', $fn) {
print "Successfully opened file '$fn'\n";
close $fh;
}
else {
print "Error opening file '$fn': $!\n";
}
D:\>
(Sorry, I can't follow your session log because there are too many things going on in it at once. It's too dense and complicated.)
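The substitutions in that session line up with how each code page maps the single byte 0xF3. A minimal sketch, not from the original post, that decodes the byte under three of the code pages with the core Encode module (the helper name `code_point_in` is hypothetical):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: interpret one byte under the given code page and
# return the resulting Unicode code point formatted as "U+XXXX".
sub code_point_in {
    my ($codepage, $byte) = @_;
    return sprintf 'U+%04X', ord(decode($codepage, $byte));
}

# The byte Perl printed is always 0xF3; only the console's reading changes.
for my $cp (qw(cp1252 cp437 cp850)) {
    printf "%-6s 0xF3 -> %s\n", $cp, code_point_in($cp, "\xF3");
}
# cp1252 0xF3 -> U+00F3
# cp437  0xF3 -> U+2264
# cp850  0xF3 -> U+00BE
```

U+2264 is ≤ and U+00BE is ¾, the characters shown under code pages 437 and 850. The open succeeds in every case because the bytes handed to Perl are unchanged.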
Re: opening accented file names
by Your Mother (Archbishop) on Nov 11, 2010 at 18:27 UTC
|
If you have the wide chars in your actual code, you should probably include use utf8. Untested, but it might be enough here.
|
IIRC use utf8 worked for me.
However, it only works if the filename is hardcoded into the script. If it's typed in by the user, it gets trickier (so much so that I have no solution for that scenario).
|
D:\>chcp 1252
Active code page: 1252
D:\>dir acentó.dat /b
acentó.dat
D:\>perl test.pl acentó.dat
Successfully opened file 'acentó.dat'
D:\>type test.pl
#!perl
use strict;
use warnings;
my $fn = shift;
if (open my $fh, '<', $fn) {
print "Successfully opened file '$fn'\n";
close $fh;
}
else {
print "Error opening file '$fn': $!\n";
}
D:\>
Re: opening accented file names
by nikosv (Deacon) on Nov 15, 2010 at 12:45 UTC
|
Re: opening accented file names
by shrodi (Initiate) on Jan 07, 2016 at 15:08 UTC
|
First, make sure the Perl script is saved with the right encoding (Windows-1252, i.e. cp1252, in my case). Most text editors let you choose the encoding, and nowadays they mostly save in UTF-8 by default; if you want to keep the script in UTF-8, add "use utf8;". Then, this works for me on Windows 7:
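use strict;
use warnings;
use Encode qw(encode decode);
# Encode the name to cp1252 bytes, which is what open() hands to Windows
my $outname = encode 'cp1252', "accentué.txt";
# Three-argument open, explicitly for reading
open my $outfile, '<', $outname
or die "\nIncapable d'ouvrir '$outname' en lecture: $!\n";
my $contenu = <$outfile>; # reads the first line only
close $outfile;
print STDOUT encode 'cp1252', "\nContenu = $contenu\n";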
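For the UTF-8 option mentioned above, a minimal sketch assuming the script file itself is saved as UTF-8 and the target system's ANSI code page is cp1252. The helper name `ansi_name` is hypothetical. With `use utf8` the literal is a character string, and encoding it yields the single-byte form that worked in the example above.

```perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8
use Encode qw(encode);

# Hypothetical helper: turn a character-string file name into the byte
# string for an ANSI code page (cp1252 is assumed when none is given).
sub ansi_name {
    my ($name, $codepage) = @_;
    $codepage = 'cp1252' unless defined $codepage;
    return encode($codepage, $name);
}

my $bytes = ansi_name('accentué.txt');
printf "name is %d bytes; byte at offset 7 is 0x%02X\n",
    length($bytes), ord(substr($bytes, 7, 1));
```

The resulting byte string can then be passed to open as in the script above. Whether that is the right code page depends on the machine, so treat cp1252 as a placeholder.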