in reply to opening accented file names

This works for me:

D:\>chcp
Active code page: 1252

D:\>dir acentó.dat /b
acentó.dat

D:\>type test.pl
#!perl
use strict;
use warnings;

my $fn = 'acentó.dat';
if (open my $fh, '<', $fn) {
    print "Successfully opened file '$fn'\n";
    close $fh;
}
else {
    print "Error opening file '$fn': $!\n";
}

D:\>perl test.pl
Successfully opened file 'acentó.dat'

D:\>perl -ne "print $1 if m/(acentó)/" test.pl
acentó

D:\>perl -ne "print $1 if m/(acentó)/" test.pl | od -h
0000000000 61 63 65 6E 74 F3
0000000006

D:\>perl -v | fmt -w 53

This is perl 5, version 12, subversion 2 (v5.12.2)
built for MSWin32-x86-multi-thread (with 8 registered
patches, see perl -V for more detail)

Copyright 1987-2010, Larry Wall

Binary build 1202 [293621] provided by ActiveState
http://www.ActiveState.com
Built Sep 6 2010 23:36:03

Perl may be copied only under the terms of either the
Artistic License or the GNU General Public License,
which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists,
should be found on this system using "man perl" or
"perldoc perl". If you have access to the Internet,
point your browser at http://www.perl.org/, the Perl
Home Page.

D:\>

There's no "use utf8" in the script because there's no Unicode text here. The text is in the Windows-1252 character encoding, also called CP1252, ANSI and Western European. In that encoding the ó character is the single byte \xF3.
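
To make the byte-level point concrete, here is a minimal sketch using the core Encode module. The helper name hex_bytes is my own, not something from the script above. It shows that 'acentó' in CP1252 is the same six bytes that od printed, while the UTF-8 form would need seven.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the bytes of $text in $encoding, as uppercase hex.
sub hex_bytes {
    my ($encoding, $text) = @_;
    my $octets = encode($encoding, $text);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $octets;
}

# "\x{F3}" is U+00F3 (o with acute); the escape keeps this source pure ASCII.
my $name = "acent\x{F3}";

print "cp1252: ", hex_bytes('cp1252', $name), "\n";   # 61 63 65 6E 74 F3
print "UTF-8:  ", hex_bytes('UTF-8',  $name), "\n";   # 61 63 65 6E 74 C3 B3
```

If test.pl had been saved as UTF-8 instead, the od output would have ended in C3 B3 rather than F3.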

In general, you can't use Perl's built-in file functions on Microsoft Windows to handle Unicode folder and file names. See the threads "Is File::Find Unicode-(Conformant|Compliant|Enabled|Capable)?", "dos path accents" and "renaming subdirectories on MS Windows" for more details.

Re^2: opening accented file names
by BrowserUk (Patriarch) on Nov 11, 2010 at 22:36 UTC

    Nice++. But I think you are confusing things by mentioning chcp. It isn't necessary, at least for this particular accented character (which I typed manually using AltGr-o in each case shown):

    c:\test>dir ac*/b
    File Not Found

    c:\test>chcp 850 & echo this is acentó.dat >acentó.dat & perl -we"open I,$ARGV[0]; print while<I>" acentó.dat & del ac*
    Active code page: 850
    this is acentó.dat

    c:\test>chcp 437 & echo this is acentó.dat >acentó.dat & perl -we"open I,$ARGV[0]; print while<I>" acentó.dat & del ac*
    Active code page: 437
    this is acentó.dat

    c:\test>chcp 1250 & echo this is acentó.dat >acentó.dat & perl -we"open I,$ARGV[0]; print while<I>" acentó.dat & del ac*
    Active code page: 1250
    this is acent˘.dat

    c:\test>chcp 1252 & echo this is acentó.dat >acentó.dat & perl -we"open I,$ARGV[0]; print while<I>" acentó.dat & del ac*
    Active code page: 1252
    this is acentó.dat

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I included the CHCP command in my post because the active code page matters. It governs what is displayed and, in the case of code page 65001, even how commands work (or don't work).

      Here is a representation, as faithful as PerlMonks allows, of exactly what I see on my screen as I type these commands at the Windows Command Prompt:

      D:\>chcp 1252
      Active code page: 1252

      D:\>dir acentó.dat /b
      acentó.dat

      D:\>perl test.pl acentó.dat
      Successfully opened file 'acentó.dat'

      D:\>chcp 437
      Active code page: 437

      D:\>perl test.pl acentó.dat
      Successfully opened file 'acent≤.dat'

      D:\>chcp 850
      Active code page: 850

      D:\>perl test.pl acentó.dat
      Successfully opened file 'acentľ.dat'

      D:\>chcp 65001
      Active code page: 65001

      D:\>perl test.pl acentó.dat
      Successfully opened file 'acent.dat'

      D:\>type test.pl
      #!perl
      use strict;
      use warnings;

      my $fn = shift @ARGV;
      if (open my $fh, '<', $fn) {
          print "Successfully opened file '$fn'\n";
          close $fh;
      }
      else {
          print "Error opening file '$fn': $!\n";
      }

      D:\>
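
One way to read the changing glyphs in that session is that the script always prints the same byte, 0xF3, and the console draws it according to the active code page. The sketch below is only an illustration of that reading, assuming the core Encode module's tables match what the console uses; glyph_for_byte is a made-up helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode one byte under $code_page and
# report the resulting character as "U+XXXX".
sub glyph_for_byte {
    my ($code_page, $byte) = @_;
    my $char = decode($code_page, chr $byte);
    return sprintf 'U+%04X', ord $char;
}

# Same byte, three code pages, three different characters.
for my $cp (qw(cp1252 cp437 cp850)) {
    printf "%-6s 0xF3 -> %s\n", $cp, glyph_for_byte($cp, 0xF3);
}
```

Under these tables cp1252 gives U+00F3 (ó), cp437 gives U+2264 (≤) and cp850 gives U+00BE (¾). The glyph shown for code page 850 in the posted log differs from that, which may be a side effect of how the page itself was rendered.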

      (Sorry, I can't follow your session log because there are too many things going on in it at once. It's too dense and complicated.)

        (Sorry, I can't follow your session log because there are too many things going on in it at once. It's too dense and complicated.)

        Sorry, let me simplify it.

        This is in a completely new session, as identified by the banner at the top. Everything you see in the session was typed manually, not copied and pasted (this is important!). As I stated above, I am typing the 'ó' using AltGr-o:

        Microsoft Windows [Version 6.0.6001]
        Copyright (c) 2006 Microsoft Corporation. All rights reserved.

        c:\>dir ac*/b
        File Not Found

        c:\>chcp 65001
        Active code page: 65001

        c:\>echo this is the contents of acentó.dat >acentó.dat

        c:\>dir ac*/b
        acentó.dat

        c:\>perl -pe1 acentó.dat
        this is the contents of acentó.dat

        c:\>

        Typing rather than cutting and pasting is important because the character code that sits behind the ó glyph varies from code page to code page.
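
A rough way to see that variation from Perl itself, assuming the core Encode module's tables for these code pages (o_acute_bytes is just a name picked for this sketch):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: for each encoding name, the hex bytes used for U+00F3.
sub o_acute_bytes {
    my @encodings = @_;
    my %bytes;
    for my $enc (@encodings) {
        my $octets = encode($enc, "\x{F3}");
        $bytes{$enc} = join ' ', map { sprintf '%02X', $_ } unpack 'C*', $octets;
    }
    return \%bytes;
}

my @encodings = qw(cp437 cp850 cp1250 cp1252 UTF-8);
my $bytes = o_acute_bytes(@encodings);
printf "%-7s %s\n", $_, $bytes->{$_} for @encodings;
```

The two OEM pages (437 and 850) use A2, the two Windows pages (1250 and 1252) use F3, and UTF-8 needs the pair C3 B3.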

        Hence, note the glyph displayed when the contents of a file created under cp437 are displayed under cp65001. On screen, prior to cutting and pasting to PerlMonks, that � character is displayed as the typical 'open box' glyph.

        c:\>chcp 437
        Active code page: 437

        c:\>echo this is the contents of acentó.dat > acentó.dat

        c:\>dir /b ac*
        acentó.dat

        c:\>perl -pe1 acentó.dat
        this is the contents of acentó.dat

        c:\>chcp 65001
        Active code page: 65001

        c:\>dir /b ac*
        acentó.dat

        c:\>perl -pe1 acentó.dat
        this is the contents of acent�.dat
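
The replacement character in that last line can be reproduced away from the console. This is a sketch only, assuming the file contents were written as cp437 bytes and later interpreted as UTF-8; reinterpret is a hypothetical helper, and it relies on Encode's default behaviour of substituting U+FFFD for malformed input.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: encode $text as $from, then decode those same
# bytes as $to, returning the resulting code points as "U+XXXX" strings.
sub reinterpret {
    my ($from, $to, $text) = @_;
    my $octets = encode($from, $text);
    my $chars  = decode($to, $octets);
    return map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

# U+00F3 written under cp437 is the lone byte A2, which is not valid UTF-8.
print join(' ', reinterpret('cp437', 'UTF-8', "\x{F3}")), "\n";   # U+FFFD
```

The same round trip with cp437 on both sides returns U+00F3 unchanged, which matches the first half of the session.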

        So, your inclusion of the codepages confuses rather than clarifies.

