in reply to Character encoding in console in Windows

You probably don't need a general solution, in which case I was overcomplicating things. Take two.

If you're entering the file name by dragging and dropping, Windows should be encoding it using the console's code page, which means it's already properly encoded for use by open(). Don't do any decoding, and it should work.

If you follow the above advice and you also want to be able to enter the file name using STDIN or via @ARGV, you'll have to enter it as it shows up in the results of dir. It will be impossible to enter some exotic file names this way (without using the file's short name, as shown by dir /x).

If this is unacceptable, let me know which aspects aren't acceptable.

If it doesn't work, run the following program, and give me the output and the name of the file as you see it in Explorer ("My Computer" or the like).

use strict;
use warnings;

use Data::Dumper qw( Dumper );

print("Enter file path> ");
chomp( my $qfn = <STDIN> );

{
    local $Data::Dumper::Useqq  = 1;
    local $Data::Dumper::Terse  = 1;
    local $Data::Dumper::Indent = 0;
    print('$qfn=', Dumper($qfn), "\n");
}

open(my $fh, '<', $qfn)
    or die("open $qfn: $!\n");

Re^2: Character encoding in console in Windows
by elef (Friar) on Sep 15, 2010 at 15:39 UTC
    Thanks for the more dumb-proof explanation, although I'm afraid it's still not dumb-proof enough for me.

    If you're entering the file name by dragging and dropping, Windows should be encoding it using the console's code page, which means it's already properly encoded for use by open(). Don't do any decoding, and it should work.
    Drag & drop would indeed be sufficient for me, although I'm not sure if there's any difference between typing a path in the console and having it pasted in with drag & drop. Either way, I certainly don't need @ARGV filename input at all. You say drag and drop should work with plain old open(), but this is exactly what I originally tried and it failed, hence this thread. To make things clear, here's a simplified script showing what I do. It doesn't even parse the file name for path/filename/extension, it just strips whitespace and quotes and tests if the perl script can find the file.

    #!/usr/bin/perl
    use strict;
    use warnings;

    print "Drop input file in console\n";
    chomp (my $file = <STDIN>);
    $file =~ s/ *["']?([^"']*)["']? *$/$1/; # strip whitespace and quotes

    if (-e "$file") {print "\nFile found, everything is fine.\nFile:>$file<"}
    else {print "\nOoops, file not found.\nFile: >$file<"}
    <STDIN>;

    If I drop in a file called i.txt, it finds it, so far so good. If I drop in í.txt from the same folder, the script can't find it. open (FILE, "<:encoding(UTF-8)", "$file"); fails on í.txt just the same as (-e "$file"). Are you saying this should work out of the box? Or should I use one of the modules? Win32::Unicode::File? Or Win32API::File?
    This command in Win32::Unicode::File seems to promise to do what I want: my $fh = Win32::Unicode::File->new($mode, $file_name); # create an instance and open the file
    but it's certainly not just a plain open() like you say. I haven't installed the module so I haven't tried it.

    Here's the output from your script, if I start it in the folder where í.txt is saved:
    Enter file path> í.txt
    $qfn="\241.txt"
    open í.txt: No such file or directory
      >perl -MDevel::Peek -e"chomp($_=<STDIN>); Dump($_); open($fh, '<', $_) or die; print <$fh>"
      C:\Users\ikegami\í.txt
      SV = PV(0xeaf80) at 0x2f59c8
        REFCNT = 1
        FLAGS = (POK,pPOK)
        PV = 0x2f3788 "C:\\Users\\ikegami\\\241.txt"\0
        CUR = 22
        LEN = 80
      Died at -e line 1, <STDIN> line 1.

      Is that encoded for my ANSI code page (1252) or my OEM code page (437)?

      >perl -MDevel::Peek -MEncode -e"Dump(encode('cp1252', chr(0xED)))
      SV = PV(0x2fa570) at 0x27b210
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK)
        PV = 0x32c24f8 "\355"\0
        CUR = 1
        LEN = 8

      >perl -MDevel::Peek -MEncode -e"Dump(encode('cp437', chr(0xED)))
      SV = PV(0x30a570) at 0x28b210
        REFCNT = 1
        FLAGS = (TEMP,POK,pPOK)
        PV = 0x32c24f8 "\241"\0
        CUR = 1
        LEN = 8

      So open() expects the name to be encoded using the ANSI code page, but it's coming from STDIN in the OEM code page.

      >perl -MDevel::Peek -MEncode=from_to -e"chomp($_=<STDIN>); from_to($_, 'cp437', 'cp1252'); Dump($_); open($fh, '<', $_) or die; print <$fh>"
      C:\Users\ikegami\í.txt
      SV = PV(0x159adc0) at 0x315b58
        REFCNT = 1
        FLAGS = (POK,pPOK)
        PV = 0x344d258 "C:\\Users\\ikegami\\\355.txt"\0
        CUR = 22
        LEN = 80
      ok
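The one-liner above can also be written as a small reusable sub. This is only a sketch: the name oem_to_ansi is made up for illustration, and the code pages 437 and 1252 are the ones from this particular machine, so they are assumptions anywhere else. It uses decode/encode from the core Encode module rather than from_to, so the input string is left untouched.

```perl
use strict;
use warnings;

use Encode qw( decode encode );

# Hypothetical helper: re-encode a byte string read from the console
# (OEM code page) into the encoding open() expects (ANSI code page).
sub oem_to_ansi {
    my ($bytes, $oem_cp, $ansi_cp) = @_;
    my $chars = decode($oem_cp, $bytes);   # bytes -> Perl characters
    return encode($ansi_cp, $chars);       # characters -> ANSI bytes
}

# "\241" is how the console delivered the accented i in the test above.
# cp437 and cp1252 are assumed here; they vary from system to system.
my $from_console = "\241.txt";
my $for_open     = oem_to_ansi($from_console, 'cp437', 'cp1252');

# Show the resulting bytes in hex.
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $for_open)), "\n";
# prints "ED 2E 74 78 74"
```

In a real script the result of oem_to_ansi would be passed to open() or -e in place of the raw STDIN line.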

      In broad strokes, the OEM code page is the encoding used by console apps, ANSI for others. Now how do you get those code pages? Good question.

        Now how do you get those code pages?

        using chcp?

        c:\test>chcp && perl -E" say `chcp`; say `chcp 437`; say`chcp`" && chcp
        Active code page: 850
        Active code page: 850
        Active code page: 437
        Active code page: 437
        Active code page: 437

        Alternatively, Win32::Console:

        InputCP [codepage]
            Gets or sets the input code page used by the console. Note that this doesn't apply to a console object, but to the standard input console. This attribute is used by the Write method. See also: OutputCP.

            Example:

                $codepage = $CONSOLE->InputCP();
                $CONSOLE->InputCP(437);

                # you may want to use the non-instanciated form to avoid confuzion :)
                $codepage = Win32::Console::InputCP();
                Win32::Console::InputCP(437);
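For what it's worth, here is a rough sketch of looking the code pages up at run time instead of hardcoding them. It assumes the installed Win32 module provides Win32::GetConsoleCP() and Win32::GetACP(), which I believe recent versions do, but check your own. The subs cp_name and console_and_ansi_cp are hypothetical names, and the non-Windows fallback values are simply the ones from the test earlier in this thread.

```perl
use strict;
use warnings;

use Encode qw( find_encoding );

# Hypothetical helper: turn a numeric Windows code page into the name
# Encode uses for it, or undef if Encode has no such encoding.
sub cp_name {
    my ($cp) = @_;
    my $name = "cp$cp";
    return find_encoding($name) ? $name : undef;
}

# Hypothetical helper: return (console input code page, ANSI code page).
sub console_and_ansi_cp {
    if ($^O eq 'MSWin32') {
        # Assumption: these two functions exist in the installed Win32 module.
        require Win32;
        return (Win32::GetConsoleCP(), Win32::GetACP());
    }
    # Not on Windows: fall back to the values seen earlier in this thread.
    return (437, 1252);
}

my ($console_cp, $ansi_cp) = console_and_ansi_cp();
print "console input: ", (cp_name($console_cp) // "unknown ($console_cp)"), "\n";
print "ANSI:          ", (cp_name($ansi_cp)    // "unknown ($ansi_cp)"), "\n";
```

The two names can then be fed to from_to() or decode()/encode() in place of the literal 'cp437' and 'cp1252'.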

      You always catch me away from my Windows machine! I'll do some testing tonight.