jhoule has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to write a simple script that builds a list of the files present under a source directory but missing from a destination directory, so that I can copy those missing files from source to destination.

The problem is that some of those file names contain French accented characters.

So à (133) is replaced by α (224).

So far I've tried many different solutions found on this site and elsewhere on the Web, without finding one that works. The closest is:


use Encode qw(from_to);


from_to($fic, "cp1250", "utf8");


I tried almost every encoding listed in perldoc Encode::Supported (such as cp437, cp850, cp852, iso-8859-1, ..., iso-8859-16); none of them replaced α (224) with à (133).

I'm running ActivePerl 5.8.0 on Win2k and XP.

Any suggestions will be appreciated.

Thanks!

Re: Cannot copy files when filenames got French accentuated character (Win32API::File)
by tye (Sage) on Nov 03, 2007 at 01:41 UTC

    You can use CopyFileW() from Win32API::File. The *W() functions are not well documented in that module. The "Unicode" mentioned there is what Microsoft calls "Unicode" (UTF-16LE, I think), not Perl's style of Unicode (UTF-8).
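
A minimal sketch of the CopyFileW() route, assuming the names returned by readdir are bytes in the Windows ANSI code page (cp1252 here) and that the wide-character functions want NUL-terminated UTF-16LE strings. The helper name to_wide and the example paths are hypothetical, and the copy itself is only attempted on Windows.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn an ANSI (cp1252) byte string into a
# NUL-terminated UTF-16LE string for the *W() functions.
sub to_wide {
    my ($ansi_bytes) = @_;
    my $chars = decode('cp1252', $ansi_bytes);
    return encode('UTF-16LE', $chars . "\0");
}

if ($^O eq 'MSWin32') {
    require Win32API::File;
    # Example paths only; "\xE9" and "\xE0" are the cp1252 bytes for "é" and "à".
    my $src = "C:/A/d\xE9j\xE0.txt";
    my $dst = "D:/B/d\xE9j\xE0.txt";
    # Third argument is bFailIfExists: do not overwrite an existing file.
    Win32API::File::CopyFileW(to_wide($src), to_wide($dst), 1)
        or warn "CopyFileW failed for $src: $^E\n";
}
```

If the names come from somewhere other than readdir, the 'cp1252' argument would need to match whatever encoding they are actually in.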

    Follow-up if you have problems getting it to work.

    I have a partially finished new version of Win32API::File, which I'm trying to get back to and release, that handles Unicode much better and much, much more easily.

    - tye        

Re: Cannot copy files when filenames got French accentuated character
by tilly (Archbishop) on Nov 03, 2007 at 14:55 UTC
    Are you just trying to list them in a DOS terminal and are finding they look wrong?

    If so, then I'd suggest writing them to a file, then opening that file in a standard Windows program like Notepad or WordPad and seeing whether they look right there. If they do, you should seriously consider having your script write the list to a file and then launch Notepad when you want to see it. Or just have your script copy the files. It should copy them and get the names right, even though they look wrong in the terminal.
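
A minimal sketch of that check, assuming the names are plain byte strings as readdir returned them. The helper name write_list and the file name missing_files.txt are hypothetical; Notepad is only launched on Windows.

```perl
use strict;
use warnings;

# Hypothetical helper: write one name per line to $path as raw bytes,
# so a Windows editor displays them in the ANSI code page.
sub write_list {
    my ($path, @names) = @_;
    open(my $fh, '>', $path) or die "Cannot write $path: $!\n";
    binmode($fh);
    print {$fh} "$_\r\n" for @names;
    close($fh) or die "Cannot close $path: $!\n";
    return scalar @names;
}

# Sample names; "\xE9" is the byte Windows uses for "é" in cp1252.
my @names  = ("/New Text Document", "/La l\xE9gende du cheval blanc.mp3");
my $report = "missing_files.txt";    # assumed output file name
write_list($report, @names);

# Only try to open Notepad where it exists.
system('notepad', $report) if $^O eq 'MSWin32';
```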

    Incidentally, character 133 is à in DOS CP 437. (This is the default DOS encoding.) And à is character 224 in Windows CP 1252 (Latin 1 - again a common default). Armed with that information, you hopefully can figure out the magic encoding invocation to do what you want. (I'd need to have Windows to figure it out and try it. I don't, so sorry.)
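
One possible invocation, as a sketch: convert from the Windows code page to the DOS one with Encode's from_to. The helper name ansi_to_oem is hypothetical, and cp437 is an assumption about the console; some Western European systems use cp850 instead, where à is also 133.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical helper: convert a Windows (cp1252) byte string to the
# DOS console code page (cp437) and return the converted copy.
sub ansi_to_oem {
    my ($bytes) = @_;
    from_to($bytes, 'cp1252', 'cp437');   # converts $bytes in place
    return $bytes;
}

my $converted = ansi_to_oem("\xE0");      # "à" as Windows stores it
printf "%d -> %d\n", 0xE0, ord($converted);   # prints "224 -> 133"
```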

Re: Cannot copy files when filenames got French accentuated character
by jhoule (Initiate) on Nov 04, 2007 at 22:35 UTC
    Hi monks (especially tye && tilly),

    Sorry I took so long to give you feedback, but I tried your suggestions many times, sometimes getting very close without ever finding a way through the problem. I ended up with an ISO-certified headache x:- {

    But after a good sleep (+ 1 hour for the time change), I ran a few last tests, thinking that this was really a Windows problem with file names containing non-standard characters. I was about to write up everything I had done so far when ... Surprise! It works fine?!?!?!

    I had been misled by corrupted files, not by the special characters in their names. The trace I wrote made me think I was right... but I wasn't.

    I'm posting the code I wrote for those who are as skeptical as I was. I certify that I used it to copy files with French accents on Win2k SP4 and XP SP2, from FAT32 and/or NTFS to FAT32 and/or NTFS.

    ===========================================================================================
    #!/usr/bin/perl -w

     use Cwd;
     use Win32API::File 0.08 qw( :ALL );
     use Encode qw(from_to);


     sub ScanDirectory{
      my ($workdir) = shift;
      my ($prefix) = shift;
      my ($startdir) = &cwd;

      chdir($workdir) or die "Unable to enter dir $workdir:$!\n";
      opendir(DIR, ".") or die "Unable to open $workdir:$!\n";
      my @names = readdir(DIR) or die "Unable to read $workdir:$!\n";
      closedir(DIR);

      foreach my $name (@names) {

       next if ($name eq ".");
       next if ($name eq "..");

       push(@liste, $prefix."/".$name);
       if (-d $name) {
        &ScanDirectory($name, $prefix."/".$name, @liste);
        next;
       }
      }
      chdir($startdir);
     }

     @liste = ();
     my $sourcedir = "C:/A";
     my $destdir = "D:/B";

     # Get the list of files in the source dir.
     ScanDirectory($sourcedir, "");
     @liste1 = sort @liste;
     @liste = ();

     # Get the list of files in the destination dir.
     ScanDirectory($destdir, "");
     @liste2 = sort @liste;

     # Create a list of files that are in source but not in destination dir.
     # Windows file names are case-insensitive, so compare lower-cased names.
     # (A "/i" prefix inside the strings is only literal text; it does not
     # make lt/eq/gt case-insensitive. Also, @liste1 must not be modified
     # while foreach is iterating over it.)
     my %in_dest = map { lc($_) => 1 } @liste2;
     foreach my $l1_item (@liste1) {
      # Remember l1_item so it can be copied to the destination dir
      push @liste3, $l1_item unless $in_dest{ lc $l1_item };
     }


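     # Copy the missing files
     foreach my $item (@liste3) {
      if (-d $sourcedir.$item) {
       unless (-e $destdir.$item) {
        mkdir($destdir.$item) or warn "Cannot create $destdir$item: $!\n";
       }
      } else {
       # Third argument (bFailIfExists) = 1: never overwrite an existing file
       CopyFile( $sourcedir.$item, $destdir.$item, 1 )
        or warn "Cannot copy $sourcedir$item: $^E\n";
      }
     }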

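     print "liste3 contient : ".@liste3."\n";
     foreach my $item (@liste3) {
      # Convert from the Windows code page to the DOS console code page
      from_to($item, "iso 8859 1", "cp437");
      # print, not printf: a "%" in a file name must not be read as a format
      print $item."\n";
     }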
    ===========================================================================================

    Note:
     from_to($item, "iso 8859 1", "cp437");

    is used so that the screen shows:

    /New Text Document
    /New Text Document/La légende du cheval blanc.mp3
    /New Text Document/New Text Document.txt
    /New Text Document/pod2htmd.tmp
    /New Text Document/pod2htmi.tmp
    /Newtest.pl
    /Newtest.pl.exe
    /éäàåçêëèïîìÄûù
    /éäàåçêëèïîìÄûù/é
    /éäàåçêëèïîìÄûù/éäàåçêëèïîìÄûù.txt

    instead of :

    liste3 contient : 10
    /New Text Document
    /New Text Document/La lTgende du cheval blanc.mp3
    /New Text Document/New Text Document.txt
    /New Text Document/pod2htmd.tmp
    /New Text Document/pod2htmi.tmp
    /Newtest.pl
    /Newtest.pl.exe
    /TSastOdFne8-v·
    /TSastOdFne8-v·/T
    /TSastOdFne8-v·/TSastOdFne8-v·.txt
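
Since from_to() rewrites its argument in place, the loop above also changes the names stored in @liste3. A minimal sketch of a display-only conversion that leaves the list untouched, assuming the names are ISO-8859-1 bytes and the console uses cp437; the helper name for_console is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return a console-ready copy of a file name,
# leaving the original bytes untouched for later file operations.
sub for_console {
    my ($name, $console_cp) = @_;
    return encode($console_cp, decode('iso-8859-1', $name));
}

# Sample entry; "\xE9" is "é" in ISO-8859-1.
my @liste3 = ("/New Text Document/La l\xE9gende du cheval blanc.mp3");
foreach my $item (@liste3) {
    # print, not printf: a "%" in a file name is not a format directive
    print for_console($item, 'cp437'), "\n";
}
```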

    Thank you very much!

    Jhoule