tannx has asked for the wisdom of the Perl Monks concerning the following question:

I need to encode files to UTF-16 and then move them to a destination directory. Moving the files already works, but how do I encode them?
#!/usr/bin/perl
use warnings;
use strict;
use File::Copy;

my $srcdir = "C:\\ROOT_DIR\\1\\";
my $dest   = "C:\\ROOT_DIR\\2\\";
my (@files);

for (;;) {
    opendir(DIR, $srcdir) or die "Can't open $srcdir: $!";
    @files = grep {!/^\.+$/} readdir(DIR);
    close(DIR);

    if (!@files) {
        print "Done.\n\n";
        last;
    }

    my $file = $files[0];
    my $old  = "$srcdir/$file";
    move($old, $dest) or die "Move $old -> $dest failed: $!";
    print "File Name: $file 5 seconds til next.\n\n";
    sleep 5;
}
I found a script that encodes files to UTF-16:
#!/usr/bin/perl
use strict;
use warnings;

binmode(STDOUT, ':raw:encoding(UTF-16)');

for my $qfn (@ARGV) {
    open(my $fh, "<:raw:encoding(UTF-8)", $qfn)
        or die("Can't open \"$qfn\": $!\n");
    print while <$fh>;
}
I'm unable to combine those scripts together.

Replies are listed 'Best First'.
Re: encode files to utf-16 and then move
by moritz (Cardinal) on May 08, 2009 at 08:39 UTC
    Instead of moving the file, you open the source for reading, and the destination for writing, and print the line to the new file, just like in the conversion script. If all went well, remove the source file.

    It's not hard, just don't be afraid of touching the code.

Re: encode files to utf-16 and then move
by ikegami (Patriarch) on May 08, 2009 at 16:28 UTC

    Heh, I bet I wrote that code. One thing I've learned since then is that :raw:encoding(...) disables buffering. You want :raw:perlio:encoding(...).
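One way to check which layers a given open mode actually stacks is PerlIO::get_layers. The following is a minimal sketch: the helper name layers_for and the temporary file are assumptions made for illustration, and the exact layer names printed can vary by platform and Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: open $qfn with the given mode and return its layer stack.
sub layers_for {
    my ($mode, $qfn) = @_;
    open(my $fh, $mode, $qfn)
        or die("Can't open \"$qfn\": $!\n");
    my @layers = PerlIO::get_layers($fh);
    close($fh);
    return @layers;
}

# A throwaway empty file so the example is self-contained.
my ($tmp_fh, $tmp_qfn) = tempfile(UNLINK => 1);
close($tmp_fh);

for my $mode ("<:raw:encoding(UTF-8)", "<:raw:perlio:encoding(UTF-8)") {
    print "$mode => ", join(" ", layers_for($mode, $tmp_qfn)), "\n";
}
```

If the second list contains a perlio entry that the first one lacks, that entry is the buffering layer referred to above.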

    You probably want to specify UTF-16le instead of UTF-16: when writing, plain UTF-16 produces big-endian output (UTF-16be preceded by a BOM), and you probably want little-endian.
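The difference between the three encoding names is easiest to see on the raw bytes. This sketch uses Encode directly instead of an I/O layer; the helper name hex_bytes is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: return the encoded bytes of $text as a lowercase hex string.
sub hex_bytes {
    my ($encoding, $text) = @_;
    return unpack("H*", encode($encoding, $text));
}

for my $encoding (qw( UTF-16le UTF-16be UTF-16 )) {
    printf "%-8s %s\n", $encoding, hex_bytes($encoding, "A");
}
# UTF-16le 4100
# UTF-16be 0041
# UTF-16   feff0041
```

Only the plain UTF-16 encoding writes a BOM by itself; the le and be variants emit just the code units in the named byte order.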

    The snippet is already producing a copy, so all you need to do to make it move is to delete the source once the copy is created.

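    #!/usr/bin/perl
    use strict;
    use warnings;

    my ($src_qfn, $dst_qfn) = @ARGV;

    open(my $src_fh, "<:raw:perlio:encoding(UTF-8)", $src_qfn)
        or die("Can't open \"$src_qfn\": $!\n");
    open(my $dst_fh, ">:raw:perlio:encoding(UTF-16le)", $dst_qfn)
        or die("Can't open \"$dst_qfn\": $!\n");

    # Copy line by line; the layers decode UTF-8 and encode UTF-16le.
    print $dst_fh $_ while <$src_fh>;

    # Close both handles first; Windows generally refuses to delete an open file.
    close($dst_fh);
    close($src_fh);

    # unlink takes a file name, not a file handle.
    unlink($src_qfn);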
    • Addition of error checking left to you.
    • Removal of a BOM if one exists (if so desired) is left to you.
    • Addition of a BOM if none exist (if so desired) is left to you.
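The two BOM items can be sketched on already-decoded text, where a BOM is simply the character U+FEFF at the start. The helper names strip_bom and ensure_bom are hypothetical, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers operating on decoded text (characters, not bytes).

# Remove a leading BOM (U+FEFF) if one is present.
sub strip_bom {
    my ($text) = @_;
    $text =~ s/^\x{FEFF}//;
    return $text;
}

# Prepend a BOM unless the text already starts with one.
sub ensure_bom {
    my ($text) = @_;
    return $text =~ /^\x{FEFF}/ ? $text : "\x{FEFF}$text";
}

my $with_bom    = "\x{FEFF}1;2;test;";
my $without_bom = "1;2;test;";

print length(strip_bom($with_bom)), "\n";      # 9
print length(ensure_bom($without_bom)), "\n";  # 10
print length(ensure_bom($with_bom)), "\n";     # 10
```

In the conversion script, only the first line read from the source would need this treatment, since a BOM can only appear at the very start of a file.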
      Right now it processes only one file.
      #!/usr/bin/perl
      use warnings;
      use strict;

      my $srcdir = "C:\\ROOT_DIR\\test1\\";
      my $dest   = "C:\\ROOT_DIR\\test2\\";
      my (@files);

      for (;;) {
          opendir(DIR, $srcdir) or die "Can't open $srcdir: $!";
          @files = grep {!/^\.+$/} readdir(DIR);
          close(DIR);

          if (!@files) {
              print "done.\n\n";
              last;
          }

          my $file = $files[0];
          open(my $src_fh, "<:raw:perlio:encoding(UTF-8)", "$srcdir$file")
              or die("Can't open \"$srcdir$file\": $!\n");
          open(my $dst_fh, ">:raw:perlio:encoding(UTF-16)", "$dest$file")
              or die("Can't open \"$dest$file\": $!\n");
          print $dst_fh $_ while <$src_fh>;
          unlink($src_fh);
          sleep 1;
      }
        Moved reading the directory to the outside of the loop and some other cleanup:
        #!/usr/bin/perl
        use warnings;
        use strict;
        use File::Spec::Functions qw( catfile );

        my $src_dir = "C:\\ROOT_DIR\\test1\\";
        my $dst_dir = "C:\\ROOT_DIR\\test2\\";

        opendir(my $dh, $src_dir)
            or die "Can't open dir $src_dir: $!\n";

        while (defined(my $file = readdir($dh))) {
            # Skip the "." and ".." entries.
            next if $file =~ /^\.\.?\z/;

            my $src_file = catfile($src_dir, $file);
            my $dst_file = catfile($dst_dir, $file);

            open(my $src_fh, "<:raw:perlio:encoding(UTF-8)", $src_file)
                or die("Can't open \"$src_file\": $!\n");
            open(my $dst_fh, ">:raw:perlio:encoding(UTF-16)", $dst_file)
                or die("Can't open \"$dst_file\": $!\n");

            print $dst_fh $_ while <$src_fh>;

            # Close before deleting, and unlink by name rather than by handle.
            close($dst_fh);
            close($src_fh);
            unlink($src_file);
        }

        closedir($dh);
        print "done.\n";
        close $dst_fh or die qq{Failed close "$dest$file": $!: $^E};
        close $src_fh or die qq{Failed close "$srcdir$file": $!: $^E};
        unlink "$srcdir$file" or die qq{Failed unlink "$srcdir$file": $!: $^E};
        sleep 1;
      You're right, I need UTF-16le, but if I specify UTF-16le for the output, the file becomes unusable: 1NUL;NUL;2NUL;testNUL; etc. If I specify UTF-16, the output is 1;;2;test; but big-endian.
      -----
      It's probably because my source files are UTF-8 without a BOM and the destination files are also written without a BOM.
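The NUL after each character is what UTF-16le looks like when a viewer does not recognise the encoding, so a missing BOM is a plausible cause. As a sketch of that guess, the conversion below writes U+FEFF before the first line and then dumps the resulting bytes. The helper name convert_with_bom and the temporary files are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Spec::Functions qw( catfile );

# Hypothetical helper: convert a UTF-8 file to UTF-16le with a leading BOM.
sub convert_with_bom {
    my ($src_qfn, $dst_qfn) = @_;
    open(my $src_fh, "<:raw:perlio:encoding(UTF-8)", $src_qfn)
        or die("Can't open \"$src_qfn\": $!\n");
    open(my $dst_fh, ">:raw:perlio:encoding(UTF-16le)", $dst_qfn)
        or die("Can't open \"$dst_qfn\": $!\n");

    print $dst_fh "\x{FEFF}";    # BOM; the layer encodes it as bytes FF FE
    while (my $line = <$src_fh>) {
        print $dst_fh $line;
    }

    close($dst_fh) or die("Can't close \"$dst_qfn\": $!\n");
    close($src_fh);
}

# Throwaway input so the example is self-contained.
my $dir = tempdir(CLEANUP => 1);
my $src = catfile($dir, "in.txt");
my $dst = catfile($dir, "out.txt");

open(my $out_fh, ">:raw", $src) or die("Can't create \"$src\": $!\n");
print $out_fh "1;2\n";
close($out_fh);

convert_with_bom($src, $dst);

# Read the result back as raw bytes and show them in hex.
open(my $in_fh, "<:raw", $dst) or die("Can't open \"$dst\": $!\n");
my $bytes = do { local $/; <$in_fh> };
close($in_fh);

print unpack("H*", $bytes), "\n";    # fffe31003b0032000a00
```

The leading ff fe is the little-endian BOM; the rest is each character followed by a zero byte, which is the 1NUL;NUL;2NUL pattern described above.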