plwtoday has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks,

I am working on a project wherein, for starters, I simply want to read a Greek file formatted in Unicode (UTF-8) and write it to another file. (Later I will manipulate the data.)

I use the Ultra-Edit editor (on an Acer laptop running Windows 7) and the Greek file is readable in it.

The input file is named "WE_EX.txt" and the output file is named "WE_EX.out".

The code below appears to open both the input and output files but then goes into an infinite loop...

Any help you may provide will be so greatly appreciated!

All best wishes, Patricia Walters

#!
# Simply read Greek file (WE_EX.txt) saved in
# UNICODE (UTF-8) format and
# write it to output file (WE_EX.out)
#
use strict;
use warnings;
#
open IN, '<:encoding(UTF-8)', "WE_EX.txt" or die "Can't open file WE_EX.txt for reading: $!";
#
open OUT, ">WE_EX.out" or die "Can't open file WE_EX.out for writing: $!";
#
#
my @string;
while (<>) {
    push @string, $_;
}
print OUT "@string";
close IN;
close OUT;

Replies are listed 'Best First'.
Re: How to read in and write out Unicode (UTF-8) file in Greek
by ikegami (Patriarch) on Jun 06, 2011 at 18:20 UTC

    You decoded the characters on input, but you didn't encode them on output.

    open IN, '<:encoding(UTF-8)', "WE_EX.txt"
    open OUT, '>:encoding(UTF-8)', "WE_EX.out"
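
To illustrate the decode-on-input, encode-on-output pairing, here is a minimal round-trip sketch. It is not the poster's program: the Greek sample string and the temporary file are assumptions made up for the example, since the contents of WE_EX.txt are not shown in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample line (the word "logos" in Greek letters),
# written with escapes so the script itself stays plain ASCII.
my $greek = "\x{03BB}\x{03CC}\x{03B3}\x{03BF}\x{03C2}\n";

# Scratch file, so the sketch never touches real data.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "Can't close $path: $!";

# Encode on output: characters become UTF-8 bytes on disk.
open(my $out, '>:encoding(UTF-8)', $path)
    or die "Can't open $path for writing: $!";
print {$out} $greek;
close $out or die "Can't close $path: $!";

# Decode on input: UTF-8 bytes become characters again.
open(my $in, '<:encoding(UTF-8)', $path)
    or die "Can't open $path for reading: $!";
my $round_trip = <$in>;
close $in;

# The string is counted in characters, the file in bytes.
my $char_count = length $round_trip;
my $byte_count = -s $path;
```

With both layers in place the text read back should equal the text written, and the file should hold more bytes than the string has characters, because each Greek letter takes more than one byte in UTF-8.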

    You don't have an infinite loop. The program is waiting for input from the keyboard because <> reads from ARGV (which falls back to STDIN when no file names are given on the command line) instead of from IN.

    while (<IN>) {
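
The behaviour of the bare <> operator can be seen in a small sketch. The scratch file and its two lines are hypothetical, and @ARGV is set by hand only so that the example does not sit waiting on the keyboard.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file with two lines of made-up content.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "first\nsecond\n";
close $fh or die "Can't close $path: $!";

# <> takes its file names from @ARGV. With an empty @ARGV it reads
# STDIN instead, which is why the original script appeared to hang.
my @lines;
{
    local @ARGV = ($path);      # pretend the file was named on the command line
    while (my $line = <>) {
        push @lines, $line;
    }
}
```

Nothing here reads from a handle opened with open; a loop that should read from such a handle has to name it, as in while (<IN>).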

    You are needlessly using global variables for your file handles.

    use strict;
    use warnings;

    open(my $IN, '<:encoding(UTF-8)', 'WE_EX.txt')
        or die "Can't open file WE_EX.txt for reading: $!";
    open(my $OUT, '>:encoding(UTF-8)', 'WE_EX.out')
        or die "Can't open file WE_EX.out for writing: $!";

    while (<$IN>) {
        # ... Manipulate $_ ...
        print $OUT $_;
    }

    close $IN;
    close $OUT;
      Some people might say it is OK to use global variables for filehandles. I think the my $IN notation is confusing for noobs, and many examples in tutorials use global variables for filehandles. One thing is for sure: they won't break your program (although they could lead to hard-to-find bugs in a complex program with several libraries or subroutines).
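
As a sketch of the kind of hard-to-find bug mentioned above, the following contrasts a bareword handle with a lexical one. The helper subroutines and the scratch files are hypothetical, invented only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two scratch files with made-up content.
my ($fh_a, $path_a) = tempfile(UNLINK => 1);
print {$fh_a} "a1\na2\n";
close $fh_a or die "Can't close $path_a: $!";
my ($fh_b, $path_b) = tempfile(UNLINK => 1);
print {$fh_b} "b1\n";
close $fh_b or die "Can't close $path_b: $!";

# Hypothetical helper that happens to reuse the global name IN.
sub first_line_global {
    my ($path) = @_;
    open(IN, '<', $path) or die "Can't open $path: $!";
    my $line = <IN>;
    close IN;
    return $line;
}

# The same helper written with a lexical handle.
sub first_line_lexical {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Can't open $path: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

# Global handle: the helper reopens and then closes the caller's IN.
open(IN, '<', $path_a) or die "Can't open $path_a: $!";
my $first = <IN>;
first_line_global($path_b);
my $second = <IN>;    # undef; Perl warns about readline() on a closed filehandle

# Lexical handle: the helper's $in is a separate variable.
open(my $in, '<', $path_a) or die "Can't open $path_a: $!";
my $lex_first = <$in>;
first_line_lexical($path_b);
my $lex_second = <$in>;    # still the second line of the first file
close $in;
```

In a short script like the one in the question the two styles behave the same; the difference only shows up once two pieces of code share a handle name.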