Reading an Unicode File

donno20 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I am reading a Unicode file and just printing out its contents, but the while loop stops somewhere in the middle of the file. If I convert the file to ASCII, the loop runs through the whole file. Here is my simple code:

open RS, "< $rs_file";
while (my $line = <RS>){
print "$. $line";
}
close RS;

Any input would be greatly appreciated.


Replies are listed 'Best First'.
Re: Reading an Unicode File by jmcnamara (Monsignor) on Apr 23, 2003 at 11:27 UTC
If you are using Windows, you should binmode the filehandle. This prevents ^Z (ASCII 26) from being interpreted as the end-of-file character:

open RS, $rs_file or die "Error message here: $!\n";
binmode RS;
while (<RS>) {
    print $., "\t", $_;
}

-- John.
Re: Reading an Unicode File by John M. Dlugosz (Monsignor) on Apr 23, 2003 at 16:04 UTC
What do you mean by "a(n) Unicode file"? UTF-8, UTF-16, UCS-2, or something else? Without being told otherwise, the `<RS>` construct will work for UTF-8 but not for the others, and with UTF-8 you don't need binmode either. If the file uses some other encoding, you need to set the input record separator to the proper byte sequence and also use binmode. Perl 5.8 has built-in support for reading files in other encodings: you can specify the encoding with the extended open syntax, and everything should work without further intervention. -- John
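For illustration, here is a minimal sketch of the Perl 5.8 extended open syntax mentioned above. The helper name `read_lines` and the choice of UTF-16LE in the usage note are assumptions made for the example, not something stated in the thread; the real encoding of the file has to be known or detected first. The `:raw` layer is included so that the bytes reach the decoder untouched, which is also what binmode achieves on Windows.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): read every line of
# $path, decoding it from $encoding via a PerlIO encoding layer.
sub read_lines {
    my ($path, $encoding) = @_;

    # ":raw" keeps the bytes untranslated (no text-mode ^Z or CRLF handling);
    # ":encoding(...)" then decodes them into Perl characters, so the
    # default record separator "\n" matches decoded newlines.
    open(my $fh, "<:raw:encoding($encoding)", $path)
        or die "Cannot open $path: $!\n";

    my @lines = <$fh>;
    close $fh;
    return @lines;
}
```

A call might look like `my @lines = read_lines($rs_file, "UTF-16LE");`, assuming the file really is UTF-16LE. Files with Windows line endings would still carry a trailing "\r" on each line with this sketch, which may need stripping separately.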
Re: Reading an Unicode File by donno20 (Sexton) on Apr 24, 2003 at 02:17 UTC
Thanks monks, I made it!! ^_^