If you were using Perl 5.8, I'd suggest pushing an encoding layer when you opened the file (or after with binmode). As you're not, I won't.
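(For reference only, here is a sketch of the 5.8-style approach I'm not taking; the filename is just the same placeholder used in the script below.)

# Not used below -- just the 5.8 layer approach mentioned above.
open(my $fh, '<:utf8', '/path/to/utf8/file.txt') or die "$!";
# ...or push the layer onto a handle that is already open:
binmode($fh, ':utf8');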
Here's a quick script that reads a file line by line and uses pack to set the UTF-8 flag on each line as it comes in. Once that flag is set, character semantics work as expected for the wide characters read from the file:
use utf8;
use CGI::Carp qw(fatalsToBrowser);

print "Content-type: text/html; charset=utf-8\n\n";

open(FILE, "<", "/path/to/utf8/file.txt") || die "$!";

print "<pre>\n";
while (<FILE>) {
    chomp;
    $_ = set_utf($_);
    my $len = length($_);                  # count of chars not bytes
    print "$_", ' ' x (72 - $len), "|\n";  # pad to column 72, then a marker
}
print "</pre>\n";

sub set_utf {
    # "U0" turns the UTF-8 flag on for the packed string; "a*" copies the
    # bytes through unchanged, so valid UTF-8 data becomes a character string.
    return pack "U0a*", join '', @_;
}
I fashioned the script as a CGI script so that you can view the output in your browser, which understands UTF-8 characters (whereas your TTY might not). Given a UTF-8 text file with lines of fewer than 72 characters, this should pad each line out to 72 characters with spaces and then append a '|'. If character semantics are not in force, length will count bytes rather than characters and the '|'s won't line up.
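If you want to see what the pack trick does to length() without the CGI wrapping, here is a minimal sketch; the sample string is my own, and the lengths shown assume the usual "U0" pack behaviour on the Perl versions this post is aimed at:

my $bytes = "caf\xc3\xa9";         # the UTF-8 octets for "café"
my $chars = pack "U0a*", $bytes;   # same octets, UTF-8 flag now on
print length($bytes), "\n";        # 5 -- byte semantics
print length($chars), "\n";        # 4 -- character semantics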
In reply to Re: Setting UTF-8 mode on filehandle reads? by grantm, in thread Setting UTF-8 mode on filehandle reads? by jkahn