in reply to Setting UTF-8 mode on filehandle reads?
If you were using Perl 5.8, I'd suggest pushing an encoding layer when you opened the file (or after with binmode). As you're not, I won't.
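For readers who are on Perl 5.8 or later, the layer approach mentioned above might look something like the sketch below. The helper names `read_utf8_lines` and `read_utf8_lines_binmode` are made up for illustration, and the path is whatever file you pass in. This is a minimal sketch, not a drop-in replacement for the script that follows.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file with a decoding layer so that
# every line read is already a character string, not raw bytes.
sub read_utf8_lines {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close($fh);
    chomp @lines;
    return @lines;
}

# Hypothetical helper: same result, but the layer is pushed
# after the open with binmode.
sub read_utf8_lines_binmode {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die "Cannot open $path: $!";
    binmode($fh, ':encoding(UTF-8)');
    my @lines = <$fh>;
    close($fh);
    chomp @lines;
    return @lines;
}
```

With either helper, length() on a returned line counts characters, so no per-line fix-up is needed.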
Here's a quick script that reads a file line-by-line and uses pack to set the UTF-8 flag on each string read in. After that flag is set, character semantics work as expected for wide characters that were read in from the file:
    use utf8;
    use CGI::Carp qw(fatalsToBrowser);

    print "Content-type: text/html; charset=utf-8\n\n";

    open(FILE, "<", "/path/to/utf8/file.txt") || die "$!";

    print "<pre>\n";
    while(<FILE>) {
      chomp;
      $_ = set_utf($_);
      my $len = length($_);   # count of chars not bytes
      print "$_", ' ' x (72 - $len), "|\n";
    }
    print "</pre>\n";

    sub set_utf {
      return pack "U0a*", join '', @_;
    }
I fashioned the script as a CGI script so that you can view the output in your browser, which understands UTF-8 characters (whereas your TTY might not). Given a UTF-8 text file with lines shorter than 72 characters, this should pad each line out to 72 characters with spaces and then append a '|'. If character semantics are not in force, length will count bytes rather than characters, so lines containing multi-byte characters get too little padding and the '|'s won't line up.
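The byte-versus-character difference can be seen on a single string. The sketch below is an illustration only: it uses Encode::decode (available from Perl 5.8) rather than the pack call in the script, which is a substitution on my part, and `pad_line` is a hypothetical helper that mirrors the padding done in the loop.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: pad $text with spaces out to $width, then append '|'.
# The padding is based on length(), so it is only right if length() counts
# characters rather than bytes.
sub pad_line {
    my ($text, $width) = @_;
    my $gap = $width - length($text);
    $gap = 0 if $gap < 0;    # lines longer than $width get no padding
    return $text . (' ' x $gap) . '|';
}

my $bytes = "na\xc3\xafve";            # "naive" with i-diaeresis, as raw UTF-8 bytes
my $chars = decode('UTF-8', $bytes);   # the same text as a character string

# Only the counts are printed, so no output layer is needed here.
printf "as bytes: %d, as characters: %d\n", length($bytes), length($chars);
```

Padding the byte string to 72 leaves it one column short on screen, because the two-byte character was counted twice. That is the misalignment described above.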
Re:^2 Setting UTF-8 mode on filehandle reads?
by ph0enix (Friar) on Dec 06, 2002 at 12:32 UTC
by grantm (Parson) on Dec 06, 2002 at 18:14 UTC