unicode in perl

evian has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm writing a program to parse LaTeX documents that contain Greek letters. I tried using a regex to convert "\alpha" to "\x{03B1}" like this:

...
use utf8;
my $text = "\\alpha";
$text =~ s/\\alpha/\x{03B1}/g;
print $text;
...

However, I keep getting output that suggests Perl isn't really using Unicode. Any idea why? Thanks!


Re: unicode in perl by pid (Monk) on Aug 03, 2011 at 03:24 UTC
`use utf8;` doesn't mean that your perl(1) is going to use UTF-8 for input/output. From utf8 (and please read it): "Do not use this pragma for anything else than telling Perl that your script is written in UTF-8." As Khen1950fx pointed out, you should use `binmode` to set an encoding layer on the output handle before printing these 'wide chars'. I suggest you read one of brian's informative posts and be sure to check out tchrist's OSCON Perl Unicode Slides. Consult the perldoc when not sure. HTH.
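To make the distinction above concrete, here is a minimal sketch that separates the two steps: the substitution produces a one-character string no matter how output is configured, and encoding to UTF-8 happens only when the string is written out. The helper name `replace_alpha` is made up for illustration and does not come from the thread; `Encode` is a core module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                 # only declares that this source file is UTF-8
use Encode qw(encode);

# Hypothetical helper (not from the thread): replace \alpha with U+03B1.
sub replace_alpha {
    my ($text) = @_;
    $text =~ s/\\alpha/\x{03B1}/g;
    return $text;
}

my $text = replace_alpha("\\alpha");

# The string already holds a single character, U+03B1.
printf "length in characters: %d\n", length $text;

# Encoding is a separate step; U+03B1 takes two bytes in UTF-8.
my $bytes = encode('UTF-8', $text);
printf "length in UTF-8 bytes: %d\n", length $bytes;

# Attach an encoding layer to STDOUT so characters are encoded on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

Without the `binmode` line, printing a character above 0xFF typically triggers a "Wide character in print" warning, which is often the first sign that the output handle has no encoding layer.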
Re: unicode in perl by Khen1950fx (Canon) on Aug 03, 2011 at 00:58 UTC
Works for me:

#!/usr/bin/perl
use strict;
use warnings;
my $text = "\\alpha";
$text =~ s/\\alpha/\x{03B1}/g;
binmode STDOUT, ':encoding(utf8)';
print "$text\n";
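Since the original goal was to handle Greek letters in general, the same approach might be extended with a lookup table. This is only a sketch: the `%GREEK` hash and the `greek_to_unicode` sub are hypothetical names, and the table is deliberately partial.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed, partial mapping for illustration; extend as needed.
my %GREEK = (
    alpha => "\x{03B1}",
    beta  => "\x{03B2}",
    gamma => "\x{03B3}",
    delta => "\x{03B4}",
);

# Hypothetical helper: replace known \name commands, leave others untouched.
sub greek_to_unicode {
    my ($text) = @_;
    # Capture the whole command name so that, e.g., \alphabet is not
    # mistaken for \alpha followed by "bet".
    $text =~ s/\\([A-Za-z]+)/exists $GREEK{$1} ? $GREEK{$1} : "\\$1"/ge;
    return $text;
}

# Encode characters as UTF-8 when writing to STDOUT.
binmode STDOUT, ':encoding(UTF-8)';
print greek_to_unicode('\\alpha + \\beta = \\frac{1}{2}'), "\n";
```

Matching the full command name and then consulting the table keeps unknown commands such as `\frac` intact, which a plain `s/\\alpha/.../` would not guard against for names that merely begin with `alpha`.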