evian has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm writing a program to parse LaTeX documents that contain Greek letters. I tried using a regex to convert "\alpha" to "\x{03B1}" like this:

...
use utf8;
my $text = "\\alpha";
$text =~ s/\\alpha/\x{03B1}/g;
print $text;
...

However, I keep getting output that suggests Perl isn't really using Unicode. Any idea why? Thanks!

Replies are listed 'Best First'.
Re: unicode in perl
by pid (Monk) on Aug 03, 2011 at 03:24 UTC

    use utf8; doesn't mean that your perl(1) is going to use UTF-8 for input/output.

    From utf8 (and please read it):

    Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.
    As Khen1950fx pointed out, you should use binmode for outputting these 'wide chars'.

    I suggest you read one of brian's informative/helpful posts and be sure to check out tchrist's OSCON Perl Unicode Slides.

    Consult the perldoc when unsure.

    HTH.
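
The distinction drawn in this reply can be sketched as follows. This is a minimal illustration, not part of the original post: it assumes the core Encode module is available, and the variable names ($chars, $bytes, $octets) are made up for the example. It shows that the substitution already produces one Unicode character inside Perl, and that what is missing in the original script is only a declared output encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{03B1}" is a single character (GREEK SMALL LETTER ALPHA).
# The escape works with or without "use utf8", because the pragma
# only describes the encoding of the source file itself.
my $text = "\\alpha";
$text =~ s/\\alpha/\x{03B1}/g;

# Inside Perl the string is one character long ...
my $chars = length $text;

# ... but it becomes two bytes (0xCE 0xB1) once encoded as UTF-8.
my $bytes  = encode('UTF-8', $text);
my $octets = length $bytes;

# Declaring the output encoding lets print do that conversion itself
# and avoids the "Wide character in print" warning.
binmode STDOUT, ':encoding(UTF-8)';
print "$text: $chars character, $octets bytes in UTF-8\n";
```

Without the binmode line, perl would warn about a wide character, which is probably the symptom the original question describes.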

Re: unicode in perl
by Khen1950fx (Canon) on Aug 03, 2011 at 00:58 UTC
    Works for me:
    #!/usr/bin/perl
    use strict;
    use warnings;
    my $text = "\\alpha";
    $text =~ s/\\alpha/\x{03B1}/g;
    binmode STDOUT, ':encoding(utf8)';
    print "$text\n";
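
Since the original question is about Greek letters in general, the same substitution could be driven by a lookup table rather than one regex per letter. The following is only a sketch under stated assumptions: the %greek table is a hypothetical, partial mapping, and latex_to_unicode is a made-up helper name, not something from the thread or from any module.

```perl
use strict;
use warnings;

# Hypothetical, partial table: LaTeX command name => Unicode code point.
my %greek = (
    alpha => 0x03B1,
    beta  => 0x03B2,
    gamma => 0x03B3,
    delta => 0x03B4,
);

# Replace known commands such as \alpha with the matching character.
# Commands that are not in the table are put back unchanged.
sub latex_to_unicode {
    my ($text) = @_;
    $text =~ s{\\([A-Za-z]+)}{ exists $greek{$1} ? chr($greek{$1}) : "\\$1" }ge;
    return $text;
}

# As in the reply above, declare the output encoding before printing.
binmode STDOUT, ':encoding(UTF-8)';
print latex_to_unicode('\\alpha + \\beta = \\gamma'), "\n";
```

Because the pattern captures the whole command name before looking it up, a longer command that merely starts with "alpha" would be left alone rather than partially rewritten.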