I usually like to say: the correct way of handling encodings in Perl is not to care about them. If you care too much, you're doing it the wrong way...
The only two things you need to do to work properly with any encoding in Perl are:
Matching accented characters in regexps has nothing to do with encoding at all, only with locale. So if your locale is set correctly, the match will work in any encoding.
This way, the code you sent would look like the following (I included some more CGI code to illustrate your case).
```perl
use strict;
use warnings;
use CGI;

# this tells Perl that my source file is UTF-8
use utf8;

# the Latin accented characters are valid
# for this locale, for instance.
BEGIN { $ENV{LC_CTYPE} = 'pt_BR' }

# tell Perl I want it to take that into account
use locale;

# The good thing about CGI is that it already
# honors the input encoding, so you don't need
# to care.
my $cgi = CGI->new();
my $string = q( éáaíóúÁAÉÍÓÚ );

# this match works because of the use locale,
# not because of encodings...
$string =~ s/Á/b/g;

# now two important things:
# the first is to tell Perl that your STDOUT
# is utf8 (this may not be the default depending
# on the operating system, the environment and a
# lot of other stuff). So it's better to do it
# explicitly.
binmode STDOUT, ':utf8';

# The second is to properly say that to the browser
# (this is actually HTTP specific, not exactly Perl
# related, but, as you said you're working with CGI,
# I decided to mention it here).
print $cgi->header(-type => 'text/plain', -charset => 'utf-8');

# then the string will be printed correctly
print $string;
```

Hope this helps...

Update: I missed "-type => " in the first version...
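To see what the explicit output layer in the code above buys you, here is a minimal sketch that leaves CGI and locale out entirely. It assumes the script file itself is saved as UTF-8, and the names `$string`, `$bytes` and `$out` are only for illustration. It writes through an in-memory filehandle instead of STDOUT so the encoded bytes can be inspected.

```perl
use strict;
use warnings;

# the source file is assumed to be saved as UTF-8
use utf8;

# three accented characters, held by Perl as characters, not bytes
my $string = 'éáÁ';

# an in-memory filehandle stands in for STDOUT here, so that
# the bytes produced by the encoding layer can be looked at
my $bytes = '';
open my $out, '>:encoding(UTF-8)', \$bytes
    or die "Cannot open in-memory handle: $!";
print {$out} $string;
close $out;

# each of these characters takes two bytes in UTF-8
printf "%d characters, %d bytes\n", length($string), length($bytes);
```

Run as is, this should report 3 characters and 6 bytes: the string stays a sequence of characters inside Perl, and it only becomes UTF-8 bytes when it crosses a handle that has been told its encoding.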
In reply to Re: utf8, locale and regexp by ruoso, in thread utf8, locale and regexp by Anonymous Monk