I usually like to say: the correct way of handling encodings in Perl is not caring about them. If you're caring too much, you're doing it the wrong way...

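The only two things you need to do to work properly with whatever-encoding in Perl are:

1. tell Perl which encoding your source file is in (use utf8 for a UTF-8 source file);
2. tell Perl which encoding your input and output use (for instance binmode STDOUT, ':utf8').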

Matching accented characters in regexps has nothing to do with encoding at all, only with locale. So if your locale is set correctly, the match will work in whatever-encoding.
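A minimal sketch of the "whatever-encoding" point, assuming the core Encode module (shipped with Perl since 5.8): the same text arrives once as UTF-8 bytes and once as ISO-8859-1 bytes, both simulated here with encode(). Once each input is decoded with its own encoding, Perl holds identical character strings, so the same substitution gives the same result. The sample text and variable names are illustrative only.

```perl
use strict;
use warnings;
use utf8;                        # this source file is saved as UTF-8
use Encode qw(encode decode);

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative text; we simulate two differently encoded inputs
# by encoding the same character string in two ways.
my $text        = 'ÁaÉ Áb';
my $utf8_bytes  = encode('UTF-8',      $text);
my $latin_bytes = encode('ISO-8859-1', $text);

# The raw byte strings differ: Á and É take two bytes each in UTF-8.
printf "UTF-8 bytes: %d, Latin-1 bytes: %d\n",
    length $utf8_bytes, length $latin_bytes;

# Decode each input with the encoding it was actually written in...
my $from_utf8  = decode('UTF-8',      $utf8_bytes);
my $from_latin = decode('ISO-8859-1', $latin_bytes);

# ...and the same substitution behaves identically on both.
(my $r1 = $from_utf8)  =~ s/Á/b/g;
(my $r2 = $from_latin) =~ s/Á/b/g;

print "$r1\n$r2\n";
print $r1 eq $r2 ? "same result\n" : "different result\n";
```

Only the decode() step needs to know the encoding; everything after it works on characters.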

With that, the code you sent would look like the following (I added some more CGI code to illustrate your case):

    use strict;
    use warnings;
    # The good thing about CGI is that, with the -utf8
    # pragma, it decodes the input parameters for you,
    # so you don't need to care about the input encoding.
    use CGI qw(-utf8);
    # this tells Perl that my source file is UTF-8
    use utf8;
    # the Latin accented characters are valid for this
    # locale, for instance (it must exist on your system).
    # Changing $ENV{LC_CTYPE} inside the script does not
    # change the locale perl is already using, so call
    # setlocale() instead.
    use POSIX qw(setlocale LC_CTYPE);
    setlocale(LC_CTYPE, 'pt_BR')
        or warn "locale pt_BR is not available\n";
    # tell Perl I want it to honor that locale
    use locale;
    my $cgi = CGI->new();
    my $string = q( éáaíóúÁAÉÍÓÚ );
    # this match works because of "use locale",
    # not because of encodings...
    $string =~ s/Á/b/g;
    # now two important things:
    # the first is to tell Perl that your STDOUT
    # is UTF-8 (this may not be the default depending
    # on the operating system, the environment and a
    # lot of other stuff), so it's better to do it
    # explicitly.
    binmode STDOUT, ':utf8';
    # The second is to tell the browser about it
    # (this is actually HTTP specific, not exactly Perl
    # related, but as you said you're working with CGI,
    # I decided to mention it here).
    print $cgi->header(-type => 'text/plain', -charset => 'utf-8');
    # then the string will be printed correctly
    print $string;
Hope this helps...

Update: I missed "-type => " in the first version...
daniel

In reply to Re: utf8, locale and regexp by ruoso
in thread utf8, locale and regexp by Anonymous Monk
