Maybe I'm just tired, but I really can't figure this one out. I have the following script, which retrieves data from MySQL:
#!/usr/bin/perl -w
use strict;
use warnings;
use DBI;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);
use utf8;
use Encode qw(decode decode);
binmode(STDOUT, ":encoding(utf8)");

my ($datasource, $user, $passw, $dbh, $sth);
my ($id_testo, $indice, $parole, $posizione);
my (@row, $field);

$datasource = "DBI:mysql:database=Test;host=xxxxxxx;";
$user = "xxxxxxxx";
$passw = "";
$dbh = DBI->connect($datasource, $user, $passw) || die "Error opening db: $DBI::errstr\n";
$dbh->do("SET NAMES 'utf8'");
$sth = $dbh->prepare("SELECT indice, GROUP_CONCAT(parole SEPARATOR ' ') FROM testo GROUP BY indice");
$sth->execute();

print header(-type => "text/html", -charset => "utf-8"),
      start_html(-encoding => 'utf-8', "My_database"), "\n",
      h2("ARET 1.1"), "\n";

while (@row = $sth->fetchrow_array) {
    for $field(@row) {
        $field =~ s/([à]+)/<i>$1<\/i>/g;          # lower --> italic
        $field =~ s/(\p{Lu}+)/lc($1)/ge;          # upper --> lower
        $field =~ s/-=(.{1,4})/<sup>$1<\/sup>/g;  # OK
    }
    print p(), decode("utf8", "$row[0]\t$row[1]\n");
}
$sth->finish();
$dbh->disconnect() || die "fallita disconnessione\n";
It gives me back the following output:
.. various HTML tags and meta elements ..

r.1,1 1 ʾà-da-um-=TUG2-II 1 AKTUM-=TÚG 1 IB2-IV-=TÚG SA₆ DAR

r.1,2 g_*NI-ra-ar-=KI

r.1,3 2 ʾa3-da-um-=TÚG-II 1 ʾa3-da-um-=TÚG-I

Well, I really don't understand why the substitution regexes don't work with Unicode characters such as the accented vowel à (or Ú, or even the sign ʾ). I tried replacing the literal character with an escape such as \x{2be}, but nothing changes. My data comes from a MySQL table that uses the utf8 charset. Is it possible that my data has not yet been decoded when it reaches the for loop and the substitution regexes? Thank you.
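To make the guess above concrete, here is a minimal sketch that needs no database. It assumes (this is not confirmed by the script's output) that the driver hands back raw UTF-8 bytes rather than decoded characters. The `markup` helper and the sample value are hypothetical stand-ins: `markup` wraps the three substitutions from the script, and the sample imitates one `parole` value. Under `use utf8` the `à` in the pattern is a single character, so it can only match once the data has been decoded into characters too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # literals in this file are characters
use Encode qw(decode encode);

# Hypothetical stand-in for a column value that arrives as undecoded
# UTF-8 bytes (an assumption about what the driver returns).
my $bytes = encode('UTF-8', "ʾà-da-um-=TÚG");

# The three substitutions from the script, applied to one string.
sub markup {
    my ($field) = @_;
    $field =~ s/(à+)/<i>$1<\/i>/g;             # à --> italic
    $field =~ s/(\p{Lu}+)/lc($1)/ge;           # upper --> lower
    $field =~ s/-=(.{1,4})/<sup>$1<\/sup>/g;   # -=xxxx --> superscript
    return $field;
}

# Case 1: run the regexes on the raw bytes. The character à (U+00E0)
# does not occur in the byte string, so the first substitution cannot match.
my $from_bytes = markup($bytes);

# Case 2: decode first, then run the same regexes on characters.
my $chars      = decode('UTF-8', $bytes);
my $from_chars = markup($chars);

binmode(STDOUT, ':encoding(UTF-8)');
print "italic applied on bytes:   ", ($from_bytes =~ /<i>/ ? "yes" : "no"), "\n";
print "italic applied on decoded: ", ($from_chars =~ /<i>/ ? "yes" : "no"), "\n";
print "$from_chars\n";
```

If this assumption holds for the real data, the equivalent change in the script would be to decode each field before the substitutions and drop the `decode` call in the final `print`, since `binmode` already encodes the output. DBD::mysql also has a `mysql_enable_utf8` connection attribute that may make the explicit decode unnecessary; whether it applies here depends on the driver version in use.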

In reply to substitution regex and unicode by frasco
