frasco has asked for the wisdom of the Perl Monks concerning the following question:
    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use DBI;
    use CGI qw(:standard);
    use CGI::Carp qw(fatalsToBrowser);
    use utf8;
    use Encode qw(decode decode);
    binmode(STDOUT, ":encoding(utf8)");

    my ($datasource, $user, $passw, $dbh, $sth);
    my ($id_testo, $indice, $parole, $posizione);
    my (@row, $field);

    $datasource = "DBI:mysql:database=Test;host=xxxxxxx;";
    $user = "xxxxxxxx";
    $passw = "";
    $dbh = DBI->connect($datasource, $user, $passw)
        || die "Error opening db: $DBI::errstr\n";
    $dbh->do("SET NAMES 'utf8'");
    $sth = $dbh->prepare("SELECT indice, GROUP_CONCAT(parole SEPARATOR ' ') FROM testo GROUP BY indice");
    $sth->execute();

    print header(-type => "text/html", -charset => "utf-8"),
          start_html(-encoding => 'utf-8', "My_database"), "\n",
          h2("ARET 1.1"), "\n";

    while (@row = $sth->fetchrow_array) {
        for $field(@row) {
            $field =~ s/([à]+)/<i>$1<\/i>/g;           # lower --> italic
            $field =~ s/(\p{Lu}+)/lc($1)/ge;           # upper --> lower
            $field =~ s/-=(.{1,4})/<sup>$1<\/sup>/g;   # OK
        }
        print p(), decode("utf8", "$row[0]\t$row[1]\n");
    }
    $sth->finish();
    $dbh->disconnect() || die "fallita disconnessione\n";

It gives me back the following output:

    .. various html tag and meta ..
    r.1,1 1 ʾà-da-um-=TUG2-II 1 AKTUM-=TÚG 1 IB2-IV-=TÚG SA₆ DAR
    r.1,2 g_*NI-ra-ar-=KI
    r.1,3 2 ʾa3-da-um-=TÚG-II 1 ʾa3-da-um-=TÚG-I

Well, I really don't understand why the substitution regex doesn't work with Unicode characters such as the accented vowel à (or Ú, or even the sign ʾ). I tried replacing the à with its corresponding x{2be}, but nothing happens. My data come from a MySQL table set to the utf8 charset. Is it possible that my data have not yet been decoded when they reach the for loop and the substitution regexes? Thank you.
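
The hypothesis raised in the question (that the rows are still undecoded UTF-8 octets when the substitutions run) can be checked without a database. The following is only a minimal sketch under that assumption: `italicize` is a hypothetical helper, the hard-coded octets stand in for one fetched field, and `\x{E0}` is used in place of a literal à so the result does not depend on how the script file is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: wrap runs of U+00E0 (a with grave) in <i>...</i>.
sub italicize {
    my ($text) = @_;
    (my $copy = $text) =~ s/(\x{E0}+)/<i>$1<\/i>/g;
    return $copy;
}

# Assumed stand-in for a field as fetched: raw UTF-8 octets for
# U+02BE U+00E0 followed by "-da-um" (CA BE = U+02BE, C3 A0 = U+00E0).
my $bytes = "\xCA\xBE\xC3\xA0-da-um";

# Decode once, before any regex sees the data.
my $chars = decode('UTF-8', $bytes);

my $from_bytes = italicize($bytes);   # no octet equals E0, so nothing matches
my $from_chars = italicize($chars);   # U+00E0 is now one character and matches

binmode(STDOUT, ':encoding(UTF-8)');
print "undecoded: ", ($from_bytes eq $bytes ? "no match" : "matched"), "\n";
print "decoded:   $from_chars\n";
```

If this is what is happening in the script, decoding each field before the `for` loop would let the character classes match. The `decode("utf8", ...)` call at print time would then have to go, since the text would already be decoded.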
Re: substitution regex and unicode
by Joost (Canon) on May 02, 2008 at 21:30 UTC
by frasco (Beadle) on May 03, 2008 at 10:09 UTC
by ikegami (Patriarch) on May 03, 2008 at 10:45 UTC
by frasco (Beadle) on May 07, 2008 at 18:40 UTC
by ikegami (Patriarch) on May 07, 2008 at 22:41 UTC