poltr1 has asked for the wisdom of the Perl Monks concerning the following question:
(Actually, this isn't a question, but a solution in case others are looking for the same nugget of wisdom.)
I have incoming data in XML that's UTF-8 encoded. It includes special symbols such as nonbreaking spaces, the registered trademark symbol (R), the "TM" symbol, Greek letters, etc. This data is being put into an Oracle database via script that reads the XML, parses it using XML::Twig, and saves it to the database via DBI. In order to preserve these characters, here's what I had to do:
1) Add this line to Perl scripts:
use encoding "utf-8"; # For text containing I +SO-8859-1 and/or UTF-8 characters
2) Add the encoding qualifier to any 'open' statements on files that contain UTF-8:
open (OUT, ">:encoding(UTF-8)", ">$outfile") || die "Unable to open $o +utfile for writing.";
3) Add this line if interfacing with an Oracle database:
$ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8'; # Handle UTF-8 charac +ters
4) Modify the Oracle login to include the 'ora_charset => "UTF8"' attribute in the connect string:
my $dbh = DBI->connect("dbi:Oracle:$DATABASE", $USERNAME, $PASSWORD, { RaiseError => 1, AutoCommit => 0, ora_charset => "UTF8" }) or die ($DBI::errstr . "\n");
5) If processing XML files with encoding="UTF-8" via XML::Twig, add the output_encoding attribute to the call:
$twig = XML::Twig->new( twig_handlers => $handlers, output_encoding => + 'utf-8' );
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Solved: Preserving UTF-8 characters in Oracle and XML
by poltr1 (Novice) on Nov 09, 2011 at 15:37 UTC | |
|
Re: Solved: Preserving UTF-8 characters in Oracle and XML
by mje (Curate) on Nov 10, 2011 at 15:55 UTC |