restore unicode data from database?

by ph0enix (Friar)
on Nov 29, 2002 at 14:29 UTC ( [id://216530]=perlquestion )

ph0enix has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I'm using Tie::RDBM with PostgreSQL to store some data. My problem is that the values obtained back from the database do not have the Unicode (UTF-8) flag set and are treated as octets instead of character strings.

Yes, of course I can use $data = decode('utf8', $hash{'key'}) for each item (both keys and values) obtained from the database, but is there a better way to do this? Is there something like the DBM filters (filter_fetch_key, filter_fetch_value)?

My test code is here:

#!/usr/bin/perl_parallel -w
# For Emacs: -*- mode:cperl; mode:folding; -*-

use strict;
use utf8;
use Tie::RDBM;
use Encode;

my %data;
my $db_name = 'unicode';
my $db_host = 'localhost';
my $db_user = 'elric';
my $db_pass = 'test01';

tie(%data, 'Tie::RDBM', {
    db         => "dbi:Pg:dbname=$db_name;host=$db_host;",
    user       => $db_user,
    password   => $db_pass,
    table      => 'Demo',
    create     => 1,
    drop       => 1,
    autocommit => 1,
    DEBUG      => 0
}) or die $!;

my $counter = 0;
open DATA, '<:utf8', 'input.utf8' or die $!;
while (<DATA>) {
    chomp;
    my ($key, $value) = split(':', $_, 2);
    $data{$key} = $value;

    # BAD - print octets
    print $data{$key}, "\n";

    # OK - print string
    print decode('utf8', $data{$key}), "\n";

    $counter++;
}
close DATA;

print 'number of keys written: ', $counter, "\n";
print 'number of keys in database: ', scalar keys %data, "\n";

# BAD - print octets
print join(', ', keys %data), "\n";

# OK - print strings
print join(', ', map { decode('utf8', $_) } keys %data), "\n";

untie %data;

Thanks for your help

Replies are listed 'Best First'.
Re: restore unicode data from database?
by graff (Chancellor) on Dec 01, 2002 at 23:12 UTC
    The man page for the "Encode" module in Perl 5.8.0 points out the following, under the heading "The UTF-8 flag", sub-heading "Messing with Perl's internals":
        The following API uses parts of Perl's internals in the current implementation. As such, they are efficient but may change. ...

        _utf8_on(STRING)
            [INTERNAL] Turns on the UTF-8 flag in STRING. The data in STRING is not checked for being well-formed UTF-8. Do not use unless you know that the STRING is well-formed UTF-8. Returns the previous state of the UTF-8 flag (so please don't treat the return value as indicating success or failure), or "undef" if STRING is not a string.

    I'm not saying that this is a good alternative to the work-around that you are already using. I have a hunch that anything else that would actually treat the RDBM as a UTF-8 source would require mucking with the DBD or Tie::RDBM module internals, and would be problematic, since a database (and any interface to it) needs to be flexible about handling many types of non-ASCII data, not just Unicode characters.
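
To make the difference between the two routes concrete, here is a minimal sketch that compares decode() with _utf8_on() on the same octets. The sample bytes are an assumption standing in for a value fetched from the database; nothing here touches Tie::RDBM.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8 _utf8_on);

# Assumed sample: the two octets that encode U+00E9 in UTF-8,
# standing in for a value fetched from the database.
my $octets = "\xC3\xA9";

# Checked route: decode() interprets the octets and returns a character string.
my $decoded = decode('utf8', $octets);

# Unchecked route: copy the octets and switch the flag on in place.
# Nothing is validated, so this is only safe for well-formed UTF-8.
my $flagged = $octets;
_utf8_on($flagged);

for my $pair ([octets => $octets], [decoded => $decoded], [flagged => $flagged]) {
    my ($name, $str) = @$pair;
    printf "%-8s length=%d flag=%s\n",
        $name, length($str), is_utf8($str) ? 'on' : 'off';
}
```

For well-formed input both routes end up with the same one-character string; the difference is that only decode() looks at the data before trusting it.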

    Personally, I'd be content with the method you are already using.
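
If repeating decode() at every access gets tedious, the same work-around can be kept in one place with a thin tied-hash wrapper around the existing hash, without touching any module internals. This is only a sketch under assumptions: DecodingHash is a hypothetical name (not part of Tie::RDBM or Encode), it assumes every key and value is UTF-8 text, and it is exercised here against a plain hash of octets rather than a real database.

```perl
package DecodingHash;    # hypothetical wrapper class, not an existing module
use strict;
use warnings;
use Encode qw(decode encode);

# Wrap a reference to any hash (tied or not) that stores octets.
sub TIEHASH { my ($class, $inner) = @_; return bless { inner => $inner }, $class }

# Encode characters to octets on the way in.
sub STORE {
    my ($self, $key, $value) = @_;
    $self->{inner}{ encode('utf8', $key) } = encode('utf8', $value);
}

# Decode octets to characters on the way out.
sub FETCH {
    my ($self, $key) = @_;
    my $value = $self->{inner}{ encode('utf8', $key) };
    return defined $value ? decode('utf8', $value) : undef;
}

sub EXISTS { exists $_[0]{inner}{ encode('utf8', $_[1]) } }
sub DELETE { delete $_[0]{inner}{ encode('utf8', $_[1]) } }
sub CLEAR  { %{ $_[0]{inner} } = () }

sub FIRSTKEY {
    my ($self) = @_;
    keys %{ $self->{inner} };    # void context: resets the inner iterator
    return $self->NEXTKEY;
}

sub NEXTKEY {
    my ($self) = @_;
    my $key = each %{ $self->{inner} };
    return defined $key ? decode('utf8', $key) : undef;
}

package main;

my %raw;                                 # stand-in for the hash tied to the database
tie my %text, 'DecodingHash', \%raw;

my $key   = "\x{10D}aj";                 # 3 characters, 4 octets in UTF-8
my $value = "\x{17E}lut\x{FD}";          # 5 characters
$text{$key} = $value;

printf "stored key is %d octets, fetched value is %d characters\n",
    length((keys %raw)[0]), length($text{$key});
```

In the original script the reference passed to tie would be \%data, so the calling code could read and write character strings while the database side keeps seeing octets. Data that is not UTF-8 would still have to bypass the wrapper.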
