Hi all

I'm using Tie::RDBM with PostgreSQL to store some data. My problem is that the values obtained back from the database do not have the Unicode (UTF-8) flag set and are treated as octets instead of character strings.

Yes, of course I can use $data = decode('utf8', $hash{'key'}) for each item (both keys and values) obtained from the database, but... is there a better way to do this? Is there something like the DBM filters (filter_fetch_key, filter_fetch_value)?
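To show the difference without involving the database, here is a minimal sketch. The variable names are made up for illustration, and the byte string is only a stand-in for what comes back from the tied hash (an assumption, since the real values come through DBI):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for a fetched value (assumption: the database layer hands back
# raw UTF-8 octets). Two characters, U+0159 and U+00ED, encoded as bytes.
my $octets = encode('utf8', "\x{159}\x{ed}");

# The manual fix I would like to avoid repeating for every key and value.
my $string = decode('utf8', $octets);

# length() counts bytes for the octet string and characters for the decoded one.
printf "octets: length=%d, utf8 flag=%s\n",
    length($octets), utf8::is_utf8($octets) ? 'on' : 'off';
printf "string: length=%d, utf8 flag=%s\n",
    length($string), utf8::is_utf8($string) ? 'on' : 'off';
```

The octet version is what prints wrongly on a filehandle with a :utf8 layer, because each byte gets encoded again on output.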

My test code is here:

#!/usr/bin/perl_parallel -w
# For Emacs: -*- mode:cperl; mode:folding; -*-

use strict;
use utf8;
use Tie::RDBM;
use Encode;

my %data;
my $db_name = 'unicode';
my $db_host = 'localhost';
my $db_user = 'elric';
my $db_pass = 'test01';

tie(%data, 'Tie::RDBM', {
    db         => "dbi:Pg:dbname=$db_name;host=$db_host;",
    user       => $db_user,
    password   => $db_pass,
    table      => 'Demo',
    create     => 1,
    drop       => 1,
    autocommit => 1,
    DEBUG      => 0
}) or die $!;

my $counter = 0;
open DATA, '<:utf8', 'input.utf8' or die $!;
while (<DATA>) {
    chomp;
    my ($key, $value) = split(':', $_, 2);
    $data{$key} = $value;
    # BAD - print octets
    print $data{$key}, "\n";
    # OK - print string
    print decode('utf8', $data{$key}), "\n";
    $counter++;
}
close DATA;

print 'number of keys written: ', $counter, "\n";
print 'number of keys in database: ', scalar keys %data, "\n";

# BAD - print octets
print join(', ', keys %data), "\n";
# OK - print strings
print join(', ', map { decode('utf8', $_) } keys %data), "\n";

untie %data;

Thanks for your help


In reply to restore unicode data from database? by ph0enix
