When DBI pulls text data out of your database, the data will be treated as "bytes", not as characters -- because Perl has no way of knowing what sort of character encoding has been stored in the database.

So the data coming out of the database is a set of "octets", and needs to be "decoded" into a Perl character string (utf8 internally) within your perl script, using decode() from the Encode module. If DBI has given you a hash keyed by column name, then:

```perl
use Encode qw(decode);

# I'm sure you have a very different (more sensible) way of
# mapping table values to their proper legacy encodings, but
# this is just to show how to handle the data:

my %column_enc_map = ( columnA => 'cp1250',
                       columnB => 'cp1251',  # or whatever...
                     );

for my $field ( keys %column_enc_map ) {
    # replace the hash values from the database with utf8 strings:
    $database{$field} = decode( $column_enc_map{$field}, $database{$field} );
}
# %database values are now in utf8; you can load them back to the database via updates
```

Looking at your later reply in this thread, I'm pretty sure you don't need the extra "encode()" step on top of the "decode". All that does is turn off the utf8 flag on the string, which is kind of pointless, I think.
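
Here's a small standalone sketch of what decode() and encode() each do to a value, in case it helps to see it without a database in the way. The sample bytes (the word "Łódź" as it would be stored in cp1250) and the variable names are made up for illustration; nothing here comes from your actual tables.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample value: "Łódź" stored as cp1250 octets
# (0xA3 = Ł, 0xF3 = ó, 0x64 = d, 0x9F = ź).
my $octets      = "\xA3\xF3d\x9F";
my $octet_count = length($octets);

# decode() turns the legacy octets into a Perl character string.
my $chars = decode( 'cp1250', $octets );

# encode() goes the other way: characters back to octets (here UTF-8),
# and the result no longer carries the utf8 flag.
my $utf8_bytes = encode( 'UTF-8', $chars );

printf "cp1250 octets: %d\n", $octet_count;
printf "characters:    %d (utf8 flag %s)\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';
printf "UTF-8 octets:  %d (utf8 flag %s)\n",
    length($utf8_bytes), utf8::is_utf8($utf8_bytes) ? 'on' : 'off';
```

So after decode() you already have the characters you want; the extra encode() only converts them back to raw octets.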

In reply to Re^2: Convert database into UTF-8 by graff
in thread Convert database into UTF-8 by salonmonk
