in reply to Perl mySQL and £ signs

I think the perl script has the character as a utf8 byte sequence, and that is what is going into the database. No problem with that.

When you say "it comes out in mysql as ...", I expect you mean something like "when I use an xterm or an MS-DOS prompt window or some other thing where I can run mysql and issue queries interactively, the display I get is ...".

This would be expected assuming that your xterm or MS-DOS prompt window or whatever is displaying the data as if it were ISO-8859-1 or CP-1251 or some similar single-byte legacy character set. If you had a utf8-capable display window where you could see the output from mysql (e.g. a browser), it would show the utf8-encoded character correctly.

(update: just to be clear -- it's a display issue, not a data or perl issue per se)

(another update -- sorry I didn't happen to notice this sooner: if you set your browser to UTF8 and look at your own post, you'll see the single-character currency mark that I presume you intended to have in the database.)

Replies are listed 'Best First'.
Re^2: Perl mySQL and £ signs
by saintmike (Vicar) on Sep 19, 2005 at 04:25 UTC
    I think what's happening is that the original string is a Unicode string, then it gets stored in MySQL and retrieved again.

    What comes back is a utf-8 encoded string, but it's not marked as a Unicode string in perl.

    What you need to do is help perl out by marking the string returned by DBI/mysql as a Unicode string:

    use Encode qw(_utf8_on); _utf8_on($field_value_returned_by_dbi);
      Good point -- thanks. Though I might prefer the "publicly supported" method for doing that "conversion" (turning on the utf8 flag) -- one that doesn't involve a function whose initial underscore in the name asserts that it is intended to be "private" within Encode.pm (hence subject to change in a later release):
      use Encode; ... my $utf8_string = decode( 'utf8', $field_value_from_db );
      Another point I should have considered earlier is that the OP might not want utf8 to be stored in the database (because of its "extra bulk"). If so, it would be better for the perl code that loads data into mysql to apply the appropriate encoding first (assuming the OP knows which non-unicode character set is suitable) -- e.g.:
      my $field_value_for_db = encode( 'iso-8859-1', $utf8_string );
      This way, nothing special needs to be done when pulling the data back out of the database for a non-utf8 display.