davepreece has asked for the wisdom of the Perl Monks concerning the following question:

I am using a perl script to insert some data into a mySQL database, however when ever I insert a £ sign it comes out in mySQL as £. I am sure this is something to do with the character set being used but do not how how to change this in either the connection from perl or in mySQL. Any help would be greatly appreciated.

Replies are listed 'Best First'.
Re: Perl mySQL and £ signs
by graff (Chancellor) on Sep 18, 2005 at 20:09 UTC
    I think the perl script has the character as a utf8 byte sequence, and that is what is going into the database. No problem with that.

    When you say "it comes out in mysql as ...", I expect you mean something like "when I use an xterm or an MS-DOS prompt window or some other thing where I can run mysql and issue queries interactively, the display I get is ...".

    This would be expected assuming that your xterm or MS-DOS prompt window or whatever is displaying the data as if it were ISO-8859-1 or CP-1251 or some similar single-byte legacy character set. If you had a utf8-capable display window where you could see the output from mysql (e.g. a browser), it would show the utf8-encoded character correctly.

    (update: just to be clear -- it's a display issue, not a data or perl issue per se)

    (another update -- sorry I didn't happen to notice this sooner: if you set your browser to UTF8 and look at your own post, you'll see the single-character currency mark that I presume you intended to have in the database.)

      I think what's happening is that the original string is a Unicode string, then it gets stored in MySQL and retrieved again.

      What comes back is a utf-8 encoded string, but it's not marked as a Unicode string in perl.

      What you need to do is help perl out by marking the string returned by DBI/mysql as a Unicode string:

      use Encode qw(_utf8_on); _utf8_on($field_value_returned_by_dbi);
        Good point -- thanks. Though I might prefer the "publicly supported" method for doing that "conversion" (turning on the utf8 flag) -- one that doesn't involve a function whose initial underscore in the name asserts that it is intended to be "private" within Encode.pm (hence subject to change in a later release):
        use Encode; ... my $utf8_string = decode( 'utf8', $field_value_from_db );
        Another point I should have considered earlier is that the OP might not want utf8 to be stored in the database (because of its "extra bulk"). If so, it would be better for the perl code that loads data into mysql to apply the appropriate encoding first (assuming the OP knows which non-unicode character set is suitable) -- e.g.:
        my $field_value_for_db = encode( 'iso-8859-1', $utf8_string );
        This way, nothing special needs to be done when pulling the data back out of the database for a non-utf8 display.