It looks like the "ó" character is probably being stored in the database as a two-byte utf8 "wide" character. It turns out that when ó is encoded in utf8, the two-byte sequence is "\xC3\xB3"; you can look those up in a Latin1 chart and see that if these two bytes were treated as separate characters, they would come out as "Ã" and "superscript 3".
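If you want to check that byte arithmetic yourself rather than look it up in a chart, something like the following should do it. This is only a sketch: the character is written as an escape so the script stays pure ASCII, and the variable names are mine, not anything from your code.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00F3 is "o with acute accent"; escaped so this file needs no special encoding.
my $char = "\x{F3}";

# Encode the single character into its UTF-8 byte sequence.
my $bytes = encode("UTF-8", $char);
my $hex   = join " ", map { sprintf "%02X", ord($_) } split //, $bytes;
print "UTF-8 bytes: $hex\n";              # C3 B3

# Misread those two bytes as Latin-1: each byte becomes a separate character.
my $misread    = decode("ISO-8859-1", $bytes);
my $codepoints = join " ", map { sprintf "U+%04X", ord($_) } split //, $misread;
print "As Latin-1:  $codepoints\n";       # U+00C3 U+00B3
```

U+00C3 is "Ã" and U+00B3 is the superscript 3, which matches the two characters described above.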

So somewhere in your setup, you are storing strings in utf8, and then somewhere else, you are treating them as if they were not utf8, but rather some single-byte encoding such as iso-8859-1 or cp1252.

You haven't given us enough information to tell where the problem is. Maybe the utf8 string is contained in the web page that you fetch, and is being stored in the database as the two-byte sequence. When you read that back from the database, the two-byte utf8 character might be getting displayed "as-is" on an 8859-1 or cp1252 display, or it could be that the two bytes are each being "upgraded" to utf8 characters and you're seeing Ã and superscript-3 on a utf8 display.
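The second possibility (each byte being "upgraded" separately) can be reproduced in a few lines. This is a guess at what might be happening, not a diagnosis of your setup; `$stored` is just a stand-in for whatever comes back from your database.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for the stored value: the UTF-8 bytes for U+00F3.
my $stored = "\xC3\xB3";

# If Perl is never told these bytes are UTF-8, it sees two Latin-1 characters.
# Encoding them for a UTF-8 display then converts each one separately.
my $double = encode("UTF-8", $stored);
my $hex    = join " ", map { sprintf "%02X", ord($_) } split //, $double;
print "double-encoded: $hex\n";                          # C3 83 C2 B3

# Decoding first yields a single character, which encodes back to the
# original two bytes.
my $char  = decode("UTF-8", $stored);
my $clean = encode("UTF-8", $char);
print "characters after decode: ", length($char), "\n";  # 1
```

Four bytes on output where you expected two is the usual sign of this kind of double encoding.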

Whatever the problem, you just need to be explicit about what encoding is being used at each step of your process, and maybe do some encoding "conversions" at appropriate points.
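As a rough illustration of "being explicit at each step": decode the bytes once when they come in, and declare the encoding on the filehandle when they go out. The sample string and the temp file below are made up for the demonstration; your input would come from the fetched page or the database instead.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Input step: bytes arrive from outside (hard-coded here); decode them once.
my $incoming = "Le\xC3\xB3n";                  # UTF-8 bytes, 5 bytes long
my $text     = decode("UTF-8", $incoming);     # now 4 characters

# Output step: name the encoding on the handle so Perl encodes on write.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
open my $out, '>:encoding(UTF-8)', $path or die "open for write: $!";
print {$out} $text;
close $out;

# Check: read the file back as raw bytes and compare with what came in.
open my $in, '<:raw', $path or die "open for read: $!";
my $on_disk = do { local $/; <$in> };
close $in;
print "round trip ok\n" if $on_disk eq $incoming;
```

The point of the design is that the program only ever holds decoded characters internally, and bytes exist only at the edges.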

If the database contains utf8 strings, and you use a Perl script to read stuff back from the database, Perl probably won't be able to know automatically that the string contains utf8 "wide" characters, and you'll need to use Encode to make that explicit:

    use Encode;

    # assume $string contains a value fetched from the database:
    $string = decode( "utf8", $string );  # sets the "utf8-flag" on $string

If that doesn't help, and you can't figure out what really needs to be done, you'll need to give us more information: What OS are you using, and are you using a utf8-based locale? What are you using to view the text data? Can you confirm whether the string is being stored in the database as utf8?
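One way to answer that last question is to dump what Perl actually has in the variable. The helper below is hypothetical (the name `describe_string` is mine), and `$raw` is only a stand-in for a value fetched from your database:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report the utf8 flag and each character's code point.
sub describe_string {
    my ($s) = @_;
    my $flag  = utf8::is_utf8($s) ? "on" : "off";
    my $chars = join " ", map { sprintf "%04X", ord($_) } split //, $s;
    return "flag=$flag chars=$chars";
}

my $raw = "\xC3\xB3";    # stand-in for a value fetched from the database
print describe_string($raw), "\n";                    # flag=off chars=00C3 00B3
print describe_string(decode("UTF-8", $raw)), "\n";   # flag=on chars=00F3
```

If your fetched value prints as two characters `00C3 00B3` with the flag off, the database is most likely holding utf8 bytes and the decode step above is what's missing.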

In reply to Re: Fine when written to text file, but unreadable when written to database table by graff
in thread Fine when written to text file, but unreadable when written to database table by Kanishka
