The question marks indicate that something has tried (or is trying) to treat the cp1256 characters as utf8, and is failing. That is, a string of cp1256 bytes has been (or is being) given to some tool or function that expects some form of Unicode (probably utf8). Since the cp1256 bytes cannot be mistaken for valid utf8 multi-byte ("wide") characters, the result is a string of "????", which reflects that the byte sequence is not parsable as utf8.
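
To make that failure concrete, here is a minimal sketch (in Python for brevity; in Perl the same check could be done with the Encode module). The sample word and the helper names decode_as_utf8 and is_valid_utf8 are my own choices for illustration, not anything from your script. It encodes an Arabic string as cp1256 and then hands those bytes to a utf8 decoder, which is roughly what I suspect is happening somewhere in your pipeline:

```python
# Arabic sample text; the word itself is an arbitrary choice for illustration.
text = "مرحبا"

# What a cp1256 data source would hand over: one byte per Arabic letter.
cp1256_bytes = text.encode("cp1256")


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if the bytes parse cleanly as utf8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_as_utf8(data: bytes) -> str:
    """Decode leniently: bytes that are not valid utf8 become U+FFFD,
    the replacement character that many viewers render as '?'."""
    return data.decode("utf-8", errors="replace")


garbled = decode_as_utf8(cp1256_bytes)

print(is_valid_utf8(cp1256_bytes))            # False: not parsable as utf8
print(ascii(garbled))                         # only replacement characters
print(cp1256_bytes.decode("cp1256") == text)  # True: the bytes themselves are fine
```

The last line is the important one: the data is intact as long as it is read as cp1256. The damage only appears when something decodes it as utf8.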

The code you've shown gives no evidence that this sort of misinterpretation would happen. So the problem involves something you haven't told us about yet.

Is there anything else different between the two data versions besides the schema? (Different database, different server, different machine, different OS?) When you say "with the same viewer", what is that, and is it really the same executable on the same machine showing both versions of the data?

(Update: By any chance, is your perl script running on Red Hat 9 with perl 5.8.0? If so, check your locale setting -- this might be "coercing" the data to utf8 for you "by default" in some way.)

Have you tried using the "mysqlimport" tool, or the "LOAD DATA INFILE" operation, instead of doing inserts with DBI? Dumping the names from the old schema to a plain-text (cp1256) file, and then using that text file with mysqlimport, would at least be quicker than doing a series of DBI inserts (maybe not noticeable over a small data set, but if you have more than a few thousand rows, you'll see a speed difference).

Apart from efficiency, dumping to a text file also lets you check the data coming out of the old schema. If you dump the data from the new schema the same way to a separate text file, you'll be able to tell right away whether the load/retrieval on the new table is the problem (as opposed to the display of the data), just by comparing the two plain-text files.
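
If you want more than a yes/no answer from that comparison, a byte-level check along these lines would do. This is a sketch under assumptions: the function names are mine, and the file names in the usage note are hypothetical placeholders for whatever you call your two dump files. (A plain "cmp" or "diff" on the command line answers the same question.)

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One dump is a prefix of the other; they diverge where the shorter ends.
        return min(len(a), len(b))
    return None


def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes parse as utf8. Note that pure ASCII also passes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def compare_dumps(old_path: str, new_path: str) -> Optional[int]:
    """Read both dump files as raw bytes (no decoding) and compare them."""
    with open(old_path, "rb") as old_file:
        old = old_file.read()
    with open(new_path, "rb") as new_file:
        new = new_file.read()
    return first_difference(old, new)
```

Calling compare_dumps("old_names.txt", "new_names.txt") (hypothetical file names) returns None when the two dumps are byte-for-byte identical, which would point to the viewer rather than the load. If it returns an offset, running looks_like_utf8 on each file's bytes tells you whether one side was converted to utf8 somewhere along the way.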

(update: fixed a mis-spelling of "mysqlimport")


In reply to Re: Trouble with Perl MySQL Arabic by graff
in thread Trouble with Perl MySQL Arabic by cormanaz
