As indicated in my note below (which was a reply to the anonymous OP), you have to stop using cp1252 (the Windows Western European code page, a superset of Latin-1) and switch to cp1251 (Windows Cyrillic) instead; that alone might account for some of the problems you are having.
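
To see why cp1252 can't work for this data, here is a small sketch using the core Encode module (the sample word and variable names are just for illustration). cp1252 has no Cyrillic letters at all, so encode() turns each one into "?", while cp1251 maps each letter to a single byte.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "Privet" ("hello") in Cyrillic, written with \x{...} escapes so the
# encoding of this source file doesn't matter.
my $cyrillic = "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}";

# cp1252 has no Cyrillic letters: encode() substitutes '?' for each one.
my $as_1252 = encode("cp1252", $cyrillic);

# cp1251 has a one-byte code for every one of these letters.
my $as_1251 = encode("cp1251", $cyrillic);

print "cp1252: $as_1252\n";    # cp1252: ??????
printf "cp1251: %s\n", join " ", map { sprintf "%02X", ord } split //, $as_1251;
# cp1251: CF F0 E8 E2 E5 F2
```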

Apart from that, if the data originally comes from a file (or other external source) as UTF-8 text, your Perl script first has to be told that it is UTF-8 data, either via open($fh,"<:utf8",$file) or via $utf8_string=decode('utf8',$input_string).
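
Both routes can be sketched like this. It uses the strict spellings ":encoding(UTF-8)" and decode("UTF-8", ...), which validate the input, where ":utf8" and 'utf8' are Perl's laxer variants; a temporary file stands in for your real input file, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# UTF-8 bytes for "Privet", standing in for data from an external source.
my $utf8_bytes = encode("UTF-8", "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}");

# Route 1: decode bytes you already have in a scalar.
my $from_decode = decode("UTF-8", $utf8_bytes);

# Route 2: let an I/O layer decode while reading a file.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} $utf8_bytes;
close $tmp_fh or die "close: $!";

open(my $fh, "<:encoding(UTF-8)", $file) or die "Can't open $file: $!";
my $from_file = do { local $/; <$fh> };    # slurp the whole file
close $fh;

# Both are now Perl character strings: 6 characters, not 12 bytes.
printf "%d bytes -> %d / %d characters\n",
    length $utf8_bytes, length $from_decode, length $from_file;
# 12 bytes -> 6 / 6 characters
```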

Then you encode("cp1251", $utf8_string) and use the resulting byte string as the input to your non-unicode database. (Despite its name, $utf8_string at this point holds decoded Perl characters, not UTF-8 bytes.) When you get data back from the database, do $utf8_string=decode("cp1251", $db_string) to recover your original Cyrillic string.
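
A minimal sketch of that round trip follows. The database is faked with a hash, and db_store/db_fetch are hypothetical stand-ins for whatever DBI calls you actually use; the point is only where encode() and decode() sit.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the non-unicode database: it just stores bytes.
my %fake_db;
sub db_store { my ($key, $bytes) = @_; $fake_db{$key} = $bytes }
sub db_fetch { my ($key) = @_; return $fake_db{$key} }

my $text = "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}";    # decoded characters

# Going in: characters -> cp1251 bytes (one byte per Cyrillic letter).
db_store(greeting => encode("cp1251", $text));

# Coming out: cp1251 bytes -> characters again.
my $back = decode("cp1251", db_fetch("greeting"));

print $back eq $text ? "round trip ok\n" : "round trip FAILED\n";
print length(db_fetch("greeting")), " bytes stored\n";    # 6 bytes stored
```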

But if the original Cyrillic string includes any characters that do not exist in the cp1251 character set, those characters will not survive the conversion into cp1251, period. By default, encode() silently replaces each of them with "?".
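
To find such characters before they get lost, you can ask encode() to die instead of substituting. The sketch below checks each distinct character with Encode::FB_CROAK (plus Encode::LEAVE_SRC, so the source string isn't modified in place). The helper name unmappable_in_cp1251 is made up for this example, and the Kazakh letter U+049B is just one character that cp1251 lacks.

```perl
use strict;
use warnings;
use Encode ();

# Return the distinct characters in $text that have no cp1251 mapping.
sub unmappable_in_cp1251 {
    my ($text) = @_;
    my (%checked, @bad);
    for my $ch (split //, $text) {
        next if exists $checked{$ch};
        # FB_CROAK makes encode() die on an unmappable character.
        my $ok = eval {
            Encode::encode("cp1251", $ch, Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        };
        $checked{$ch} = $ok;
        push @bad, $ch unless $ok;
    }
    return @bad;
}

# Kazakh "qazaq": its first and last letter (U+049B) is not in cp1251.
my $kazakh = "\x{49B}\x{430}\x{437}\x{430}\x{49B}";
my @bad = unmappable_in_cp1251($kazakh);
printf "unmappable: %s\n", join " ", map { sprintf "U+%04X", ord } @bad;
# unmappable: U+049B

# Without a CHECK argument, encode() quietly substitutes '?' instead.
my $lossy = Encode::encode("cp1251", $kazakh);
my $question_marks = ($lossy =~ tr/?//);
print "default encode() produced $question_marks '?' characters\n";
```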

In that case, you'll need to replace the "unmappable" characters with suitable substitutes, if possible, and that will probably involve some manual inspection and decisions about what sort of replacements would be acceptable.
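
Once those decisions are made, one way to apply them is a hand-built substitution table applied to the decoded string before encoding. In this sketch the three Kazakh-to-Russian replacements are placeholders for whatever you decide, and to_cp1251_with_substitutes is a hypothetical helper. Encoding with FB_CROAK afterwards means any character the table doesn't cover stops the conversion, so it can be reviewed by hand, rather than quietly turning into "?".

```perl
use strict;
use warnings;
use Encode ();

# Placeholder substitution table: characters missing from cp1251 -> stand-ins.
my %substitute = (
    "\x{49B}" => "\x{43A}",    # Kazakh qa          -> Russian ka
    "\x{4D9}" => "\x{430}",    # schwa              -> a
    "\x{4A3}" => "\x{43D}",    # en with descender  -> en
);

sub to_cp1251_with_substitutes {
    my ($text) = @_;
    # Replace listed characters one by one; leave everything else alone.
    my $fixed = join "",
        map { exists $substitute{$_} ? $substitute{$_} : $_ } split //, $text;
    # Anything still unmappable makes encode() die.
    return Encode::encode("cp1251", $fixed, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

my $kazakh = "\x{49B}\x{430}\x{437}\x{430}\x{49B}";    # "qazaq"
my $bytes  = to_cp1251_with_substitutes($kazakh);
printf "%s\n", join " ", map { sprintf "%02X", ord } split //, $bytes;
# EA E0 E7 E0 EA
```

Failing loudly on anything not in the table is deliberate: it makes you notice characters that still need a manual decision.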

(updated in hopes of making things clearer)


In reply to Re^5: Character encoding fun... by graff
in thread Character encoding fun... by Anonymous Monk
