Hi,

I've read enough about encodings and collations and such to make my brain squishy. However, I still can't figure out how to solve my problem.

The problem is I need to compare character data in a way consistent with MySQL. I am comparing data in a table across two servers. Assume the tables have identical structures in every way, but may contain different data. I connect to each server and select col1 from each table. col1 is a varchar column with charset latin1 and collation latin1_swedish_ci (these are defaults). Now I loop through the rows returned from each table, comparing the data to determine which rows are extra or missing between the two tables.

MySQL sorts case-insensitive (hence latin1_swedish_ci) of course. I can duplicate this without problem: I just use "lc $a cmp lc $b".

This works okay until I come to a word like éclair. MySQL knows that this should be sorted just before "e". Perl thinks é is greater than "z".

I can go the other way, and make MySQL sort things the same way as Perl with "ORDER BY BINARY col1", but this defeats indexes and makes MySQL sort inefficiently, and that is highly undesirable for my purposes.

I feel sure there's a way to do this, but danged if I can figure out how. I have 8 Firefox tabs open with everything from perluniintro to Encode::Byte, and my brain is full :-) Anyone? Anyone?

Thank you!

UPDATE The idea to reach back to MySQL and ask it to compare the strings is how I'm going to try to do it. Thank you all very much for the great ideas!


In reply to locales, encodings, collations, charsets... how can I match a given MySQL collation? by xaprb

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.