On Windows (and on UNIX if built with Unicode support), DBD::ODBC reads UCS2-encoded data from the MS Access database through the ODBC driver manager and the MS Access ODBC driver. DBD::ODBC then decodes that data and re-encodes it as UTF-8 so that Perl sees the strings as Unicode. As a result, I believe the decode has already been done for you.
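
To make that conversion concrete, here is a minimal sketch using only the core Encode module. It is not DBD::ODBC's actual code: the variable names and the hard-coded bytes are hypothetical stand-ins for what a driver might hand back, and no database is involved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for wide-character data from an ODBC driver:
# the four characters of "café" as UTF-16LE octets (assumed for illustration).
my $wide_bytes = "c\x00a\x00f\x00\xE9\x00";

# Decode the octets into a Perl character string, roughly the step
# DBD::ODBC is described as doing for you on fetch.
my $text = decode('UTF-16LE', $wide_bytes);

# length() now counts characters, not bytes.
printf "characters: %d\n", length $text;
printf "utf8 flag:  %s\n", utf8::is_utf8($text) ? 'on' : 'off';

# The outbound direction: character string back to UTF-16LE octets.
my $round_trip = encode('UTF-16LE', $text);
print "round trip matches\n" if $round_trip eq $wide_bytes;

# Encode explicitly when writing the string to an output handle.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

The practical point is that a string fetched this way is already a character string, so calling decode on it a second time would be a mistake.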

Re^6: Mugged by UTF8, this CANNOT be right
by Jim (Curate) on Jan 27, 2011 at 16:54 UTC

    Thank you, ++mje! This is very helpful to know. And it's authoritative, too, as it's coming straight from the maintainer of DBD::ODBC.

    Your post prompted me to reread the documentation of DBD::ODBC's Unicode support more carefully. Among the wealth of detailed information about Unicode and various different drivers for different RDBMSes, the documentation does state:

    DBD::ODBC uses the wide character versions of the ODBC API and the SQL_WCHAR ODBC type to support unicode in Perl.
    Wide characters returned from the ODBC driver will be converted to UTF-8 and the perl scalars will have the utf8 flag set (by using sv_utf8_decode).
    perl scalars which are UTF-8 and are sent through the ODBC API will be converted to UTF-16 and passed to the ODBC wide APIs or signalled as SQL_WCHARs (e.g., in the case of bound columns).

    I think it might be helpful to have an entry in the DBD::ODBC FAQ like "How Do I Handle Unicode Text With MS Access?" that simply and plainly explains that, mostly, it should Just Work. (Shouldn't it?)

      On Windows, it should just work (there are a few exceptions for old ODBC 1 and 2 drivers). On UNIX it is harder because Unicode support varies between ODBC drivers and ODBC driver managers, which is why it is an optional build setting on UNIX. I'll consider a FAQ entry, but probably a more general one than you propose; otherwise I'll end up adding loads, one per driver/database.