Re^5: Mugged by UTF8, this CANNOT be right

On Windows (and on UNIX, if DBD::ODBC is built for Unicode), DBD::ODBC reads UCS-2 encoded data from the MS Access database through the ODBC driver manager and the MS Access ODBC driver. DBD::ODBC then decodes that data and re-encodes it as UTF-8 so that Perl sees the strings as Unicode. As a result, I believe the decode has been done for you.
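To make "the decode has been done for you" concrete, here is a minimal sketch that needs no database. It makes no claims about DBD::ODBC internals: the UCS-2 column value is simulated with the core Encode module, and `$raw_ucs2`, `$fetched` and `$double` are hypothetical names. What it illustrates is that the fetched scalar is already a character string, so decoding it a second time is the bug rather than the fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the raw UCS-2 (UTF-16LE) bytes of an Access text
# column. No ODBC connection is made in this sketch.
my $raw_ucs2 = encode('UTF-16LE', "Caf\x{e9}");

# Roughly what a fetch through DBD::ODBC hands back: a decoded character string.
my $fetched = decode('UTF-16LE', $raw_ucs2);

printf "characters: %d, utf8 flag: %s\n",
    length($fetched), (utf8::is_utf8($fetched) ? 'on' : 'off');

# The common mistake: decoding again, as if the value were still UTF-8 bytes.
# FB_CROAK makes the failure visible; LEAVE_SRC keeps $fetched untouched.
my $double = eval {
    decode('UTF-8', $fetched, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
print defined $double ? "decoded twice (wrong)\n" : "second decode rejected\n";

# Correct handling: keep the characters as-is and encode only on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$fetched\n";
```

Run as-is, it should report four characters with the utf8 flag on and show the second decode being rejected, because the lone byte that the downgraded "\x{e9}" becomes is not valid UTF-8.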


Re^6: Mugged by UTF8, this CANNOT be right by Jim (Curate) on Jan 27, 2011 at 16:54 UTC
Thank you, ++mje! This is very helpful to know. And it's authoritative, too, as it's coming straight from the maintainer of DBD::ODBC.

Your post prompted me to reread the documentation of DBD::ODBC's Unicode support more carefully. Among the wealth of detailed information about Unicode and the various drivers for different RDBMSes, the documentation does state:

"DBD::ODBC uses the wide character versions of the ODBC API and the SQL_WCHAR ODBC type to support unicode in Perl. Wide characters returned from the ODBC driver will be converted to UTF-8 and the perl scalars will have the utf8 flag set (by using sv_utf8_decode). perl scalars which are UTF-8 and are sent through the ODBC API will be converted to UTF-16 and passed to the ODBC wide APIs or signalled as SQL_WCHARs (e.g., in the case of bound columns)."

I think it might be helpful to have an entry in the DBD::ODBC FAQ like "How Do I Handle Unicode Text With MS Access?" that simply and plainly explains that, mostly, it should Just Work. (Shouldn't it?)
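The documented behaviour runs in both directions. The sketch below mimics the two conversions in pure Perl with the core Encode module rather than through ODBC; `from_odbc_wide` and `to_odbc_wide` are hypothetical helper names, and little-endian UTF-16 is assumed as the wide encoding (as on Windows). `utf8::decode` is used because it is the Perl-level interface to the `sv_utf8_decode` call the documentation mentions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper, driver -> Perl: wide (UTF-16LE) data becomes UTF-8
# octets, then utf8::decode (i.e. sv_utf8_decode) turns on the utf8 flag.
sub from_odbc_wide {
    my ($wide_bytes) = @_;
    my $utf8_octets = encode('UTF-8', decode('UTF-16LE', $wide_bytes));
    utf8::decode($utf8_octets) or die "malformed UTF-8\n";
    return $utf8_octets;    # now a character string
}

# Hypothetical helper, Perl -> driver: characters become UTF-16LE bytes,
# standing in for what is passed to the ODBC wide APIs.
sub to_odbc_wide {
    my ($chars) = @_;
    return encode('UTF-16LE', $chars);
}

my $original = "\x{41f}\x{440}\x{438}\x{432}\x{435}\x{442}";    # Cyrillic "Privet"
my $wire     = to_odbc_wide($original);
my $back     = from_odbc_wide($wire);

printf "wire bytes: %d, characters back: %d, round trip %s\n",
    length($wire), length($back), ($back eq $original ? 'ok' : 'BROKEN');
```

In everyday code `decode('UTF-16LE', ...)` on its own yields the same string; the extra encode and `utf8::decode` step is only there to mirror the UTF-16 to UTF-8 to `sv_utf8_decode` sequence described in the quote.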
Re^7: Mugged by UTF8, this CANNOT be right by mje (Curate) on Jan 27, 2011 at 17:29 UTC
On Windows, it should just work (there are a few exceptions for old ODBC 1 and 2 drivers). On UNIX it is harder, because Unicode support in ODBC drivers and ODBC driver managers varies, which is why it is an optional build setting on UNIX. I'll consider a FAQ entry, but perhaps a more general one than you propose; otherwise I'll end up adding loads, one per driver/database.