Re: Arabic Chars in Excel Files

I think you'll want to try Spreadsheet::ParseExcel instead of DBI. When you drill down to individual cell contents, you'll be able to check whether the cell value has its "Code" attribute set to "ucs2", and in that case, use the "decode()" function from Encode to convert from UTF-16LE to utf8. (M$ Excel alternates between ucs2 and "native" encodings on a cell-by-cell basis.)

Interestingly, when I run the "dmpExR.pl" sample script that comes with that module, it seems to automagically convert the characters to utf8 on my Mac OS X machine.

Or, if you prefer to stick with DBI, just do something like this for each cell value:

use Encode;
...

~~if ( $cellValue =~ /(?:[\x06].)+/ ) {~~

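    # /s lets "." match any byte, including a "\n" (0x0A) low byte
    if ( $cellValue =~ /(?:.\x06)+/s ) {
        $cellValue = decode( "UTF-16LE", $cellValue );
        # now it's a Perl character string (utf8 internally)
    }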
...

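The "\x06" test works when the ucs2 content is Arabic, because the basic Arabic block is U+0600 - U+06FF, so in UTF-16LE each of those characters is stored as a low byte followed by a "\x06" high byte. (Arabic presentation forms and other supplementary ranges live elsewhere and would not match.)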
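
To make the byte layout concrete, here is a minimal sketch using only the core Encode module. The helper name decode_if_arabic_utf16le and the sample string are made up for illustration, and the stricter two-byte alignment check is my own addition, not something the approach above requires:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode a raw cell value only if it looks like
# UTF-16LE text containing at least one character from U+0600 - U+06FF.
sub decode_if_arabic_utf16le {
    my ($raw) = @_;
    # Require an even byte count, then look for a 2-byte unit whose
    # high (second) byte is 0x06, staying aligned on 2-byte boundaries.
    if ( length($raw) % 2 == 0 && $raw =~ /\A(?:..)*?.\x06/s ) {
        return decode( 'UTF-16LE', $raw );
    }
    return $raw;    # leave anything else untouched
}

# Four Arabic letters (seen, lam, alef, meem) as a character string.
my $text  = "\x{0633}\x{0644}\x{0627}\x{0645}";
my $bytes = encode( 'UTF-16LE', $text );

# Show the raw bytes: each letter is a low byte followed by 06.
print join( ' ', map { sprintf '%02X', ord } split //, $bytes ), "\n";
# prints: 33 06 44 06 27 06 45 06

my $decoded = decode_if_arabic_utf16le($bytes);
print $decoded eq $text ? "round trip ok\n" : "mismatch\n";
print decode_if_arabic_utf16le("plain ascii"), "\n";
```

The alignment check is there because an unanchored /.\x06/ can also fire on binary data where a 0x06 byte happens to sit at an even offset; whether that matters depends on what else shows up in your cells.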

UPDATE: Sorry -- I just noticed that you are still using perl 5.6; you really seriously should consider upgrading (5.8.8 is current at the moment). Working with Arabic or other Unicode stuff in 5.6 strikes me as a bad idea. BTW, regarding those letters you quoted in your sample data: there probably are "\x06" bytes next to them, as well as pairs of "\x06" followed by "some other non-displayable byte value".

Re^2: Arabic Chars in Excel Files by jZed (Prior) on Aug 17, 2007 at 02:10 UTC
"Using DBI" is a vague description. It could mean using DBD::ODBC or it could mean using DBD::Excel. The latter uses Spreadsheet::ParseExcel so might moot your suggestion to add stuff to the current processing. I'm not sure of the consequences of either one on the issue at hand but second your suggestion to get a more modern perl.