I think you'll want to try Spreadsheet::ParseExcel instead of DBI. When you drill down to individual cell contents, you'll be able to check whether the cell value has its "Code" attribute set to "ucs2", and in that case, use the "decode()" function from Encode to convert from UTF-16LE to utf8. (M$ Excel alternates between ucs2 and "native" encodings on a cell by cell basis.)
Interestingly, when I run the "dmpExR.pl" sample script that comes with that module, it seems to automagically convert the characters to utf8 on my Mac OS X machine.
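Here is a minimal sketch of that per-cell check, using only the core Encode module so it runs without a spreadsheet. The helper name cell_to_text and its arguments are hypothetical: $val and $code stand in for whatever raw value and "Code" attribute you pull out of each cell. The UTF-16LE default is taken from the description above and is an assumption; check the byte order against your own data before relying on it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of Spreadsheet::ParseExcel.
# $val  : raw cell value (bytes)
# $code : the cell's "Code" attribute (undef, "ucs2", ...)
# $enc  : encoding to decode from; UTF-16LE is an assumption, verify it
sub cell_to_text {
    my ( $val, $code, $enc ) = @_;
    $enc ||= 'UTF-16LE';

    # Only cells flagged as ucs2 need decoding; leave the rest alone.
    return $val unless defined $val && defined $code && $code eq 'ucs2';
    return decode( $enc, $val );
}

# Simulated cells: one flagged ucs2 (ARABIC LETTER BEH), one plain.
my $arabic = cell_to_text( "\x28\x06", 'ucs2' );
my $plain  = cell_to_text( "abc",      undef );

printf "U+%04X\n", ord $arabic;    # code point of the decoded character
print "$plain\n";
```

In a real script you would call something like this once per cell while looping over the worksheet, passing in the value and "Code" attribute you read from the cell.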
Or, if you prefer to stick with DBI, just do something like this for each cell value:
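use Encode;
...
# UTF-16LE puts the low byte first, so Arabic text appears as <any byte>\x06 pairs;
# /s lets "." also match a 0x0A low byte
if ( $cellValue =~ /(?:.\x06)+/s ) {
    $cellValue = decode( "UTF-16LE", $cellValue );
    # now it's a Perl character string (utf8 internally)
}
...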
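The "\x06" test works if the ucs2 content is Arabic, because all Arabic characters are in the range U+0600 - U+06FF, so every one of them has 0x06 as its high byte. In UTF-16LE that high byte is the second byte of each two-byte pair.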
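To see that byte layout for yourself, here is a small self-contained sketch using only the core Encode module. The three sample letters are my own choice for illustration, not taken from your data.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Three Arabic letters (alef, lam, meem), written as code points.
my $text  = "\x{0627}\x{0644}\x{0645}";
my $bytes = encode( 'UTF-16LE', $text );

# Dump the raw bytes as hex: each pair is <low byte> followed by 06.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "$hex\n";    # prints "27 06 44 06 45 06"

# The same kind of test as in the snippet above, then a round trip.
my $looks_arabic = $bytes =~ /(?:.\x06)+/s ? 1 : 0;
my $round_trip   = decode( 'UTF-16LE', $bytes );

print "looks Arabic: $looks_arabic\n";
print "round trip ok\n" if $round_trip eq $text;
```

Note that the pattern is only a heuristic: any byte string that happens to contain a 0x06 byte after some other byte will match too, so it is safest on columns you already expect to hold Arabic text.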
UPDATE: Sorry -- I just noticed that you are still using perl 5.6; you really seriously should consider upgrading (5.8.8 is current at the moment). Working with Arabic or other Unicode stuff in 5.6 strikes me as a bad idea. BTW, regarding those letters you quoted in your sample data: there probably are "\x06" bytes next to them, as well as pairs of "\x06" followed by "some other non-displayable byte value".