Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I am using a program which upload a excel file onto the remote server and from the server that excel file is parsed using Spreadsheet::ParseExcel module and then is stored in the database.
Now the problem is when the excel is parsed the characters like (',-) etc is converted into junk characters and it creates problem while inserting into the database.
Has anybody encountered the same problem and can anybody throw some some light on it.
Thanxs in advance.

Replies are listed 'Best First'.
Re: Spreadsheet::ParseExcel Problem
by Aristotle (Chancellor) on Dec 26, 2003 at 05:01 UTC

    I have no experience with the module in question, but your inclusion of a single quote in the list of characters makes me suspicious. Are you sure the problem is with Spreadsheet::ParseExcel, or might it actually be caused by interpolating insufficiently escaped strings into SQL statements?

    At any rate the information you gace is insufficient to diagnose a problem from (unless it's a known issue to folks who know the module well). What junk characters do you get? What kind of problems does the database complain about? Can you post a minimal code snippet that exemplifies your trouble?

    Makeshifts last the longest.

      Below is the original string when uploaded

      (Looking at dumps, numerous transactions which abended with dump which were waiting on v1 call (via cornep)
      John to log report with R&D )


      Below is the string when it is parsed.

      (Looking at dumps, numerous transactions which abended with dump which were waiting on v1 call (via cornep)  John to log report with R&D)

      Note the character - (in red font) is converted to . And as far as "'" is concerned is taken care at the time of insert into database.
        That character in red looks like an en-dash ("–") to me. There is no en-dash character in ISO-Latin-1 — but only in Microsoft's extended character set, Code Page 1252, at position 0x96. In Unicode, its character code is 0x2013.

        I'm not sure exactly what is happening here, but I'm quite convinced this is the cause of your problem. Check what character set the Excel file is in, I would think it's in 16-bit Unicode. Perhaps Spreadsheet::ParseExcel simply drops the upper byte, leaving you with the control character 0x13, chr(19).

        Copying the block character out of the HTML page and pasting it into my text editor shows that it is indeed a chr(19). So that explains what is going on, but that still doesn't fix it. It looks like a lot of work; likely you should somehow try to convert the character set from Unicode to CP-1252/Windows.

        So it seems to genuinely be a problem with Spreadsheet::ParseExcel, which wasn't quite clear from your initial post. In that case I can't help you. :)

        Makeshifts last the longest.