lihao has asked for the wisdom of the Perl Monks concerning the following question:

I am using Perl module Spreadsheet::DataFromExcel to read Excel files. It sounds to be a very simple interface to handle Excel file. The big problem I have is the date field like '02/09/2010' is transfered into a number '40218'. I am not sure how this number is calculated? Is there any work around for this issue? or any formula from '02/09/2010' to '40218'?

Another issue is: the contents in the parsed array of array have '\0' inserted between every two characters. so I have to remove them before using these data

Many thanks

lihao

Update:First, Many thanks to both who suggested, I've tested ww's suggestion with MySQL:

to_days('2010-02-09') - to_days('1900-01-01')

which returns 40216 (very close to the value 40218 I got)? Does anyone know of where the small difference comes from?? Many thanks again.

BTW. I am using the RHEL5 as the OS to run my Perl script.

Update: After google with Excel 1899-12-30 issue, I guess, ww's comment is right, this is a known issue with Excel. So my problem is basically solved. :-)

Replies are listed 'Best First'.
Re: Question with Spreadsheet::DataFromExcel
by ww (Archbishop) on Feb 17, 2010 at 03:42 UTC

    Excel stores dates (and times) as days and fractional days -- the Windows default uses 19000101 00:01 (or maybe 00:00) as the start day-time. On MAC it's 19040101. At least one other spreadsheet uses 18991230 (Correct: Dec 30 -- why? I have no idea).

    The value you report looks like a reasonable conversion from '02/09/2010' ("reasonable" or "close to" - didn't check for exactitude, but 110yrs *365 days/year = 41,150).

    Depending on the precision of your needs you could simply divide the number you're getting by ( 365.25 * years ) -- but will be very rough. For greater precision, you can use a date module to do the conversion reliably.

Re: Question with Spreadsheet::DataFromExcel
by Anonymous Monk on Feb 17, 2010 at 02:12 UTC
    I am not sure how this number is calculated?

    That sounds like the raw value, since dates aren't actually stored the way they are displayed (they're stored as a number, not a string).

    Another issue is: the contents in the parsed array of array have '\0' inserted between every two characters.

    That sounds like UCS-2 or some such unicode encoding , not like a bug

    I suggest you try Spreadsheet::ParseExcel