http://qs1969.pair.com?node_id=719893

finhagen has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to parse Excel spreadsheets and insert the data into mysql. I am actually having quite a bit of success, but one issue I have not been able to address is ingesting dates. The unformatted date in Excel appears to be in the Julian format* (e.g. 10/28/2008 is 39749). There are three problems with that:
1) mysql expects dates formatted as 2008-10-28. I wrote a simple script to convert a standard US date format to mysql format:
#!/usr/bin/perl use strict; use warnings; my $sdate0='10/28/2008'; my @sdate = split /\//, $sdate0; my $sdate1 = join '-', $sdate[2],$sdate[0],$sdate[1]; print "$sdate1\n";
2) when I parse the date in the spreadsheet it actually is ingested as 39749 (sigh) so my split join script is of no value.
3) I did some research and it doesn't appear that mysql can convert Julian dates to it's date standard (2x sigh). Therefore, it looks like I need a perl script that can convert a Julian date to a mysql date. I looked on cpan and while there are some modules which address Julian dates, I didn't see anything that would do what I am trying to do. Does anyone have a Julian to mysql date conversion script?

*I typically receive the source spreadsheet via email in Office 97 xls format. I run Windows in a VM on top of a SUSE linux host os. When I save the xls file to my Unix (ext3) filesystem the date seems to distill down to the Julian format. I am running mysql and perl on the linux host and that appears to result in this Julian format issue.

Hagen Finley
Boulder, CO

Replies are listed 'Best First'.
Re: Converting Julian Dates to Text
by ikegami (Patriarch) on Oct 28, 2008 at 04:04 UTC
    A CPAN search finds matches, including DateTime::Format::Epoch::MJD.
    use DateTime::Format::Epoch::MJD qw( ); print( DateTime::Format::Epoch::MJD->parse_datetime( $mjd )->ymd(), "\n" );

    But contrary to what you said 39749 is 1967-09-16. 2008-10-28 is 54767.

    Update:

    You can make your own DateTime::Format::Epoch derivative.

    BEGIN { package DateTime::Format::Epoch::Excel; use DateTime qw( ); use DateTime::Format::Epoch qw( ); our @ISA = 'DateTime::Format::Epoch'; my $epoch = DateTime ->new( year => 1900, month => 1, day => 1 ) ->subtract( days => 2 ); # Incorrect for days before March 1st, 1900 # due to 1900 being treated as a leap year. sub new { my ($class) = @_; return $class->SUPER::new( epoch => $epoch, unit => 1/86400, type => 'float', skip_leap_seconds => 1 ); } } print( DateTime::Format::Epoch::Excel->parse_datetime( 39749 )->ymd(), "\n" );

    You can use DateTime::Format::Excel.

    use DateTime::Format::Excel qw( ); print( DateTime::Format::Excel->parse_datetime( 39749 )->ymd(), "\n" );
Re: Converting Julian Dates to Text
by almut (Canon) on Oct 28, 2008 at 03:54 UTC

    IIRC, Excel's dates are offset at 1900. So, the following should get you close to a solution:

    use Time::JulianDay; my $jd = 39749 + julian_day(1900, 1, 0); my ($year, $month, $day) = inverse_julian_day($jd); print "$year-$month-$day\n"; # 2008-10-29

    (someone else will probably be able to explain why this produces Oct 29, not Oct 28... :)

Re: Converting Julian Dates to Text
by CountZero (Bishop) on Oct 28, 2008 at 06:43 UTC
    From the Wikipedia:
    The Julian date (JD) is the interval of time in days and fractions of a day, since 4713 BC January 1, Greenwich noon, Julian proleptic calendar.

    Excel dates have unfortunately absolutely nothing to do with the Julian Date system.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      In fact Julian dates do have day/month/year and were used up until the 16th century in much of Europe (until the 18th century in the UK and US and until the 20th in Russia and other places). The interval in days since 4713BC is used by astronomers and is usually called the Julian day. What the questioner really wants is the Gregorian date (i.e. what most people have been using since 1900).

      The distinction is only really important for historians, for example worrying about the fact that Russia's "October Revolution" took place in what most of the world thought of as November.

      As other posters have stated there are many CPAN modules for converting from Julian dates (or days) to and from Gregorian, Islamic, Jewish and all sorts of other dates (even Excel ones).

Re: Converting Julian Dates to Text
by Corion (Patriarch) on Oct 28, 2008 at 18:11 UTC
Re: Converting Julian Dates to Text
by jmcnamara (Monsignor) on Nov 03, 2008 at 12:41 UTC
    The unformatted date in Excel appears to be in the Julian format.

    Dates in Excel are not stored in Julian format. They are stored as the number of days from a 1900 epoch with 1900 treated as a leap year.

    Both WritExcel and ParseExcel provide utility functions for dealing with dates in this format but I'd recommend the DateTime::Format::Excel module.

    --
    John.