jettero has asked for the wisdom of the Perl Monks concerning the following question:

I want to parse dates like 'Apr 29 13:54:10'. I'm accustomed to using Date::Manip for basically everything, but in this case I need to parse a few million dates, and you can really feel the slowness of the pure-Perl Date::Manip. Is there some faster way to parse these fairly standard date strings? I'm looking at Time::Local qw(timelocal), but I'll need to parse the month by hand.

What is the most CPAN-stylish way of parsing dates quickly these days?

EDIT: "parse dates" means:

I'm interested in turning 'Apr 29 13:54:10' into unix seconds: 1114797250.

Replies are listed 'Best First'.
Re: Fast date parsing
by ikegami (Patriarch) on May 02, 2005 at 14:20 UTC

    In the following, $time is the date in seconds since the UNIX epoch. I'm converting it back to a string (in the print statement) for testing purposes.

    use Time::Local;

    %MONTHS_LOOKUP = (
        Jan => 0,
        Feb => 1,
        Mar => 2,
        Apr => 3,
        #...
    );

    my $date = 'Apr 29 13:54:10';

    $date =~ /^(.{3}) (.{2}) (.{2}):(.{2}):(.{2})$/
        or die("Bad date\n");

    my $time = timelocal($5, $4, $3, $2, $MONTHS_LOOKUP{$1}, 2005);

    print(scalar(localtime($time)), "\n");

    The following variation might be faster, because it doesn't check the validity of the inputs:

    use Time::Local qw( timelocal_nocheck );

    %MONTHS_LOOKUP = (
        Jan => 0,
        Feb => 1,
        Mar => 2,
        Apr => 3,
        #...
    );

    my $date = 'Apr 29 13:54:10';

    $date =~ /^(.{3}) (.{2}) (.{2}):(.{2}):(.{2})$/
        or die("Bad date\n");

    my $time = timelocal_nocheck($5, $4, $3, $2, $MONTHS_LOOKUP{$1}, 2005);

    print(scalar(localtime($time)), "\n");
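
    The snippets above elide most of the month table and hard-code the year. A minimal self-contained sketch of the same approach follows. The helper name parse_date() and the explicit year argument are assumptions made for illustration, not part of the original reply, and the regex is slightly stricter than the one above.

```perl
use strict;
use warnings;
use Time::Local qw( timelocal );

# Full month table; assumes English three-letter abbreviations.
my %MONTHS_LOOKUP;
@MONTHS_LOOKUP{qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )} = 0 .. 11;

# parse_date() is a hypothetical helper. The input string carries no year,
# so the caller has to supply one.
sub parse_date {
    my ($date, $year) = @_;

    # Capture month name, day, hour, minute, second; die if the shape is wrong.
    my ($mon, $day, $hr, $min, $sec) =
        $date =~ /^([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2})$/
        or die "Bad date: $date\n";

    defined $MONTHS_LOOKUP{$mon}
        or die "Bad month: $mon\n";

    # timelocal() interprets the fields in the local time zone.
    return timelocal($sec, $min, $hr, $day, $MONTHS_LOOKUP{$mon}, $year);
}

my $time = parse_date('Apr 29 13:54:10', 2005);
print scalar(localtime($time)), "\n";
```

    The result depends on the local time zone of the machine doing the parsing, just as with the original snippets.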
Re: Fast date parsing
by jettero (Monsignor) on May 02, 2005 at 15:33 UTC
    In case anyone's curious as to the actual benchmark results...

    Time::Local is roughly a 4200-4300% speedup over Date::Manip. I was expecting a lot, but that's a lot more than I expected.

    It looks like the regexp-based timelocal_nocheck() is 3% faster than the unpack() version -- that could be because of my unnecessary assignment. Neat. I like that Benchmark module.

    sub datemanip {
        &UnixDate(&ParseDate($date), '%s')
    }

    sub r_timelocal {
        $date =~ m/^([A-Z][a-z]{2}) (\d{2}) (\d{2}):(\d{2}):(\d{2})$/;
        timelocal($5, $4, $3, $2, $month{$1}, 2005)
    }

    sub r_timelocal_nc {
        $date =~ m/^([A-Z][a-z]{2}) (\d{2}) (\d{2}):(\d{2}):(\d{2})$/;
        timelocal_nocheck($5, $4, $3, $2, $month{$1}, 2005)
    }

    sub u_timelocal_nc {
        my @a = reverse unpack('A3xA2xA2xA2xA2', $date);
        $a[$#a] = $month{$a[$#a]};
        timelocal_nocheck(@a, 2005);
    }

    Date::Manip:    768 wallclock secs (636.70 usr + 3.83 sys = 640.53 CPU) @ 156.12/s (n=100000)
    r timelocal:     20 wallclock secs (14.86 usr + 0.10 sys = 14.96 CPU) @ 6684.49/s (n=100000)
    r timelocal nc:  16 wallclock secs (14.40 usr + 0.06 sys = 14.46 CPU) @ 6915.63/s (n=100000)
    u timelocal nc:  20 wallclock secs (14.80 usr + 0.08 sys = 14.88 CPU) @ 6720.43/s (n=100000)

                     Rate Date::Manip r timelocal u timelocal nc r timelocal nc
    Date::Manip     156/s          --        -98%           -98%           -98%
    r timelocal    6684/s       4182%          --            -1%            -3%
    u timelocal nc 6720/s       4205%          1%             --            -3%
    r timelocal nc 6916/s       4330%          3%             3%             --
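
    For anyone who wants to rerun the regex-versus-unpack part of this comparison, here is a minimal sketch using only core modules. The Date::Manip case is left out so the script has no CPAN dependencies, and the harness (cmpthese with a one-second minimum per case) is an assumption, not the setup that produced the numbers above.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
use Time::Local qw( timelocal_nocheck );

my %month;
@month{qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )} = 0 .. 11;

my $date = 'Apr 29 13:54:10';

# Regex-based variant.
sub r_timelocal_nc {
    $date =~ m/^([A-Z][a-z]{2}) (\d{2}) (\d{2}):(\d{2}):(\d{2})$/
        or die "Bad date\n";
    return timelocal_nocheck($5, $4, $3, $2, $month{$1}, 2005);
}

# unpack-based variant, without the intermediate reversed array.
sub u_timelocal_nc {
    my ($mon, $day, $hr, $min, $sec) = unpack('A3xA2xA2xA2xA2', $date);
    return timelocal_nocheck($sec, $min, $hr, $day, $month{$mon}, 2005);
}

# A negative count tells Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    'r timelocal nc' => \&r_timelocal_nc,
    'u timelocal nc' => \&u_timelocal_nc,
});
```

    Differences of a few percent, like the 3% above, are small enough that they may change between runs, Perl versions, and machines.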
Re: Fast date parsing
by dragonchild (Archbishop) on May 02, 2005 at 14:21 UTC
    DateTime is the Perlest. Date::Calc may be faster.

    But, have you done any profiling? It may be that your algorithms are the problem, not Date::Manip. I'd check out Devel::DProf first ...


    The Perfect is the Enemy of the Good.

      An excellent question. I'm not really algorithming.

      I suppose I should have been more specific. I wish to translate the few million dates into unix seconds... So I'm using $us = &UnixDate( &ParseDate($_), '%s' );

      Perhaps that is the slow part?

        Ok ... just out of curiosity - why aren't you batching this up and letting it run over a weekend?


Re: Fast date parsing
by Limbic~Region (Chancellor) on May 02, 2005 at 14:39 UTC
    jettero,
    Time::Local does some internal caching, so subsequent lookups in the same month should be really fast, assuming all your dates are parsed within the same instance of the interpreter. After that, it is a matter of making the surrounding code as fast as possible:

    # Keep the lookup table in highest scope necessary to avoid create/destroy each time needed
    my %lookup = ( Jan => 0, Feb => 1, ... );

    # Assume unpack is faster than regex (Benchmark.pm to be sure)
    # Add code to handle year appropriately
    my ($mon, $day, $hr, $min, $sec) = unpack('A3xA2xA2xA2xA2', $date);
    $mon = $lookup{$mon};
    my $stamp = timelocal( ... );

    Sorry this is just an outline, I am off to the airport and had to hurry.

    Cheers - L~R
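
    The caching idea can also be applied on the caller's side. The sketch below is an assumption layered on top of the outline above, not a description of what Time::Local does internally: it calls timelocal() once per distinct hour and adds minutes and seconds arithmetically. The names cached_stamp() and %hour_start are hypothetical, and the approach assumes the local UTC offset only changes on hour boundaries.

```perl
use strict;
use warnings;
use Time::Local qw( timelocal );

my %lookup;
@lookup{qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )} = 0 .. 11;

# Maps "year mon day hour" to the epoch seconds at the start of that hour.
my %hour_start;

# cached_stamp() is a hypothetical helper; the year is passed in because
# the date string does not contain one.
sub cached_stamp {
    my ($date, $year) = @_;
    my ($mon, $day, $hr, $min, $sec) = unpack('A3xA2xA2xA2xA2', $date);

    my $key  = "$year $mon $day $hr";
    my $base = $hour_start{$key};

    # Only call timelocal() the first time a given hour is seen.
    unless (defined $base) {
        $base = $hour_start{$key} =
            timelocal(0, 0, $hr, $day, $lookup{$mon}, $year);
    }

    return $base + 60 * $min + $sec;
}

print cached_stamp('Apr 29 13:54:10', 2005), "\n";
```

    Whether this beats calling timelocal() directly depends on how many dates share an hour, so it is worth checking with Benchmark before relying on it.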

Re: Fast date parsing
by eibwen (Friar) on May 02, 2005 at 14:26 UTC

    Have you considered using a regex as an alternative to CPAN? Something like:

    $text = " Apr 29 12:34:56\n Apr 30 01:23:45\n May 1 06:12:34";
    $text =~ /(\w{3})\s+(\d{1,2})\s+(\d{1,2}):(\d{2}):(\d{2})/g;

    However, if speed is really an issue, there may be a pack / unpack solution.
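
    As a sketch of how both ideas might look in practice: the regex above only yields every date in the text when the /g match is driven in a loop (or used in list context), and unpack can split a fixed-width date by position. The variable names @parsed and @fields are made up for this example.

```perl
use strict;
use warnings;

my $text = " Apr 29 12:34:56\n Apr 30 01:23:45\n May  1 06:12:34";

# With /g in scalar context, each iteration resumes where the last match ended.
my @parsed;
while ($text =~ /(\w{3})\s+(\d{1,2})\s+(\d{1,2}):(\d{2}):(\d{2})/g) {
    push @parsed, [ $1, $2, $3, $4, $5 ];
}

print join(' ', @$_), "\n" for @parsed;

# unpack alternative for a single fixed-width date: 'A3' takes three
# characters, 'x' skips one separator character, 'A2' takes two characters.
my @fields = unpack('A3xA2xA2xA2xA2', 'Apr 29 12:34:56');
print join(' ', @fields), "\n";
```

    The unpack template relies on every field sitting at a fixed column, so it fits strictly formatted input, while the regex tolerates variable spacing.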

Re: Fast date parsing
by ghenry (Vicar) on May 02, 2005 at 14:36 UTC

    You can use the Benchmark module to see which way is faster.

    I don't mean to be disrespectful if you already knew that.

    Walking the road to enlightenment... I found a penguin and a camel on the way.....