Fast date parsing

jettero has asked for the wisdom of the Perl Monks concerning the following question:

Re: Fast date parsing by ikegami (Patriarch) on May 02, 2005 at 14:20 UTC
In the following, `$time` is the date in seconds since the UNIX epoch. I'm converting it back to a string (in the print statement) for testing purposes.

```perl
use Time::Local;

%MONTHS_LOOKUP = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, #...
);

my $date = 'Apr 29 13:54:10';
$date =~ /^(.{3}) (.{2}) (.{2}):(.{2}):(.{2})$/
    or die("Bad date\n");
my $time = timelocal($5, $4, $3, $2, $MONTHS_LOOKUP{$1}, 2005);
print(scalar(localtime($time)), "\n");
```

The following variation might be faster, because it doesn't check the validity of the inputs:

```perl
use Time::Local qw( timelocal_nocheck );

%MONTHS_LOOKUP = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, #...
);

my $date = 'Apr 29 13:54:10';
$date =~ /^(.{3}) (.{2}) (.{2}):(.{2}):(.{2})$/
    or die("Bad date\n");
my $time = timelocal_nocheck($5, $4, $3, $2, $MONTHS_LOOKUP{$1}, 2005);
print(scalar(localtime($time)), "\n");
```
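A complete, runnable version of the approach above may help anyone copying it. This is a minimal sketch under a few assumptions of my own: the input is always `Mon DD HH:MM:SS`, the year is supplied by the caller (2005, as in the reply), the full month table is filled in, and the day pattern is loosened to `\d{1,2}` with one or more spaces so that syslog-style `Apr  9` also parses. The `parse_date` helper is a hypothetical name, not part of Time::Local.

```perl
use strict;
use warnings;
use Time::Local qw( timelocal timelocal_nocheck );

# Full month table; timelocal() expects months numbered 0..11.
my %MONTHS_LOOKUP;
@MONTHS_LOOKUP{qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )} = (0 .. 11);

# Hypothetical helper: 'Apr 29 13:54:10' plus a year -> epoch seconds (local time).
# Pass a true $nocheck to skip Time::Local's range validation.
sub parse_date {
    my ($date, $year, $nocheck) = @_;
    my ($mon, $mday, $hour, $min, $sec) =
        $date =~ /^(\w{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2})$/
        or die "Bad date: $date\n";
    exists $MONTHS_LOOKUP{$mon} or die "Bad month: $mon\n";
    my @args = ($sec, $min, $hour, $mday, $MONTHS_LOOKUP{$mon}, $year);
    return $nocheck ? timelocal_nocheck(@args) : timelocal(@args);
}

my $time = parse_date('Apr 29 13:54:10', 2005);
print scalar(localtime($time)), "\n";    # prints: Fri Apr 29 13:54:10 2005
```

Converting back with `localtime` gives the same wall-clock fields regardless of the machine's time zone, which makes it a convenient self-check.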
Re: Fast date parsing by jettero (Monsignor) on May 02, 2005 at 15:33 UTC
In case anyone's curious about the actual benchmark results: Time::Local is more than a 4,000% speedup over Date::Manip (4,182% to 4,330% in the table below). I was expecting a lot, but that's a lot more than I expected. It looks like the regexp-based timelocal_nocheck() version is 3% faster than the unpack() version; that could be because of my unnecessary assignment. Neat. I like that Benchmark module.

```perl
sub datemanip { &UnixDate(&ParseDate($date), '%s') }

sub r_timelocal {
    $date =~ m/^([A-Z][a-z]{2}) (\d{2}) (\d{2}):(\d{2}):(\d{2})$/;
    timelocal($5, $4, $3, $2, $month{$1}, 2005)
}

sub r_timelocal_nc {
    $date =~ m/^([A-Z][a-z]{2}) (\d{2}) (\d{2}):(\d{2}):(\d{2})$/;
    timelocal_nocheck($5, $4, $3, $2, $month{$1}, 2005)
}

sub u_timelocal_nc {
    my @a = reverse unpack('A3xA2xA2xA2xA2', $date);
    $a[$#a] = $month{$a[$#a]};
    timelocal_nocheck(@a, 2005);
}
```

```
Date::Manip: 768 wallclock secs (636.70 usr + 3.83 sys = 640.53 CPU) @ 156.12/s (n=100000)
r timelocal: 20 wallclock secs (14.86 usr + 0.10 sys = 14.96 CPU) @ 6684.49/s (n=100000)
r timelocal nc: 16 wallclock secs (14.40 usr + 0.06 sys = 14.46 CPU) @ 6915.63/s (n=100000)
u timelocal nc: 20 wallclock secs (14.80 usr + 0.08 sys = 14.88 CPU) @ 6720.43/s (n=100000)

                 Rate Date::Manip r timelocal u timelocal nc r timelocal nc
Date::Manip     156/s          --        -98%           -98%           -98%
r timelocal    6684/s       4182%          --            -1%            -3%
u timelocal nc 6720/s       4205%          1%             --            -3%
r timelocal nc 6916/s       4330%          3%             3%             --
```
Re: Fast date parsing by dragonchild (Archbishop) on May 02, 2005 at 14:21 UTC
DateTime is the most Perlish option. Date::Calc may be faster. But have you done any profiling? It may be that your algorithms are the problem, not Date::Manip. I'd check out Devel::DProf first ... The Perfect is the Enemy of the Good.
Re^2: Fast date parsing by jettero (Monsignor) on May 02, 2005 at 14:25 UTC
An excellent question. I'm not really doing anything algorithmic; I suppose I should have been more specific. I wish to translate the few million dates into Unix seconds, so I'm using `$us = &UnixDate( &ParseDate($_), '%s' );`. Perhaps that is the slow part?
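For a few million lines, the conversion can run as a single streaming pass over a filehandle, with the month table built once outside the loop. This is a minimal sketch, assuming each line begins with a `Mon DD HH:MM:SS` stamp and the year is known in advance; `convert_fh` and the tab-separated output format are hypothetical choices, not something from this thread.

```perl
use strict;
use warnings;
use Time::Local qw( timelocal_nocheck );

# Build the month table once, outside the per-line loop.
my %MON;
@MON{qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )} = (0 .. 11);

# Hypothetical helper: read syslog-style lines from $in and write
# "epoch<TAB>original line" to $out. Returns the number of lines converted.
sub convert_fh {
    my ($in, $out, $year) = @_;
    my $n = 0;
    while (my $line = <$in>) {
        my ($mon, $mday, $hour, $min, $sec) =
            $line =~ /^(\w{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2})/
            or next;                      # skip lines without a leading timestamp
        my $m = $MON{$mon};
        next unless defined $m;           # skip unknown month names
        my $epoch = timelocal_nocheck($sec, $min, $hour, $mday, $m, $year);
        print {$out} "$epoch\t$line";
        $n++;
    }
    return $n;
}

# Usage as a filter: convert_fh(\*STDIN, \*STDOUT, 2005);
```

Using `timelocal_nocheck` here trades input validation for speed, so malformed fields that still match the pattern will produce wrong epochs rather than errors.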
Re^3: Fast date parsing by dragonchild (Archbishop) on May 02, 2005 at 14:33 UTC
Ok ... just out of curiosity, why aren't you batching this up and letting it run over a weekend?
Re: Fast date parsing by benizi (Hermit) on Nov 03, 2005 at 16:29 UTC
Re: Fast date parsing by Limbic~Region (Chancellor) on May 02, 2005 at 14:39 UTC
jettero, Time::Local does some internal caching, so subsequent lookups in the same month should be really fast, assuming your dates are all being parsed within the same instance of the interpreter. So then it is a matter of speeding things up as much as possible:

```perl
# Keep the lookup table in the highest scope necessary to avoid creating/destroying it each time it's needed
my %lookup = ( Jan => 0, Feb => 1, ... );

# Assume unpack is faster than regex (Benchmark.pm to be sure)
# Add code to handle year appropriately
my ($mon, $day, $hr, $min, $sec) = unpack('A3xA2xA2xA2xA2', $date);
$mon = $lookup{$mon};
my $stamp = timelocal( ... );
```

Sorry this is just an outline; I am off to the airport and had to hurry. Cheers - L~R
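Filling in the outline above: a minimal sketch that keeps the lookup table at file scope, splits the fields with `unpack`, and passes them to `timelocal`. The year rule is my own assumption, not part of the reply: try the current year, and if that puts the stamp more than a day in the future (as with a December line read in January), use the previous year instead. `stamp_for` and its optional `$now` argument are hypothetical names.

```perl
use strict;
use warnings;
use Time::Local qw( timelocal );

# Lookup table built once, in the highest scope that needs it.
my %lookup;
@lookup{qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )} = (0 .. 11);

# Hypothetical helper. 'A3xA2xA2xA2xA2' reads a 3-char month, skips 1 byte,
# then reads day, hour, minute and second as 2-char fields with 1 byte skipped
# between each. Assumed year rule: current year, or the previous year if the
# result would land more than a day after $now.
sub stamp_for {
    my ($date, $now) = @_;
    $now = time unless defined $now;
    my ($mon, $day, $hr, $min, $sec) = unpack('A3xA2xA2xA2xA2', $date);
    die "Bad month in '$date'\n" unless exists $lookup{$mon};
    my $year  = (localtime $now)[5] + 1900;
    my $stamp = timelocal($sec, $min, $hr, $day, $lookup{$mon}, $year);
    $stamp = timelocal($sec, $min, $hr, $day, $lookup{$mon}, $year - 1)
        if $stamp > $now + 86_400;
    return $stamp;
}
```

Passing `$now` explicitly keeps the year decision reproducible, for example when reprocessing old log files.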
Re: Fast date parsing by eibwen (Friar) on May 02, 2005 at 14:26 UTC
Have you considered using a regex as an alternative to CPAN? Something like:

```perl
$text = " Apr 29 12:34:56\n Apr 30 01:23:45\n May 1 06:12:34";
# In list context, //g returns the captured fields of every match as one flat list
my @fields = $text =~ /(\w{3})\s+(\d{1,2})\s+(\d{1,2}):(\d{2}):(\d{2})/g;
```

However, if speed is really an issue, there may be a pack/unpack solution.
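In list context the `/g` match hands back every captured field in one flat list; to handle one date at a time, the same pattern can drive a `while` loop in scalar context, which resumes from `pos()` on each iteration. A minimal sketch, assuming the year is known (2005) and converting with Time::Local as in the other replies; `all_stamps` is a hypothetical name.

```perl
use strict;
use warnings;
use Time::Local qw( timelocal );

my %MON;
@MON{qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )} = (0 .. 11);

# Hypothetical helper: find every "Mon D HH:MM:SS" stamp in $text and
# return the epoch seconds in order of appearance.
sub all_stamps {
    my ($text, $year) = @_;
    my @stamps;
    # Scalar-context //g continues from where the previous match ended.
    while ($text =~ /(\w{3})\s+(\d{1,2})\s+(\d{1,2}):(\d{2}):(\d{2})/g) {
        next unless exists $MON{$1};      # ignore words that aren't months
        push @stamps, timelocal($5, $4, $3, $2, $MON{$1}, $year);
    }
    return @stamps;
}

my $text = " Apr 29 12:34:56\n Apr 30 01:23:45\n May 1 06:12:34";
print scalar(localtime $_), "\n" for all_stamps($text, 2005);
```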
Re: Fast date parsing by ghenry (Vicar) on May 02, 2005 at 14:36 UTC
You can use the Benchmark module to see which way is faster. I don't mean to be disrespectful if you already knew that. Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
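A minimal Benchmark sketch comparing the two field-splitting approaches discussed in this thread (regex captures versus `unpack`), with the `timelocal` call left out so that only the parsing is timed. The sample date, the `-1` count (run each entry for at least one CPU second) and the sub names `by_regex`/`by_unpack` are arbitrary, hypothetical choices; the numbers will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $date = 'Apr 29 13:54:10';

# Hypothetical subs: two ways of pulling (mon, day, hour, min, sec)
# out of a fixed-width stamp.
sub by_regex {
    return $_[0] =~ /^([A-Z][a-z]{2}) (\d{2}) (\d{2}):(\d{2}):(\d{2})$/;
}
sub by_unpack {
    return unpack('A3xA2xA2xA2xA2', $_[0]);
}

# Sanity check before timing: both must produce the same fields.
die "parsers disagree\n"
    unless join(',', by_regex($date)) eq join(',', by_unpack($date));

# A negative count means "run each sub for at least that many CPU seconds";
# cmpthese then prints a rate/percentage comparison table.
cmpthese(-1, {
    regex  => sub { my @f = by_regex($date) },
    unpack => sub { my @f = by_unpack($date) },
});
```

Checking that the candidates agree before timing them avoids benchmarking a fast but wrong version.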