cajun has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a script for a friend who wants to parse a newsgroup message file and produce a report based on messages that were sent between two dates. The span between the dates could be as short as a week or as long as 3 months (perhaps longer).

I've been to CPAN: Date:: and found several modules that would probably help me in doing this. However, I've seen some references on PM concerning a couple of these modules. The comments usually contained "the module is large" verbiage or similar. Now to me, large = slow. Perhaps that is not what they were referring to, though.

So my question is: how do I compare the dates in the message file to the dates requested? What is the most efficient way to accomplish this?

Re: Date comparisons + Benchmark
by OeufMayo (Curate) on Feb 09, 2001 at 04:12 UTC

    Though Date::Manip is a really powerful tool, as its author states in the documentation, it's nearly always overkill for simple tasks.

    You will probably want to benchmark Date::Manip against other modules such as Date::Calc, whose Delta_Days() function might be what you're looking for.

    Update:

    I just benchmarked both modules, and Manip is much slower than Calc (but remember that Calc is not 100% Perl, and Date::Manip parses the dates we provide each time).

    Here's the code and the results of the benchmark, if anyone wants to optimize, criticize, or benchmark with other modules:

      use Benchmark qw(timethese);
      use Date::Calc;
      use Date::Manip;

      # The same two dates, pre-split for Date::Calc and as strings for Date::Manip
      $d1 = '14', $m1 = '08', $y1 = '2001';
      $d2 = '16', $m2 = '12', $y2 = '2001';
      $dt1 = "2001/08/14";
      $dt2 = "2001/12/16";

      timethese(-10, {    # running for at least 10 seconds
          'Calc'  => \&Calc,
          'Manip' => \&Manip,
      });

      # No parsing here: the year, month and day are already numbers
      sub Calc {
          my $Dd = Date::Calc::Delta_Days($y1,$m1,$d1,$y2,$m2,$d2);
      }

      # Parses both date strings on every call, then compares them
      sub Manip {
          my $date1 = Date::Manip::ParseDate($dt1);
          my $date2 = Date::Manip::ParseDate($dt2);
          my $flag  = Date::Manip::Date_Cmp($date1,$date2);
      }

    The results:

      Benchmark: running Calc, Manip, each for at least 10 CPU seconds...
            Calc: 12 wallclock secs (11.04 usr + 0.00 sys = 11.04 CPU) @ 142935.85/s (n=1577440)
           Manip: 10 wallclock secs (10.56 usr + 0.00 sys = 10.56 CPU) @ 104.88/s (n=1107)

    Does this make Date::Calc 1300 times faster than Date::Manip?

    -- Briac

    --
    PerlMonger::Paris(http => 'paris.pm.org');

      Yes, the numbers say that your calculations with Date::Calc run 1300 times faster than your calculations with Date::Manip. You weren't "comparing apples to apples", however (the Date::Calc loop doesn't do any "parsing" at all), so that speed difference is going to be at least a bit overstated.

      But much more important is that the numbers say that reparsing two dates and then comparing them can be done in 1/100th of a second with Date::Manip (on your computer, with those simple date formats). Even being 1300 times faster doesn't mean that you'll notice the difference. (:

      So, for example, if we wanted to compute a date difference for each news article as we display it, well, 1/100th of a second vs. 1e-5 seconds isn't going to make a noticeable difference at all. If we want to compute a date difference for thousands of news articles in order to select a dozen articles to list, then this will probably make quite a noticeable difference.

      But your benchmarks don't apply at all well to the original problem. If the news articles came with month, day, and year already parsed out as numbers, then I probably would just use Time::Local since it just exposes an ANSI-standard-C function.

      Given the task of parsing dates in news articles, I'd start with Date::Manip, since that is exactly the kind of problem it was designed for (since Usenet news articles usually have dates encoded in different formats and from different timezones and Date::Manip knows how to take that into account) and using it would require very little coding on my part. If the results seemed slow to me, then I'd look at how much work it would take to use something else and then decide whether it was worth my time to produce enough working code that I could compare the subjective speed difference for my specific task.

              - tye (but my friends call me "Tye")

        A lot of the inefficiency in Date::Manip (in re-parsing) could be eliminated by caching the results in seconds, e.g. using Date_SecsSince1970GMT to create a hash mapping message IDs to dates in seconds. This lets Date::Manip do the "magic" of normalizing various date formats and TZs into a "universal" format, but the actual sort and search would now be against a list of (large) integers.
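
The caching idea above might look roughly like the sketch below. To keep it runnable with core modules only, a small stand-in parser built on Time::Local handles 'yyyy/mm/dd' strings; with real news headers one would presumably let Date::Manip do the parsing instead. The message IDs, the date format, and the names parse_to_epoch and ids_between are all made up for illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Stand-in parser for this sketch: handles only 'yyyy/mm/dd'.
# For real headers, a Date::Manip-based parser would go here instead.
sub parse_to_epoch {
    my ($date) = @_;
    my ($y, $m, $d) = split m{/}, $date;
    return timegm(0, 0, 0, $d, $m - 1, $y);    # months are 0-based
}

# Hypothetical sample data: message ID => date string.
my %date_of = (
    '<a@example.com>' => '2001/01/15',
    '<b@example.com>' => '2001/02/03',
    '<c@example.com>' => '2001/03/20',
);

# Parse each date exactly once and cache the result as an integer.
my %epoch_of = map { $_ => parse_to_epoch($date_of{$_}) } keys %date_of;

# Select and sort using plain integer comparisons on the cached values.
sub ids_between {
    my ($start, $end) = @_;
    my ($lo, $hi) = (parse_to_epoch($start), parse_to_epoch($end));
    return sort { $epoch_of{$a} <=> $epoch_of{$b} }
           grep { $epoch_of{$_} >= $lo && $epoch_of{$_} <= $hi }
           keys %epoch_of;
}

print join(', ', ids_between('2001/02/01', '2001/03/31')), "\n";
```

The expensive parse happens once per message; every later range check or sort touches only the cached integers.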

Re: Date comparisons
by mikfire (Deacon) on Feb 09, 2001 at 07:56 UTC
    I don't think anybody has mentioned this, but have you thought about using Time::Local? It is a core module ( bonus for portability ) and, to quote the manpage:
      These routines are the inverse of built-in perl functions
      localtime() and gmtime().  They accept a date as a six-
      element array, and return the corresponding time(2) value
      in seconds since the Epoch (Midnight, January 1, 1970).
      This value can be positive or negative.
    

    I haven't done any benchmarking, but I use these frequently to translate a human-readable date (like those found in syslogs and the like) into seconds since the epoch. That makes it very easy to compare two dates, to calculate the delta between two dates, etc., without having to worry about month boundaries, leap years and other items that I almost always get wrong anyway.

    If I have read what you are trying to do correctly, I think this module will do what you want.
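
As a minimal sketch of that approach, assuming the dates arrive as 'yyyy/mm/dd' strings (the format and the helper names to_epoch, in_range and delta_days are assumptions made for this example), Time::Local's timegm turns each date into epoch seconds so that range checks and differences become ordinary arithmetic:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert a 'yyyy/mm/dd' string to epoch seconds at midnight UTC.
sub to_epoch {
    my ($date) = @_;
    my ($y, $m, $d) = split m{/}, $date;
    return timegm(0, 0, 0, $d, $m - 1, $y);    # months are 0-based
}

# True when $date falls within [$start, $end], inclusive.
sub in_range {
    my ($date, $start, $end) = @_;
    my $t = to_epoch($date);
    return $t >= to_epoch($start) && $t <= to_epoch($end);
}

# Whole days between two dates (UTC midnights, so the division is exact).
sub delta_days {
    my ($from, $to) = @_;
    return (to_epoch($to) - to_epoch($from)) / 86_400;
}

print delta_days('2001/08/14', '2001/12/16'), "\n";    # prints 124
```

Using timegm rather than timelocal keeps daylight-saving shifts out of the day count; whether UTC is the right choice depends on how the message dates are recorded.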

    mikfire

Another module-free alternative
by hotyopa (Scribe) on Feb 09, 2001 at 05:26 UTC
    If you didn't want to use a module, you could use split to separate both dates into month (mm), day (dd) and year (yyyy), then reconstruct them as a number in the form 'yyyymmdd'. Then a simple numeric comparison would do the trick:

      # assumes format dd/mm/yyyy (Australian format)
      # and vars $start and $end already in yyyymmdd format
      my ($day, $month, $year) = split(/\//, $newsdate);

      # zero-pad so that a date like 1/2/2001 still becomes 20010201
      $newsdate = sprintf('%04d%02d%02d', $year, $month, $day);

      if ($newsdate >= $start && $newsdate <= $end) {
          # do stuff
      }

    *~-}hotyopa{-~*

Re: Date comparisons
by myocom (Deacon) on Feb 09, 2001 at 03:59 UTC

    Having used Date::Manip, I'd reach for it as my first instinct. But, as you've read, it doesn't benchmark particularly fast (at least partially because it's written purely in Perl).

    For my needs, though, (and yours too, I would guess), unless you're running loops of date calculations, the ease of use of said module will outweigh the speed hit you may take while using it.