Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
Does anyone know a robust date string parsing module that fits the following criteria:

As far as I can tell, Date::Manip seems to be the only contender fitting items 1-3. Unfortunately, it fails on items 4 and 5. The sparse documentation doesn't exhaustively list all the formats it understands, nor does it say what it would do with a string like '02/01/03'. Instead, it says things like:

In the documentation below, US formats are used, but in most (if not all) cases, a non-English equivalent will work equally well.

As well as

Parsing a date from any convenient format

Here is the approach I was going to take:

The thing is - it seems silly to use a module if I am having to still do corner cases by hand.

Any insight into other modules would be appreciated.

Cheers - L~R

Re: Date String Parsing
by Sidhekin (Priest) on Nov 27, 2007 at 16:57 UTC

    Time::ParseDate at least is a date string parsing module (duh!), and it passes your items 1-3. Oh, and it was good enough for RT. :)

    Whether you would consider it "robust" or "well documented" I cannot say. Item 5 is technically a pass, but whether that ability is enough for your needs, again, I wouldn't know. Still, as it is pure Perl, I suspect you'll find a way. :)

    Might be worth looking into, if you haven't already.

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

      Sidhekin,
      Thank you! This does seem to fit all my criteria on the surface. I am not sure how well it will work in practice, but it gives me something to work from. I am not sure how I missed this when I was looking at existing wheels.

      Cheers - L~R

Re: Date String Parsing
by jdporter (Paladin) on Nov 27, 2007 at 16:30 UTC
    it seems silly to use a module if I am having to still do corner cases by hand

    IMHO, in this case, it doesn't seem so silly. Date parsing is a big ugly problem, and Date::Manip does remarkably well, considering. And if you could turn your corner-case fixes into patches to the module, that would be even better.

    A word spoken in Mind will reach its own level, in the objective world, by its own weight

      jdporter,
      I agree with you completely in principle. I don't have time to decipher the Date::Manip code in order to write patches with working tests. This is primarily due to the fact that I am going to be a Dad again real soon now :-)

      In addition to not having time to fully document the parsing routine and write patches for more formats, I certainly don't have time to create new features. One of Date::Manip's failings is that it doesn't let me control the behavior when more than one outcome is possible.

      Cheers - L~R

Re: Date String Parsing (TFM)
by tye (Sage) on Nov 27, 2007 at 20:24 UTC

    Did you not find the documentation for Date::Manip or did you just not read it?

    DateFormat
    Different countries look at the date 12/10 as Dec 10 or Oct 12. In the United States, the first is most common, but this certainly doesn't hold true for other countries. Setting DateFormat to "US" forces the first behavior (Dec 10). Setting DateFormat to anything else forces the second behavior (Oct 12).

    If you wanted a listing of all of the different formats that it supports, then you misunderstand how it works. It supports more formats than could be simply listed. If you have FUD, then test it. Or go to the secondary documentation source, the source code (I recommend both):

    if (/^$D\s+$D(?:\s+$YY)?$/) {
      # MM DD YY (DD MM YY non-US)
      ($m,$d,$y)=($1,$2,$3);
      ($m,$d)=($d,$m)  if ($type ne "US");
      last PARSE;
    }

    So 01/02/03 defaults to MM/DD/YY but can be set to be DD/MM/YY. It doesn't support YY/MM/DD for that case, just YYYY/MM/DD, as is reasonable, IMHO.
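For readers without Date::Manip at hand, here is a minimal, self-contained sketch of just the quoted branch. It is not the module's real code path: the `$D` and `$YY` patterns and the step that turns `/` and `-` into spaces are assumptions standing in for Date::Manip's internals, and `parse_numeric` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Date::Manip's internal regex fragments:
# $D is a one- or two-digit month/day field, $YY a two- or four-digit year.
my $D  = qr/(\d\d?)/;
my $YY = qr/(\d\d|\d{4})/;

# Mirrors the quoted branch: fields are read as MM DD YY and swapped to
# DD MM YY unless $type is "US". Returns (month, day, year) or an empty list.
sub parse_numeric {
    my ($str, $type) = @_;
    (my $s = $str) =~ tr{/-}{  };    # assumption: "/" and "-" act like spaces
    return unless $s =~ /^$D\s+$D(?:\s+$YY)?$/;
    my ($m, $d, $y) = ($1, $2, $3);
    ($m, $d) = ($d, $m) if $type ne "US";
    return ($m, $d, $y);
}

for my $type (qw(US non-US)) {
    my ($m, $d, $y) = parse_numeric("01/02/03", $type);
    printf "%-6s month=%s day=%s year=%s\n", $type, $m, $d, $y;
}
```

With `$type` set to anything other than "US", the same string 01/02/03 yields month 02 and day 01, which is the swap tye describes; a string with a four-digit leading year does not match this branch at all.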

    - tye        

      tye,
      Did you not find the documentation for Date::Manip or did you just not read it?

      Since I quoted from it in my post, let's assume I found it and read parts of it. I have, at one point or another, read all of the documentation. I am guilty of not spending a lot of time with the source code.

      It seems like my real crime is laziness. I am not interested in seeing a list of all the formats it supports. I am interested in a list of ambiguous formats and the settings to modify to specify which interpretation is used. I want to be able to look at a table that says: if your string looks like this, the module will see it as X; if you want Y, change setting Z. This is apparently a pipe dream.

      Cheers - L~R

Re: Date String Parsing
by graff (Chancellor) on Nov 27, 2007 at 19:54 UTC
    You said:
    The sparse documentation doesn't ... [explain] what it would do in the circumstance of '02/01/03'.
    In the absence of more information about your intended usage, I wouldn't know what sort of advice to give. Are you processing text that has been collected "from the wild" and trying to identify/normalize all date references? (Very hard target with no complete solution -- some errors are inevitable.) Are you soliciting date strings from users via some sort of form submission and hoping to normalize their inputs? (Easily solved with appropriate design of the form.) Something else?

    If a string like "02/03/04" is coming from who-knows-where, it is intrinsically ambiguous -- one can only hope that there is enough contextual information in the data surrounding it to allow a human to interpret it correctly (it might not even be a date). Asking for a perl script to get it right might be expecting too much.

      graff,
      You are assuming that the code will not know how to handle that string and will fail. I am assuming it will use one of YY/MM/DD, YY/DD/MM, MM/YY/DD, MM/DD/YY, DD/MM/YY, DD/YY/MM and work. What I am asking for is that the documentation indicate which one, if any, is used. Additionally, in the cases of ambiguity, I want to be able to control which one is used.

      With regard to the data: the values are in a field designated as a date type, so I can be sure they are supposed to be dates. The automation need not be able to parse every string (throwing errors is fine), but when it does parse one, it should NEVER get it wrong.
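A module-free way to get this "throw an error rather than guess" behavior is to try all six readings listed above, keep only those that are valid calendar dates, and refuse to answer unless exactly one remains. This is a sketch under stated assumptions, not any module's API: it only handles the `NN/NN/NN` shape, the pivot that maps two-digit years 00-49 to 20xx and 50-99 to 19xx is an arbitrary choice, and `candidates` and `parse_strict` are hypothetical names.

```perl
use strict;
use warnings;

# The six ways to read three numeric fields (y = year, m = month, d = day).
my @ORDERS = qw(ymd ydm myd mdy dmy dym);

sub is_leap {
    my $y = shift;
    return $y % 4 == 0 && ($y % 100 != 0 || $y % 400 == 0);
}

sub valid_date {
    my ($y, $m, $d) = @_;
    return 0 if $m < 1 || $m > 12 || $d < 1;
    my @dim = (31, is_leap($y) ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return $d <= $dim[$m - 1];
}

# Return every distinct calendar-valid reading of "NN/NN/NN" as YYYY-MM-DD.
# Assumption: hypothetical two-digit year pivot at 50.
sub candidates {
    my ($str) = @_;
    my @f = $str =~ m{^(\d\d?)/(\d\d?)/(\d\d?)$} or return;
    my (%seen, @out);
    for my $order (@ORDERS) {
        my %v;
        @v{ split //, $order } = @f;    # e.g. "dmy" maps fields to d, m, y
        my $y = $v{y} < 50 ? 2000 + $v{y} : 1900 + $v{y};
        next unless valid_date($y, $v{m}, $v{d});
        my $iso = sprintf "%04d-%02d-%02d", $y, $v{m}, $v{d};
        push @out, $iso unless $seen{$iso}++;
    }
    return @out;
}

# Strict wrapper: succeed only when exactly one reading survives.
sub parse_strict {
    my ($str) = @_;
    my @c = candidates($str);
    die "unparseable: $str\n" unless @c;
    die "ambiguous: $str (@c)\n" if @c > 1;
    return $c[0];
}

for my $s ('02/01/03', '31/12/99', '13/13/13') {
    my $r = eval { parse_strict($s) };
    print defined $r ? "$s -> $r\n" : "$s -> $@";
}
```

For '02/01/03' all six readings are valid dates, so the strict parser refuses it, while '31/12/99' survives only as DD/MM/YY. A unique survivor is still only as trustworthy as the year pivot behind it.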

      Cheers - L~R

Re: Date String Parsing
by perlfan (Parson) on Nov 28, 2007 at 04:28 UTC
    When you mentioned pure Perl, I thought of Date::Pcalc. I am not sure how well it suits some of your other needs, but I've used it before to get date diffs, etc.

Re: Date String Parsing
by dragonchild (Archbishop) on Nov 27, 2007 at 16:34 UTC
    Your requirement of "no laundry list" is a ridiculous requirement and should be dropped, thus allowing DateTime as the proper solution.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      DateTime itself has no sensible parser. There is DateTime::Format::Strptime, but after looking at its code, I'm not sure anymore that it's a sensible solution :-)
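When the convention of a data source is known, another option is to skip guessing entirely and state the format explicitly. This sketch swaps in the core Time::Piece module's `strptime` rather than DateTime::Format::Strptime; the `%FORMAT_FOR` table and `parse_as` are hypothetical, and the two-digit `%y` year follows Time::Piece's own century rule.

```perl
use strict;
use warnings;
use Time::Piece;    # core module since Perl 5.10

# Hypothetical table: one explicit strptime format per data convention.
# The caller, not a guessing parser, decides what "02/01/03" means.
my %FORMAT_FOR = (
    us     => '%m/%d/%y',
    non_us => '%d/%m/%y',
    iso    => '%Y-%m-%d',
);

sub parse_as {
    my ($str, $style) = @_;
    my $fmt = $FORMAT_FOR{$style} or die "unknown style: $style\n";
    # Time::Piece->strptime croaks when the string does not fit the format.
    my $t = eval { Time::Piece->strptime($str, $fmt) };
    die "cannot parse '$str' as $style ($fmt)\n" unless $t;
    return $t->ymd;    # normalized YYYY-MM-DD
}

print "$_: ", parse_as('02/01/03', $_), "\n" for qw(us non_us);
print "iso: ", parse_as('2007-11-27', 'iso'), "\n";
```

The same '02/01/03' comes back as 2003-02-01 under `us` and 2003-01-02 under `non_us`; strings that `strptime` cannot match die rather than being reinterpreted under another order.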

      dragonchild,
      You may think it is a ridiculous requirement, but I disagree. I am not going to defend the position but instead set it aside to see how well DateTime fits the other criteria. Since DateTime doesn't even have a parsing method that I can see, let's assume you meant DateTime::Format::DateParse, which isn't part of the DateTime bundle.
      • Written in pure perl - pass
      • Is not part of a bundle of other un-wanted modules - pass since the bundle is quite small
      • Well documented - fail
      • Provides the ability to control behavior when multiple dates are possible - fail from my perspective (strptime)

      So it fails 2 of the remaining requirements, the same as Date::Manip, but also doesn't satisfy my ridiculous requirement.

      Cheers - L~R

      Swapped "dependencies" with "bundle" requirement