This is one of those puzzles that I thought would be a breeze to tackle.

What I have is lots and lots of files that are filled with dates (along with other data). Getting the dates out of the files is surprisingly easy. They're always in the same relative places in all of the files.

Translating/converting these dates into something that's a bit more standard is the tough part. Initially, the Regex started out very small. Something like this:

my $date = "5/12/1998";
$date =~ m|(\d*)/(\d*)/(\d*)|;

Then I encountered some files where spaces were added between the digits (I assume to keep single-digit days/months lined up with double-digit ones).

my $date = " 5/ 2/1998";
$date =~ m|\s?(\d*)\s?/\s?(\d*)\s?/\s?(\d*)|;

Of course, you can guess what else I encountered: dates where the month is named instead of numeric, such as Jan/1/1998; short dates without the slashes, such as Jan 1, 1998; long dates, such as January 1, 1998; and two-digit-year dates, such as 1/1/88.

Pretty soon, my Regex started looking really ugly. It got to the point where I was spending more time adding new rules to the Regex than finishing the rest of the code to parse the other data.

The only major aberrant date format is one that's missing the actual day, such as February 1988. As far as I can tell, all of the dates follow the conventional U.S. order of month, day, then year.
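
To make the formats above concrete, here is a minimal sketch of one way to handle them: try a few small anchored patterns in turn instead of growing a single Regex. Everything named here is hypothetical and not from any module: `normalize_date`, `month_number`, the `YYYY-MM-DD` output (or `YYYY-MM` when the day is missing), and especially the `$PIVOT` rule for two-digit years, which is purely an assumption. It only checks that the day is between 1 and 31, so something like February 30 would still pass.

```perl
use strict;
use warnings;

# Full month names; a token such as "Jan" or "Sept" matches by prefix.
my @NAMES = qw(january february march april may june
               july august september october november december);

# ASSUMPTION: two-digit years below the pivot are 20xx, the rest 19xx.
my $PIVOT = 30;

# Hypothetical helper: month token (digits or a name) -> 1..12, else undef.
sub month_number {
    my ($tok) = @_;
    return $tok + 0 if $tok =~ /^\d+$/;
    my $lc = lc $tok;
    for my $i (0 .. $#NAMES) {
        return $i + 1 if index($NAMES[$i], $lc) == 0;
    }
    return;
}

# Hypothetical helper: returns 'YYYY-MM-DD', 'YYYY-MM' when the day is
# missing, or undef when the string is not recognised.
sub normalize_date {
    my ($raw) = @_;
    return unless defined $raw;

    my ($m, $d, $y);
    if ($raw =~ m{^\s*(\d{1,2})\s*/\s*(\d{1,2})\s*/\s*(\d{2}|\d{4})\s*$}) {
        # 5/12/1998, " 5/ 2/1998", 1/1/88
        ($m, $d, $y) = ($1, $2, $3);
    }
    elsif ($raw =~ m{^\s*([A-Za-z]{3,9})\.?[\s/]+(\d{1,2})(?:\s*[,/]\s*|\s+)(\d{2}|\d{4})\s*$}) {
        # Jan/1/1998, Jan 1, 1998, January 1, 1998
        ($m, $d, $y) = ($1, $2, $3);
    }
    elsif ($raw =~ m{^\s*([A-Za-z]{3,9})\.?\s+(\d{4})\s*$}) {
        # February 1988 (no day)
        ($m, $y) = ($1, $2);
    }
    else {
        return;
    }

    my $mon = month_number($m);
    return unless defined $mon && $mon >= 1 && $mon <= 12;

    # Expand two-digit years using the assumed pivot.
    $y += ($y < $PIVOT ? 2000 : 1900) if length($y) == 2;

    if (defined $d) {
        return unless $d >= 1 && $d <= 31;   # rough range check only
        return sprintf '%04d-%02d-%02d', $y, $mon, $d;
    }
    return sprintf '%04d-%02d', $y, $mon;
}
```

With this sketch, `normalize_date(" 5/ 2/1998")` gives `1998-05-02` and `normalize_date("February 1988")` gives `1988-02`. Each new oddball format becomes one more `elsif` rather than another change to a shared Regex.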

So I come to the Monks. After stumbling over yet another rule change to the Regex, I realized that this can't be such a unique problem. Chances are some person or persons encountered the exact same issues and created a workable Regex/module that I can utilize to read and translate these dates into something more standardized. Can someone please help direct me to this Regex/Module, if it exists?

----
Thanks for your patience.
Prove your knowledge @ HLPD


In reply to Parsing oddball dates by SavannahLion
