kulls has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,
Following up on the thread Regexp for extracting date, I need some more help. I'm creating a file that collects search results from many files. Each file has its own 'date' pattern, so the output I get looks like this:
Sep 29 2005
Jun 30, 2003
December 15 2005
December 31, 2004
06-Dec-2005
10-19-2005
What I actually need is int_month/int_year (e.g. 10/2005). Can anyone suggest how to reduce these to one uniform pattern?
Update: salva and tirwhan suggested external modules, which I would have to install separately. I'd be grateful for any other options that don't rely on external modules.
Thanx

-kulls

Replies are listed 'Best First'.
Re: Multiple date format
by salva (Canon) on Dec 20, 2005 at 11:16 UTC
    Try Date::Manip; its ParseDate function should be able to parse all of those formats.
Re: Multiple date format
by GrandFather (Saint) on Dec 20, 2005 at 11:51 UTC

    The following works for the range of formats you have given.

    use strict;
    use warnings;

    # Full month names mapped to their three-letter abbreviations
    my %months = (
        january   => 'jan', february => 'feb', march    => 'mar',
        april     => 'apr', may      => 'may', june     => 'jun',
        july      => 'jul', august   => 'aug', september => 'sep',
        october   => 'oct', november => 'nov', december => 'dec',
    );

    # Three-letter abbreviations mapped to month numbers
    my %monthCvt = (
        jan => 1, feb => 2, mar => 3, apr => 4,  may => 5,  jun => 6,
        jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12,
    );

    while (<DATA>) {
        chomp;
        $_ = lc $_;
        # Strip leading spaces
        s/^\s+//;
        # Normalize month
        s/([a-z]+)/$months{$1} || $1/e;
        # Normalize punctuation (hyphen, whitespace, slash, comma, dot, backslash)
        s|[-\s/,.\\]+| |g;
        # Normalize order: month day year
        s|^(\d+)\s([a-z]+)|$2 $1|;
        # Extract fields
        my ($month, $day, $year) = /(\w+) (\d+) (\d+)/;
        # Month to numeric form if required
        $month = $monthCvt{$month} if $month !~ /\d/;
        print "$month/$day/$year\n";
    }

    __DATA__
    Sep 29 2005
    Jun 30, 2003
    December 15 2005
    December 31, 2004
    06-Dec-2005
    10-19-2005

    Prints:

    9/29/2005
    6/30/2003
    12/15/2005
    12/31/2004
    12/06/2005
    10/19/2005
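
    Since the question asks for int_month/int_year rather than a full date, the same idea can be cut down to just those two fields. What follows is only a sketch, using core Perl and no modules: the helper name month_year is made up for illustration, month names are recognised by their first three letters, and all-numeric dates are assumed to be mm-dd-yyyy.

```perl
use strict;
use warnings;

# Three-letter month prefixes mapped to month numbers.
my %month_num = (
    jan => 1, feb => 2, mar => 3, apr => 4,  may => 5,  jun => 6,
    jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12,
);

# month_year() is a hypothetical helper name. It returns "month/year" for the
# date layouts shown in this thread, or nothing if the text matches none.
sub month_year {
    my ($text) = @_;
    my $s = lc $text;
    $s =~ s/[-\s\/,.\\]+/ /g;    # collapse separators to a single space
    $s =~ s/^\s+|\s+$//g;        # trim both ends

    my ($month, $year);
    if ($s =~ /^([a-z]+) \d{1,2} (\d{4})$/) {        # Sep 29 2005
        ($month, $year) = ($month_num{ substr($1, 0, 3) }, $2);
    }
    elsif ($s =~ /^\d{1,2} ([a-z]+) (\d{4})$/) {     # 06-Dec-2005
        ($month, $year) = ($month_num{ substr($1, 0, 3) }, $2);
    }
    elsif ($s =~ /^(\d{1,2}) \d{1,2} (\d{4})$/) {    # 10-19-2005, ASSUMED mm-dd-yyyy
        ($month, $year) = ($1 + 0, $2);
    }

    # Reject unknown month names and out-of-range month numbers.
    return unless defined $month && $month >= 1 && $month <= 12;
    return "$month/$year";
}

print month_year($_), "\n"
    for 'Sep 29 2005', 'Jun 30, 2003', 'December 15 2005',
        'December 31, 2004', '06-Dec-2005', '10-19-2005';
```

    For the six sample lines this should give 9/2005, 6/2003, 12/2005, 12/2004, 12/2005 and 10/2005. If the numeric dates in your files are really dd-mm-yyyy, the third branch is the one to change.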

    DWIM is Perl's answer to Gödel
Re: Multiple date format
by tirwhan (Abbot) on Dec 20, 2005 at 11:24 UTC

    Further to salva's excellent answer, there is also Date::Calc which is less resource-intensive than Date::Manip and almost as flexible.

    Also, just as a note: if your formats mix all-numeric dd-mm-yyyy and mm-dd-yyyy, you won't be able to parse those dates reliably, because in many cases the parser has no way to tell them apart. (Your example doesn't include that mix, but it does include mm-dd-yyyy and dd-month-yyyy, so I thought I'd flag this.)
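
    To make the ambiguity concrete, here is a small core-only sketch. The helper name possible_months is invented for this illustration; it lists every month number that an all-numeric date could mean if you don't know which of the two orders was used.

```perl
use strict;
use warnings;

# possible_months() is a hypothetical helper. For an all-numeric "a-b-yyyy"
# string it returns each month number that is consistent with the string
# under either the mm-dd-yyyy or the dd-mm-yyyy reading.
sub possible_months {
    my ($date) = @_;
    my ($x, $y) = $date =~ /^(\d{1,2})-(\d{1,2})-\d{4}$/
        or return;

    my %seen;
    # mm-dd-yyyy reading: the first field is the month
    $seen{ $x + 0 } = 1 if $x >= 1 && $x <= 12 && $y >= 1 && $y <= 31;
    # dd-mm-yyyy reading: the second field is the month
    $seen{ $y + 0 } = 1 if $y >= 1 && $y <= 12 && $x >= 1 && $x <= 31;

    my @months = sort { $a <=> $b } keys %seen;
    return @months;
}

for my $date ('10-19-2005', '03-04-2005', '04-04-2005') {
    my @months = possible_months($date);
    printf "%-10s => month %s\n", $date, join(' or ', @months);
}
```

    10-19-2005 can only be October, because 19 is not a valid month, and 04-04-2005 is April either way. 03-04-2005 comes back as "3 or 4", and nothing in the string itself can settle which one was meant.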


    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
Re: Multiple date format
by McDarren (Abbot) on Dec 20, 2005 at 12:16 UTC
    First of all, let me say that I certainly don't recommend that you do it this way :)

    And yes, I did see your note about not wanting to use external modules....

    ...but, for the sake of the exercise, and just to prove that it can be done - here it is extending upon my earlier reply using Regexp::Common::time :)

    #!/usr/bin/perl -w
    use strict;
    use Regexp::Common qw(time);

    while (<DATA>) {
        print if ($_ =~ $RE{time}{strftime}{-pat => '((%B|%b) %d|%d-%b|%m-%d)[\s\-,]+%Y'});
    }

    __DATA__
    Sep 29 2005
    some stuff
    Jun 30, 2003
    some more stuff
    December 15 2005
    not a date
    December 31, 2004
    06-Dec-2005
    10-19-2005

    Which gives:

    Sep 29 2005
    Jun 30, 2003
    December 15 2005
    December 31, 2004
    06-Dec-2005
    10-19-2005

    Cheers,
    Darren :)

Re: Multiple date format
by pKai (Priest) on Dec 21, 2005 at 12:01 UTC
    I would suggest using Graham Barr's time-tested and lean Date::Parse, somewhat like this:
    use Date::Parse qw(strptime);
    for (<DATA>) {
        my @date = (strptime($_))[4,3,5];
        ++$date[0];
        $date[2] += 1900;
        print join('/' => @date) . $/;
    }

    which parses and outputs your sample DATA above correctly.
    And if you insist on not using additional modules, I'd rather steal from its code ...
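
    Another route that needs nothing extra installed, assuming a Perl recent enough (5.10 or later) that Time::Piece ships with the core distribution, is to try a short list of strptime layouts in turn. This is a sketch rather than a tested recipe: the helper name month_year_tp and the list of layouts are assumptions chosen to fit the sample data, and the all-numeric case is again read as mm-dd-yyyy.

```perl
use strict;
use warnings;
use Time::Piece;

# Candidate layouts for the sample data. The list and its order are
# assumptions; extend it for whatever other layouts your files contain.
my @formats = (
    '%b %d %Y', '%b %d, %Y',    # Sep 29 2005 / Jun 30, 2003
    '%B %d %Y', '%B %d, %Y',    # December 15 2005 / December 31, 2004
    '%d-%b-%Y',                 # 06-Dec-2005
    '%m-%d-%Y',                 # 10-19-2005, ASSUMED mm-dd-yyyy
);

# month_year_tp() is a hypothetical helper name. It returns "month/year" for
# the first layout that parses, or nothing if none of them does.
sub month_year_tp {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;    # trim, so stray whitespace cannot upset the parse
    for my $format (@formats) {
        # Time::Piece->strptime dies when the text does not fit the layout.
        my $t = eval { Time::Piece->strptime($text, $format) };
        next unless defined $t;
        return join '/', $t->mon, $t->year;
    }
    return;
}

for my $date ('Sep 29 2005', 'Jun 30, 2003', 'December 15 2005',
              'December 31, 2004', '06-Dec-2005', '10-19-2005') {
    my $result = month_year_tp($date);
    print defined $result ? $result : "unparsed: $date", "\n";
}
```

    The order of @formats matters only when more than one layout could match the same string, which is exactly the numeric dd-mm versus mm-dd problem mentioned earlier in the thread.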