imlou has asked for the wisdom of the Perl Monks concerning the following question:

More problems with Perl. I know this is basic pattern matching, but I'm having trouble. I'm trying to match all date formats (i.e. dddd-dd-dd, dd/dd/dddd, dd-dd-dddd, Jan|January 25 2000, where d = digit) and replace each with the following: <date>dd-dd-dddd</date>.
my @dates = map {s/(\d{2, 4}[\W]\d{2}[\W]\d{2.4})/<date>$1</date>/g; split;} <FILE>;
Correct?...not correct?

Re: matching different date formats
by Callum (Chaplain) on Nov 08, 2002 at 23:15 UTC
    Check out the Date::Manip module -- this will probably do a lot of what you're after.

    If you've got an international audience beware of dd/mm/yy versus mm/dd/yy formats which may make things more interesting.
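
    To make the dd/mm versus mm/dd point concrete, here is a minimal sketch using the core Time::Piece module (swapped in here only because it ships with Perl; Date::Manip is what was actually suggested). The helper name parse_as and the sample string are made up for illustration.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: parse $text with the given strptime format
# and return the result as a yyyy-mm-dd string.
sub parse_as {
    my ($text, $format) = @_;
    return Time::Piece->strptime($text, $format)->ymd;
}

# The same string means two different days depending on the convention.
my $text = '04/05/2002';
print "day first:   ", parse_as($text, '%d/%m/%Y'), "\n";   # 2002-05-04
print "month first: ", parse_as($text, '%m/%d/%Y'), "\n";   # 2002-04-05
```

    Nothing in the string itself says which reading is right, so the convention has to come from knowing where the data came from.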

Re: matching different date formats
by sauoq (Abbot) on Nov 08, 2002 at 23:31 UTC

    No, that's not correct.

    Besides the typos and the fact that you probably don't really want to eliminate whitespace or put non-date elements into your @dates array, your regex has some problems.

    Working from the example you give, this might be a little closer to what you want:

    my @dates;
    while (<FILE>) {
        while (m!(\d{2,4}[-/]\d{2}[-/]\d{2,4}|\w+\s+\d+\s+\d+)!g) {
            push @dates, "<date>$1</date>";
        }
    }

    That could match a whole bunch of other stuff too though... If you can better explain what you need someone will almost surely give you something better than that.
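
    One way to cut down on the "other stuff" might be to spell out the month names and pin the digit counts, roughly as sketched below. This is only a sketch under assumptions: English month names, four-digit years, an optional comma after the day, and the names $month, $date and find_dates are invented for the example.

```perl
use strict;
use warnings;

# Assumption: English month names, abbreviated (Jan) or spelled out (January).
my $month = qr/(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)/;

my $date = qr{
    \b
    (?: \d{4} [-/] \d{2} [-/] \d{2}        # dddd-dd-dd or dddd/dd/dd
      | \d{2} [-/] \d{2} [-/] \d{4}        # dd-dd-dddd or dd/dd/dddd
      | $month \s+ \d{1,2} ,? \s+ \d{4}    # Jan 25 2000 or January 25, 2000
    )
    \b
}x;

# Return every date-looking substring found in one line of text.
sub find_dates {
    my ($line) = @_;
    return $line =~ /($date)/g;
}

print "$_\n" for find_dates('On Jan 25 2000 and 2000-04-23 we met; 12 apples 3 4 is not a date.');
```

    This still only checks the shape of the text, so something like 2000-85-85 gets through.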

    -sauoq
    "My two cents aren't worth a dime.";
    
      Better explanation:
      For example, if I have a file with the following:
      2000-04-23 was a great day, but 04-24-1999 is much better.
      The output should produce:
      <date>2000-04-23</date> was a great day, but <date>04-24-1999</date> is much better.
      Since the dates could be something like 2000/04/23 I would like to match those 3 formats for now. Is this any better? Thanks for your help.
        while ( <DATA> ) {
            s!\b((\d{4}[-/]\d{2}[-/]\d{2}\b)|(\d{2}[-/]\d{2}[-/]\d{4}))\b!<date>$1</date>!g;
            print;
        }
        __DATA__
        2000-04-23 was a great day, but 04-24-1999 is much better.
        2000/04/23 I would like to match those 3 formats for now.
        2000-85-85 is not a real date.

        The above regex does what you ask, but as noted by fellow monks, if you do not use a module you will need to add other checks to make sure that what you are matching is truly a date: the above regex will accept 2000-73-08, which is not a valid date. One could write a regex to eliminate this possibility, but as FamousLongAgo mentions, there are vast subtleties as to what might be a valid date (leap years, as one example, present a special problem). But then again, only you know the data you are trying to match against.
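
        If a module really is off the table, one possible approach is to let the regex find candidates and then check the numbers in plain Perl before tagging. The sketch below is an assumption-laden illustration, not a complete validator: it only handles the year-first forms (yyyy-mm-dd and yyyy/mm/dd), since the dd-dd-dddd form is ambiguous between day-first and month-first, and the sub names are invented for the example.

```perl
use strict;
use warnings;

# True if $y is a leap year in the Gregorian calendar.
sub is_leap_year {
    my ($y) = @_;
    return 1 if $y % 400 == 0;
    return 0 if $y % 100 == 0;
    return $y % 4 == 0 ? 1 : 0;
}

# True if year/month/day name a real calendar date.
sub is_valid_ymd {
    my ($y, $m, $d) = @_;
    return 0 if $m < 1 || $m > 12 || $d < 1;
    my @days = (31, is_leap_year($y) ? 29 : 28, 31, 30, 31, 30,
                31, 31, 30, 31, 30, 31);
    return $d <= $days[$m - 1] ? 1 : 0;
}

# Wrap only those year-first candidates that pass the calendar check;
# anything else is left in the text untouched.
sub tag_valid_dates {
    my ($line) = @_;
    $line =~ s{\b((\d{4})[-/](\d{2})[-/](\d{2}))\b}
              {is_valid_ymd($2, $3, $4) ? "<date>$1</date>" : $1}ge;
    return $line;
}

print tag_valid_dates("2000-04-23 was a great day. 2000-85-85 is not a real date.\n");
```

        With this, 2000-85-85 and 1900-02-29 are left alone while 2000-02-29 is tagged.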

        As previously mentioned by sauoq, if you give a better explanation of why you don't want to use a module and what else you might need, someone will lead you in the right direction.