jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks
I have the following problem with dates. The following formats are possible:
2005-100 10:12:01
2005100 10:12:01
2005-100
2005100
2005
This is what I would like to do:
($year, $julian, $time) = ($my_date =~ /\d{4}-?(\d{3})\s{0,}?(\d\d:\d\d:\d\d)?/);
But this doesn't work. If one of the fields is missing, nothing is extracted...
Any suggestions what the correct regexp should look like ?

Thanks a lot in advance
Luca

Replies are listed 'Best First'.
Re: regexp problem
by brian_d_foy (Abbot) on Feb 11, 2006 at 17:09 UTC

    You just have to make parts of the regex optional, and ensure you capture the right parts. To get the captured values as return values, evaluate the match in list context; the /g flag on the match below is not required for that.

    #!/usr/bin/perl

    while( <DATA> ) {
        chomp;

        my( $year, $julian, $hour, $minute, $second ) =
            m/
            (\d{4})               # year
            (?:                   # group the day - time portions
                -?                # optional hyphen
                (\d{1,3})         # julian day
                (?:               # group the time portions
                    \s+           # one or more whitespace
                    (\d\d)        # hour
                    :
                    (\d\d)        # minute
                    :
                    (\d\d)        # second
                )?                # time is optional
            )?                    # day - time is optional
            /xg;

        print <<"HERE";
    For input [$_]
        Year:   $year
        Julian: $julian
        Hour:   $hour
        Minute: $minute
        Second: $second

    HERE
    }

    __END__
    2005-100 10:12:01
    2005100 10:12:01
    2005-100
    2005100
    2005
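
    As a side check on how captures come back from a match: in list context, m// returns its captured groups even without /g, and groups that did not take part in the match come back as undef. The following is a minimal sketch of that behaviour, not part of the original reply; the helper name parse_date is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption for illustration):
# returns (year, julian, hour, minute, second); missing fields are undef.
sub parse_date {
    my ($date) = @_;
    return $date =~ m/
        (\d{4})                  # year
        (?:
            -?                   # optional hyphen
            (\d{1,3})            # julian day
            (?:
                \s+
                (\d\d) : (\d\d) : (\d\d)   # hour, minute, second
            )?                   # time is optional
        )?                       # day and time are optional
    /x;
}

for my $date ('2005-100 10:12:01', '2005100', '2005') {
    # Show undefined captures as '-' so the output stays warning-free.
    my @fields = map { defined $_ ? $_ : '-' } parse_date($date);
    print "$date => @fields\n";
}
```

    Wrapping the match in a sub keeps the list context explicit at the call site; a failed match returns an empty list, which is easy to test for.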
    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
Re: regexp problem
by graff (Chancellor) on Feb 11, 2006 at 17:03 UTC
    while (<DATA>) {
        if ( /\d{4}-?(\d{3})\s{0,}?(\d\d:\d\d:\d\d)?/ ) {
            print "Match: $_" ;
        }
        else {
            print "NO MATCH: $_";
        }
    }
    __DATA__
    2005-100 10:12:01
    2005100 10:12:01
    2005-100
    2005100
    2005

    That only prints "NO MATCH" on the last line of input, and to fix that, you just need one more "?" in the regex:
    /\d{4}-?(\d{3})?\s{0,}?(\d\d:\d\d:\d\d)?/
    #              ^--here
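
    One caveat worth checking: a pattern that matches does not necessarily capture every field. With the lazy \s{0,}? in front of an optional group, the engine can satisfy the pattern by matching zero whitespace and skipping the time group, so the time capture may come back undefined. The sketch below compares that pattern with a variant using greedy \s*; the helper names and the added year capture are assumptions for illustration, not part of the reply above.

```perl
use strict;
use warnings;

# Lazy whitespace: zero spaces are tried first, the optional time group
# then matches nothing, and the overall match succeeds without a time.
sub extract_lazy {
    my ($date) = @_;
    return $date =~ /(\d{4})-?(\d{3})?\s{0,}?(\d\d:\d\d:\d\d)?/;
}

# Greedy whitespace: the space is consumed before the time group is tried.
sub extract_greedy {
    my ($date) = @_;
    return $date =~ /(\d{4})-?(\d{3})?\s*(\d\d:\d\d:\d\d)?/;
}

my $date = '2005-100 10:12:01';
for my $case ( [ lazy => \&extract_lazy ], [ greedy => \&extract_greedy ] ) {
    my ($label, $extract) = @$case;
    my ($year, $julian, $time) = $extract->($date);
    printf "%-6s year=%s julian=%s time=%s\n",
        $label, $year, $julian, defined $time ? $time : 'undef';
}
```

    If the time is needed, either the greedy \s* or moving the whitespace inside the optional time group avoids the problem.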

    (updated code to fix indenting)

Re: regexp problem
by grinder (Bishop) on Feb 11, 2006 at 20:11 UTC
    Any suggestions what the correct regexp should look like ?

    Here's another approach: you can write (I hope) simple regexps that match each sample datum. It then becomes a simple matter to assemble them together with Regexp::Assemble, and you'll get an efficient pattern as a result:

    #! /usr/local/bin/perl -wl

    use strict;
    use Regexp::Assemble;

    print Regexp::Assemble->new->chomp->add(<DATA>)->as_string;

    __DATA__
    ^\d{4}-\d{3} \d\d(?:\d\d){2}$
    ^\d{4}\d{3} \d\d(?:\d\d){2}$
    ^\d{4}-\d{3}$
    ^\d{4}\d{3}$
    ^\d{4}$

    When I run the above, I get:

      ^\d{4}(?:-?\d{3}(?: \d\d(?:\d\d){2})?)?$

    update: I forgot to mention... now that you have the pattern, it should be obvious where to put the parentheses to capture the bits you're interested in. That is, as they say, left as an exercise to the reader :)
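
    One possible way to do that exercise, offered as a sketch rather than the definitive answer: add capturing parentheses around the year, day, and time parts of the assembled pattern. Since the sample times contain colons, this version spells the time as \d\d(?::\d\d){2}; that change and the helper name split_date are assumptions, not part of the pattern printed above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (year, julian, time); absent parts are undef,
# and an empty list is returned when the string does not match at all.
sub split_date {
    my ($date) = @_;
    return $date =~ /^(\d{4})(?:-?(\d{3})(?: (\d\d(?::\d\d){2}))?)?$/;
}

for my $date ('2005-100 10:12:01', '2005100 10:12:01', '2005-100', '2005100', '2005') {
    my ($year, $julian, $time) = split_date($date);
    printf "%-18s year=%s julian=%s time=%s\n",
        $date, $year,
        defined $julian ? $julian : '-',
        defined $time   ? $time   : '-';
}
```

    Because the pattern is anchored at both ends, strings with trailing junk or an incomplete day number are rejected instead of being partially parsed.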

    • another intruder with the mooring in the heart of the Perl

Re: regexp problem
by smokemachine (Hermit) on Feb 11, 2006 at 18:23 UTC
    ($year, $julian, $time)=/^(\d{4})-?(\d{3})?\s*(\d{2}:\d{2}:\d{2})?/
Re: regexp problem
by jeanluca (Deacon) on Feb 11, 2006 at 17:19 UTC
    That's what I needed, thanks!!
Re: regexp problem
by mickeyn (Priest) on Feb 12, 2006 at 12:03 UTC
    I suggest you take a look at POSIX::strptime.
    It might give you more capabilities and a cleaner way to do what you're asking.

    Enjoy,
    Mickey