in reply to date format string -> Reg Ex solution

use strict;

sub prepareFormat {
    my $format = shift();
    my ($i, @order) = 0;
    $format =~ s/([YMDhms]+)(\?)?/ $order[$i++] = substr($1,0,1); '('.('\d' x length($1))."$2)"/ge;
    $format = qr/^(?:$format)$/;
    return [$format, \@order];
}

sub parseDate {
    my ($format, $date) = @_;
    my @data = ($date =~ $format->[0])
        or return;
    my %result;
    for(my $i = 0; $i <= $#data; $i++) {
        $result{$format->[1]->[$i]} ||= $data[$i];
    }
    return map $result{$_}, qw(Y M D h m s);
}

#my $format = prepareFormat ('YYYY/MM/DD');
#my $format = prepareFormat ('YYYY-DD-MM|YYYY-DD-M|YYYY-D-MM|YYYY-D-M');
#my $format = prepareFormat ('YYYY-DD?-MM?');
#my $format = prepareFormat ('YYYY-DD?-MM? hh?:mm?');
#my $format = prepareFormat ('YYYY-DD?-MM? hh?:mm?(?::ss?)?');
my $format = prepareFormat ('YYYY-DD?-MM?(?: hh?:mm?(?::ss?)?)?');

while (<>) {
    chomp;
    my ($year, $month, $day, $hour, $min, $sec) = parseDate($format, $_)
        or print "\tNot in the right format!\n" and next;
    print "Year: $year, Month: $month, Day: $day, Hour: $hour, Min: $min, Sec: $sec\n";
}

prepareFormat() returns a reference to an array containing the compiled regexp and the mapping between the capturing parentheses and the parts of the date. parseDate() uses this structure to match the date and extract the data, then returns (year, month, day, hour, min, sec).
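To make the returned structure concrete, here is a minimal sketch that rebuilds prepareFormat() so it runs on its own and then inspects both elements. The only change from the code above is a `defined` check on `$2`, an addition of mine so the sketch stays quiet under `use warnings`.

```perl
use strict;
use warnings;

# Same logic as prepareFormat() in the post, repeated so the sketch is standalone.
sub prepareFormat {
    my $format = shift;
    my ($i, @order) = (0);
    $format =~ s{([YMDhms]+)(\?)?}{
        $order[$i++] = substr($1, 0, 1);
        '(' . ('\d' x length($1)) . (defined $2 ? $2 : '') . ')'
    }ge;
    return [qr/^(?:$format)$/, \@order];
}

my $format = prepareFormat('YYYY-DD?-MM?');
my ($regex, $order) = @$format;

# One letter per capture group, in the order the groups appear in the regexp.
print "Capture order: @$order\n";            # prints "Capture order: Y D M"
print "Matches\n" if '2003-23-4' =~ $regex;  # prints "Matches"
```

The first element is what parseDate() matches against; the second tells it which date part each captured value belongs to.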

As you can see from the commented-out examples, you can accept and parse several different datetime formats at once by separating them with | in the format string.
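As a rough illustration of the multi-format case, the sketch below feeds two differently ordered dates through one format. The format string 'YYYY/MM/DD|DD.MM.YYYY' and the sample dates are my own choices. The '.' is left unescaped, so it matches any character, not only a literal dot.

```perl
use strict;
use warnings;

# Standalone copies of the two subs from the post (with a defined() guard on $2).
sub prepareFormat {
    my $format = shift;
    my ($i, @order) = (0);
    $format =~ s{([YMDhms]+)(\?)?}{
        $order[$i++] = substr($1, 0, 1);
        '(' . ('\d' x length($1)) . (defined $2 ? $2 : '') . ')'
    }ge;
    return [qr/^(?:$format)$/, \@order];
}

sub parseDate {
    my ($format, $date) = @_;
    my @data = ($date =~ $format->[0]) or return;
    my %result;
    # Groups from the alternative that did not match are undef, so ||= skips them.
    for my $i (0 .. $#data) {
        $result{ $format->[1][$i] } ||= $data[$i];
    }
    return map { $result{$_} } qw(Y M D h m s);
}

my $format = prepareFormat('YYYY/MM/DD|DD.MM.YYYY');
for my $date ('2003/04/23', '23.04.2003') {
    my ($year, $month, $day) = parseDate($format, $date);
    print "$date -> year $year, month $month, day $day\n";
}
# prints:
# 2003/04/23 -> year 2003, month 04, day 23
# 23.04.2003 -> year 2003, month 04, day 23
```

Whichever alternative matches, the values come back in the same (year, month, day, ...) order.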

Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
   -- Rick Osborne

Edit by castaway: Closed small tag in signature

Re: Re: date format string -> Reg Ex solution
by markjugg (Curate) on Apr 23, 2003 at 19:06 UTC

    Incredible.

    This is exactly the kind of thing I was looking for. It doesn't handle AM/PM, which is important in my case. Below is my hacked version that supports AM/PM by using "pp" to designate it. This creates an exceptional case in the script: before, only digits were accepted, so in addition to \d the generated pattern now also has to match a literal AM or PM.

    # test like this: perl this_script.pl '02/04/2003 06:30 PM'
    use strict;

    sub prepareFormat {
        my $format = shift;

        # TODO: check that only valid characters appear in the format
        # The logic should be: for any character A-Z in the format string,
        # die if it's not one of: Y M D h m s p

        my ($i, @order) = 0;
        my $num_chars = 'YMDhms';
        $format =~ s/([$num_chars]+)(\?)?/ $order[$i++] = substr($1,0,1); '('.('\d' x length($1))."$2)"/ge;

        # use "p" for AM/PM
        $format =~ s/pp/ $order[$i++] = substr('p',0,1); "(AM|PM)"/e;

        $format = qr/^(?:$format)$/;
        return [$format, \@order];
    }

    sub parseDate {
        my ($format, $date) = @_;
        my @data = ($date =~ $format->[0])
            or return;
        my %result;
        for(my $i = 0; $i <= $#data; $i++) {
            $result{$format->[1]->[$i]} ||= $data[$i];
        }
        $result{h} += 12 if ($result{p} eq 'PM' and $result{h} != 12);
        $result{h} = 0 if ($result{p} eq 'AM' and $result{h} == 12);
        return map $result{$_}, qw(Y M D h m s);
    }

    #my $format = prepareFormat ('YYYY/MM/DD');
    #my $format = prepareFormat ('YYYY-DD-MM|YYYY-DD-M|YYYY-D-MM|YYYY-D-M');
    #my $format = prepareFormat ('YYYY-DD?-MM?');
    #my $format = prepareFormat ('YYYY-DD?-MM? hh?:mm?');
    #my $format = prepareFormat ('YYYY-DD?-MM? hh?:mm?(?::ss?)?');
    #my $format = prepareFormat ('YYYY-DD?-MM?(?: hh?:mm?(?::ss?)?)?');
    my $format = prepareFormat ('MM/DD/YYYY hh:mm pp');

    my ($year, $month, $day, $hour, $min, $sec) = parseDate($format, $ARGV[0])
        or print "\tNot in the right format!\n" and exit;
    print "Year: $year, Month: $month, Day: $day, Hour: $hour, Min: $min, Sec: $sec\n";
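The two AM/PM lines in parseDate() are easy to misread, so here is the same 12-hour to 24-hour rule pulled out into a small sketch. The helper name to24Hour() is hypothetical and does not appear in the post; the uc() call is also my addition so lowercase markers are handled too.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the 12-hour -> 24-hour rule used in parseDate().
sub to24Hour {
    my ($hour, $ampm) = @_;
    $ampm = uc $ampm;
    $hour += 12 if $ampm eq 'PM' and $hour != 12;   # 1 PM .. 11 PM -> 13 .. 23
    $hour  = 0  if $ampm eq 'AM' and $hour == 12;   # 12 AM is midnight
    return $hour + 0;                               # force a plain number
}

for my $case (['12', 'AM'], ['06', 'AM'], ['12', 'PM'], ['06', 'PM']) {
    printf "%s %s -> %02d\n", @$case, to24Hour(@$case);
}
# prints:
# 12 AM -> 00
# 06 AM -> 06
# 12 PM -> 12
# 06 PM -> 18
```

The two special cases are the hour 12: noon stays 12 and midnight becomes 0.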

    -mark

      First a tiny nit. substr('p',0,1); is better written as 'p' ;-)

      Next, there is a potential problem. Your code works correctly only if the AM/PM is the last thing in the format, because the second substitution always appends 'p' to the end of @order, no matter where the 'pp' actually sits. This may seem to be an unimportant restriction, but imagine the format was 'YYYY/MM/DD hh:mm pp|DD.MM.YYYY hh:mm pp'. In this case the indexes in @order will be incorrect!
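The mismatch can be seen by running the two-pass substitution on that format and printing what ends up in @order. This is only a sketch: the sub name prepareFormatTwoPass() is made up for the illustration, and it returns the pattern as a plain string so it can be printed.

```perl
use strict;
use warnings;

# The two-pass approach from the parent post, kept as a string for inspection.
sub prepareFormatTwoPass {
    my $format = shift;
    my ($i, @order) = (0);
    # Pass 1: numeric fields only.
    $format =~ s{([YMDhms]+)(\?)?}{
        $order[$i++] = substr($1, 0, 1);
        '(' . ('\d' x length($1)) . (defined $2 ? $2 : '') . ')'
    }ge;
    # Pass 2: 'p' is appended after every numeric field, and only the
    # first 'pp' is replaced because there is no /g modifier.
    $format =~ s{pp}{ $order[$i++] = 'p'; '(AM|PM)' }e;
    return [$format, \@order];
}

my ($pattern, $order) =
    @{ prepareFormatTwoPass('YYYY/MM/DD hh:mm pp|DD.MM.YYYY hh:mm pp') };

print "Pattern: $pattern\n";
print "Order:   @$order\n";                    # prints "Order:   Y M D h m D M Y h m p"
print "Group 6 is labelled: $order->[5]\n";    # prints "Group 6 is labelled: D"
```

The sixth capture group in the pattern is (AM|PM), yet @order labels it 'D', so the AM/PM value never reaches $result{p} and the hour is not adjusted.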

      The solution is to process the 'pp' at the same time as the other characters:

      use strict;

      sub prepareFormat {
          my $format = shift;

          # TODO: check that only valid characters appear in the format
          # The logic should be: for any character A-Z in the format string,
          # die if it's not one of: Y M D h m s p

          my ($i, @order) = 0;
          my $num_chars = 'YMDhms';
          $format =~ s{([$num_chars]+|pp)(\?)?}{
              $order[$i++] = substr($1,0,1);
              if ($1 eq 'pp') {
                  "(AM|PM|am|pm)"
              } else {
                  '('.('\d' x length($1))."$2)"
              }
          }ge;

          $format = qr/^(?:$format)$/;
          return [$format, \@order];
      }

      sub parseDate {
          my ($format, $date) = @_;
          my @data = ($date =~ $format->[0])
              or return;
          my %result;
          for(my $i = 0; $i <= $#data; $i++) {
              $result{$format->[1]->[$i]} ||= $data[$i];
          }
          $result{h} += 12 if (uc($result{p}) eq 'PM' and $result{h} != 12);
          $result{h} = 0 if (uc($result{p}) eq 'AM' and $result{h} == 12);
          return map $result{$_}, qw(Y M D h m s);
      }
      ...
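For a quick check of the single-pass version, the sketch below runs it on the two-alternative format with one date per alternative. The sample dates are mine, and the defined() guards are additions to keep `use warnings` quiet; otherwise the logic follows the code above.

```perl
use strict;
use warnings;

sub prepareFormat {
    my $format = shift;
    my ($i, @order) = (0);
    # Numeric fields and 'pp' are handled in one pass, so @order stays in step
    # with the capture groups wherever 'pp' appears.
    $format =~ s{([YMDhms]+|pp)(\?)?}{
        $order[$i++] = substr($1, 0, 1);
        $1 eq 'pp'
            ? '(AM|PM|am|pm)'
            : '(' . ('\d' x length($1)) . (defined $2 ? $2 : '') . ')'
    }ge;
    return [qr/^(?:$format)$/, \@order];
}

sub parseDate {
    my ($format, $date) = @_;
    my @data = ($date =~ $format->[0]) or return;
    my %result;
    for my $i (0 .. $#data) {
        $result{ $format->[1][$i] } ||= $data[$i];
    }
    my $ampm = defined $result{p} ? uc($result{p}) : '';
    $result{h} += 12 if $ampm eq 'PM' and $result{h} != 12;
    $result{h}  = 0  if $ampm eq 'AM' and $result{h} == 12;
    return map { $result{$_} } qw(Y M D h m s);
}

my $format = prepareFormat('YYYY/MM/DD hh:mm pp|DD.MM.YYYY hh:mm pp');
print "Capture order: @{ $format->[1] }\n";
# prints "Capture order: Y M D h m p D M Y h m p"

for my $date ('2003/04/23 06:30 PM', '23.04.2003 12:05 am') {
    my ($year, $month, $day, $hour, $min) = parseDate($format, $date);
    printf "%s-%s-%s %02d:%s\n", $year, $month, $day, $hour, $min;
}
# prints:
# 2003-04-23 18:30
# 2003-04-23 00:05
```

Each 'p' now sits at the index of its own capture group, so both alternatives get the AM/PM adjustment.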

      Jenda