bart has asked for the wisdom of the Perl Monks concerning the following question:

Here's a little problem, for which I can think up some longish solutions, but not, from the top of my head, a simple, elegant, thus "perlish" solution. So, I'm inviting you to come up with one. It could even be a candidate for Perl Golf, though personally I'd prefer a simple and elegant solution to a golfed obfuscated one. Anyway...

I'm parsing a database description for a Progress database, and I want to create an importer for Oracle for an exported flatfile. One of the fields is of the type "date". Now, at first sight, I find format specifications for dates like "99/99/9999", which should produce a format "DD/MM/YYYY", and "99/99/99", producing "DD/MM/RR", for use in a to_date() call. Note: "RR" specifies a 2 digit year, which will be considered in the range 2000-2049 if it's < 50, and between 1950 and 1999 if >= 50.

Simple enough, a conversion hash

%dateformat = ( '99/99/99' => 'DD/MM/RR', '99/99/9999' => 'DD/MM/YYYY', '9999/99/99' => 'YYYY/MM/DD' );
would do... Except: I want to be able to accept other separators than slashes, too. The script should be smart enough to produce the appropriate format with the actual separators. In short, it should produce 'DD-MM-YYYY' out of '99-99-9999', for example.

Your assignment is to write some Perl code which does this conversion. A generic solution, using a data table for the production, is preferred over a hardwired solution, using a code block for each pattern.

I'll post a solution of my own in a follow-up, later.

Replies are listed 'Best First'.
Re: Generating a format template for a date
by blazar (Canon) on Oct 13, 2006 at 16:46 UTC

    Indeed I have the impression too that there must be some simple, neat way to do it, but I tried a pair of approaches and the problem turns out to be more elusive than one may think at first sight. All in all the following is a solution (of course it misses error handling for non sensible input, but that's an aside), but I know in advance it has very few chances of being any neater than what you may have thought of yourself

    #!/usr/bin/perl -l use strict; use warnings; { my %df = ( '99/99/99' => [qw|DD / MM / RR|], '99/99/9999' => [qw|DD / MM / YYYY|], '9999/99/99' => [qw|YYYY / MM / DD|], ); sub dateformat { my @in=split /(9+)/, shift; shift @in; my @out=@{ $df{join '/', @in[0,2,4]} }; @out[1,3]=@in[1,3]; join '', @out; } } chomp, print dateformat $_ while <DATA>; __END__ 99-99/99 99.99.9999 9999.99-99

      My solution goes along the same lines, more or less:

      my %dateformat = ( '99(.)99(.)99' => '"DD${1}MM${2}RR"', '99(.)99(.)9999' => '"DD${1}MM${2}YYYY"', '9999(.)99(.)99' => '"YYYY${1}MM${2}DD"', ); DATE: while (<DATA>) { ## updated: it was 'foreach' chomp; foreach my $k (keys %dateformat) { next unless /^$k$/; my $trans = $_; $trans =~ s/^$k$/$dateformat{$k}/ee; print "translated <$_> to <$trans>\n"; next DATE; } warn "$_ unmatched\n"; } __DATA__ 99/99/9999 99/99/99 99-99-9999 9999999 9999@99#99

      --
      David Serrano

Re: Generating a format template for a date
by duckyd (Hermit) on Oct 13, 2006 at 16:51 UTC
    Why use a conversion hash when you can just convert?

    UPDATE: initally missed leading year format, updated code to handle it

    use strict; use warnings; while(<DATA>){ s/^(\d{2,4})(\D)(\d{2})(\D)(\d{4}|\d{2})/ ( length $1 > 2 ? 'YYYY' : 'DD' ) ."$2MM$4" .( length $5 > 2 ? length $1 <= 2 ? 'YYYY' : 'RR' : 'DD' )/ex; print; } __DATA__ 99/99/9999 99/99/99 99-99-9999 99-99-99 99:99:9999 99:99:99 99+99+9999 9999-99-99 9999:99:99 9999+99+99
    Output:
    DD/MM/YYYY DD/MM/DD DD-MM-YYYY DD-MM-DD DD:MM:YYYY DD:MM:DD DD+MM+YYYY YYYY-MM-DD YYYY:MM:DD YYYY+MM+DD
      Your output doesn't have any RR's in it ..
Re: Generating a format template for a date
by jbert (Priest) on Oct 13, 2006 at 17:03 UTC
    Not particularly elegant, but fairly clean:
    sub convert_format { my $format = shift; # If we have a 4 digit year $format =~ s/9999/YYYY/; # Two cases: if ($format =~ /^Y/) { # Month first $format =~ s/99/MM/; $format =~ s/99/DD/; } else { # Day first $format =~ s/99/DD/; $format =~ s/99/MM/; } # Handle two digit year $format =~ s/99/RR/; return $format; }
      Extension of that idea, ditching the conditionals (implicitly burying them in the regex's):
      while(<DATA>){ s/9999/YYYY/; # 4-char year first s/(?<=[^9])99(?=[^9])/MM/; # month is always in the middle s/(?<!YYYY.{4})99$/RR/; # 2-char year is always at end, and won +'t have a 4-char year already s/99/DD/; # the remaining 2-chars will be the day print; } __DATA__ 99/99/9999 99/99/99 99-99-9999 99-99-99 99:99:9999 99:99:99 99+99+9999 9999-99-99 9999:99:99 9999+99+99
        Nice! Doing multiple subs on the same target is something which looks nicer using the implicit $_.

        I definitely need to read up on my extended regexps though, those look-ahead/behind assertions are gnarly, dude.

Re: Generating a format template for a date
by MonkE (Hermit) on Oct 13, 2006 at 18:26 UTC
    I know others have already posted their versions, but here is my attempt at a data-driven implementation. It allows alternative delimiters, but requires that the chosen delimiter is the same throughout the date expression. If you want to validate the incoming date before choosing a format, you should use Date::Manip or something similar.
    #!/usr/bin/perl use strict; use warnings; my %Pattern = ( '(\d{1,2})([-\/.])(\d{1,2})\2(\d{2})' => 'MM%sDD%sRR +', '(\d{1,2})([-\/.])(\d{1,2})\2(\d{4})' => 'MM%sDD%sYY +YY', '(\d{4})([-\/.])(\d{1,2})\2(\d{1,2})' => 'YYYY%sMM%s +DD' ); sub ChooseDateFmt($) { my $datestr = shift or die; for my $p (keys %Pattern) { return sprintf($Pattern{$p}, $2, $2) if $datestr =~ /^\s*$p\ +s*$/; } } while (<DATA>) { chomp; print "Date\t$_\tFormat=" . ChooseDateFmt($_) . "\n"; } __DATA__ 10/15/2006 2006/11/01 11/01/06 2006.11.01 11-01-06
    Output:
    Date 10/15/2006 Format=MM/DD/YYYY Date 2006/11/01 Format=YYYY/MM/DD Date 11/01/06 Format=MM/DD/RR Date 2006.11.01 Format=YYYY.MM.DD Date 11-01-06 Format=MM-DD-RR
Re: Generating a format template for a date
by johngg (Canon) on Oct 14, 2006 at 11:29 UTC
    This attempt uses regular expression code blocks to set the format string. I was hoping to find a way to use regex conditional patterns but couldn't get it to work. Here it is

    use strict; use warnings; my $fmt; print map { sprintf qq{Date: %-12sFormat: %-s\n}, $_->[0], $_->[1] } map { $fmt = q{Date not valid}; m{(?x) ^\d\d(\D)\d\d(\D)\d{4}$ (?{$fmt = qq{DD${1}MM${2}YYYY}}) }; m{(?x) ^\d{4}(\D)\d\d(\D)\d\d$ (?{$fmt = qq{YYYY${1}MM${2}DD}}) }; m{(?x) ^\d\d(\D)\d\d(\D)\d\d$ (?{$fmt = qq{DD${1}MM${2}RR}}) }; [$_, $fmt]; } map {chomp; $_;} <DATA>; __END__ 99/99/9999 99/99/99 99-99-9999 99-99-99 99-99 99-99-999 99:99:9999 99:99:99 99+99+9999 9999-99-99 9999:99:99 9/99/99 9999+99+99 99-99.9999 99.99-9999

    and the output is

    Date: 99/99/9999 Format: DD/MM/YYYY Date: 99/99/99 Format: DD/MM/RR Date: 99-99-9999 Format: DD-MM-YYYY Date: 99-99-99 Format: DD-MM-RR Date: 99-99 Format: Date not valid Date: 99-99-999 Format: Date not valid Date: 99:99:9999 Format: DD:MM:YYYY Date: 99:99:99 Format: DD:MM:RR Date: 99+99+9999 Format: DD+MM+YYYY Date: 9999-99-99 Format: YYYY-MM-DD Date: 9999:99:99 Format: YYYY:MM:DD Date: 9/99/99 Format: Date not valid Date: 9999+99+99 Format: YYYY+MM+DD Date: 99-99.9999 Format: DD-MM.YYYY Date: 99.99-9999 Format: DD.MM-YYYY

    I hope this is of interest.

    Cheers,

    JohnGG

    Update: Got the regex conditionals working so here's the script the way I originally intended. It produces identical output to the first version.

    Update 2: Broke the closing brackets of the conditionals onto separate lines with appropriate indentation to aid clarity.

    use strict; use warnings; my $fmt; print map { sprintf qq{Date: %-12sFormat: %-s\n}, $_->[0], $_->[1] } map { m {(?x) ^ (?(?=\d\d(\D)\d\d(\D)\d{4}$) (?{$fmt = qq{DD${1}MM${2}YYYY}}) | (?(?=\d{4}(\D)\d\d(\D)\d\d$) (?{$fmt = qq{YYYY${3}MM${4}DD}}) | (?(?=\d\d(\D)\d\d(\D)\d\d$) (?{$fmt = qq{DD${5}MM${6}RR}}) | (?{$fmt = q{Date not valid}}) ) ) ) }; [$_, $fmt]; } map {chomp; $_;} <DATA>; __END__ 99/99/9999 99/99/99 99-99-9999 99-99-99 99-99 99-99-999 99:99:9999 99:99:99 99+99+9999 9999-99-99 9999:99:99 9/99/99 9999+99+99 99-99.9999 99.99-9999
Re: Generating a format template for a date
by bart (Canon) on Oct 15, 2006 at 22:33 UTC
    It's now 2 days later than my original question, and I've had the chance to think it over a little. As a result, I've come up with a few approaches I hadn't even thought of at the time.

    Note: the format for the Progress database must be compatible with the target format for PL/SQL. Actually, the former is just less specific. All you need is a format description for PL/SQL, and by replacing every \w character with a "9", you get the format for Progress. Otherwise, it simply wouldn't work. And by deriving the former from the latter, you can make sure that you din't make a mistake in this regard.

    First, I think you have to generalise the Progress formats into (regex) patterns. When you find a pattern that matches, you can turn the input pattern into the customised output pattern in several way, of which I've implementend a few in a separate sub, with a flag to switch what implementation to use. See the comments there to what each does.

    Apart from the implementation of the actual replacement, one can make good use of memoization: there's no need to recalculate the target format again and again for the same input format string. So here's a demo of such memoized calculation.

    #! perl -w use strict; my %dateformat; # memoization hash my @rule; foreach my $target (qw(DD/MM/YYYY DD/MM/RR YYYY/MM/DD)) { (my $input = $target) =~ s/\w/9/g; if(exists $dateformat{$input}) { die "Clashing target formats: $dateformat{$input} <-> $target" +; } $dateformat{$input} = $target; (my $pattern = $input) =~ s/\W/\\W/g; push @rule, [ qr/^$pattern$/, $input, $target ]; } # memoized calculation: use Carp; sub convert { my $input = shift; return $dateformat{$input} ||= do { my $target; foreach my $ar (@rule) { my $re = $ar->[0]; if($input =~ /$re/) { # pattern matches, now apply differences $target = alter($input, @$ar); last; } } $target; } or croak "No format pattern found for $input"; } our $how; # 5 alternative ways to do the same thing sub alter { my($input, $re, $from, $to) = @_; if($how eq 'SPLIT') { # split into ('99', '-', '99', '-', '9999') # insert ('DD', 'MM', 'YYYY') into the even numbered positions + (0, 2, 4) # and join again my @parts = split /(\W+)/, $input; @parts[grep !($_%2), 0 .. $#parts] = $to =~ /(\w+)/g; return join '', @parts; } elsif($how eq 'REPLACE_SUBSTR') { # replace each "9" with the character at the same position in +the original target string $input =~ s(9){ substr $to, $-[0], 1 }ge; return $input; } elsif($how eq 'REPLACE_9') { # inject ('D', 'D', 'M', 'M', 'Y', 'Y', 'Y', 'Y') via s/// eac +h replacing a "9" my @subst = $to =~ /(\w)/g; $input =~ s(9){ shift @subst }ge; return $input; } elsif($how eq 'REPLACE_9S') { # ditto but now replacing groups of nines # inject ('DD', 'MM', 'YYYY') via s/// my @subst = $to =~ /(\w+)/g; $input =~ s(9+){ shift @subst }ge; return $input; } elsif($how eq 'XOR') { # stringwise XOR with '9'^'D' for example # this will replace '9' with 'D' # XOR '-' with ('/'^'/') which is "\0", yields '-' # thus, only the nines will be affected my $xor = "$from" ^ "$to"; return "$input" ^ $xor; } else { die "Huh? What did you want me to do? '$how'?"; } } foreach $how (qw(REPLACE_SUBSTR SPLIT REPLACE_9 REPLACE_9S XOR)) { delete $dateformat{'99-99-9999'}; # kill memoized value; printf "%s: %s\n", $how, convert('99-99-9999'); }