ctp has asked for the wisdom of the Perl Monks concerning the following question:

UPDATE 2 - 1-11-04, 7:53PM - I have finished it...it works as required. I will post the final code soon and let everyone beat up on it :-)
========================

update - to see the posts leading up to this script check this node out

Okay...be forewarned, it is verbose as heck! But at this stage of my learning I don't want to distill it down...we'll get to that after I turn in the working script and we can really have at it.

I am having brain freeze on the subroutine: how best to collect the three values, and then look up the month variable in my hash...I'm still not so good with hashes.

Next to last thing, Perl tells me that I have an illegal octal digit 8, and 9, in my hash. How do I escape those? Or am I doing something else wrong there?

Last thing, the 5th regex doesn't work...I've been going round and round with it, but nothing I've tried makes it work. It's probably something really dumb, but my dumb luck seems to be running low today. The other 4 seem to work fine. BTW - when I paste in my code I get whitespaces, that aren't really there in my script, after brackets and braces. Ah well...as always, any and all help is greatly appreciated!

#!usr/bin/perl
#Script to parse dates with the following formats:
#Apr 8 1984, Apr 08 84, 4/8/84, 04/08/84, 08 Apr 1984
use warnings;
use strict;

#declare my vars and make sure they're empty
my ( $MM, $DD, $YY, $YYYY );

#take in date at command line and make text lowercase
print "\n\n\n\n";
print "Date Converter, v1.0\n";
print "this program assumes that a year from 10 to 99 is in the 1900's\n";
print "and that a year from 00 to 09 is in the 2000's\n";
print "please enter a date\n";
chomp (my $date = <>);
$date = lc ($date);

if ($date =~ /[a-zA-Z]{3}\s\d{1,2}\s\d{4}/) #parse Apr 8 1984
{
    my @dateparts = split (/\s/ , $date);
    $MM = $dateparts [0];
    $DD = $dateparts [1];
    $YYYY = $dateparts [2];
    output ($MM, $DD, $YYYY);
}
elsif ($date =~ /[a-zA-Z]{3}\s\d{2}\s\d{2}/) #parse Apr 08 84
{
    my @dateparts = split (/\s/ , $date);
    $MM = $dateparts [0];
    $DD = $dateparts [1];
    $YY = $dateparts [2];
    if (10 <= $YY && $YY <= 99) {$YYYY = $YY + 1900}
    elsif (0 <= $YY && $YY <= 9) {$YYYY = $YY + 2000}
    output ($MM, $DD, $YYYY);
}
elsif ($date =~ /\d\/\d\/\d{2}/) #parse 4/8/84
{
    my @dateparts = split (/\// , $date);
    $MM = $dateparts [0];
    $DD = $dateparts [1];
    $YY = $dateparts [2];
    if (10 <= $YY && $YY <= 99) {$YYYY = $YY + 1900}
    elsif (0 <= $YY && $YY <= 9) {$YYYY = $YY + 2000}
    output ($MM, $DD, $YYYY);
}
elsif ($date =~ /\d{2}\/\d{2}\/\d{2}/) #parse 04/08/84
{
    my @dateparts = split (/\// , $date);
    $MM = $dateparts [0];
    $DD = $dateparts [1];
    $YY = $dateparts [2];
    if (10 <= $YY && $YY <= 99) {$YYYY = $YY + 1900}
    elsif (0 <= $YY && $YY <= 9) {$YYYY = $YY + 2000}
    output ($MM, $DD, $YYYY);
}
elsif ($date =~ /\d{1,2}\s[a-zA-Z]{3}\s\d{4}/) #parse 08 Apr 1984
{
    my @dateparts = split (/\s/ , $date);
    $DD = $dateparts [0];
    $MM = $dateparts [1];
    $YYYY = $dateparts [2];
    output ($MM, $DD, $YYYY);
}
else #contingency plan
{
    print "your date is not of a recognizable format...good day.\n";
}

#take in $MM $DD $YYYY, parse $MM with %months
#and print "fullmonth day, year" to STDOUT
sub output
{
    my @outputdates = @_;
    my %months = (
        jan => "January", feb => "February", mar => "March",
        apr => "April", may => "May", jun => "June",
        jul => "July", aug => "August", sep => "September",
        oct => "October", nov => "November", dec => "December",
        1 => "January", 2 => "February", 3 => "March",
        4 => "April", 5 => "May", 6 => "June",
        7 => "July", 8 => "August", 9 => "September",
        10 => "October", 11 => "November", 12 => "December",
        01 => "January", 02 => "February", 03 => "March",
        04 => "April", 05 => "May", 06 => "June",
        07 => "July", 08 => "August", 09 => "September"
    );
    #unfinished
    print "$outputdates[1], $outputdates[2]\n";
}

Re: more date conversion happiness, part 3
by CountZero (Bishop) on Jan 11, 2004 at 22:11 UTC
    You have to single quote all the month numbers that start with a 0; otherwise Perl thinks you are using octal numbering, and since there is no such thing as an octal digit 8 or 9, you get the "illegal octal" error.

    To get the long month name out of your %months hash, just index the hash with the key: $months{$outputdates[0]}. It works like indexing an ordinary array, except that you use the key instead of a number.
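A minimal sketch of both points, assuming a cut-down %months with only a few entries. The helper name month_name is made up for illustration and is not part of the original script:

```perl
use strict;
use warnings;

# Keys with a leading zero are quoted so Perl treats them as strings,
# not as octal numeric literals.
my %months = (
    apr  => "April",
    4    => "April",
    '04' => "April",
    aug  => "August",
    8    => "August",
    '08' => "August",
);

# month_name is a hypothetical helper: it looks the key up in %months
# and returns undef if the key is not there.
sub month_name {
    my ($key) = @_;
    return $months{$key};
}

print month_name('08'),  " 8, 1984\n";
print month_name('apr'), " 8, 1984\n";
```

The same lookup works whether the month arrives as "apr", "4" or "04", which is why the hash carries all three spellings.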

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: more date conversion happiness, part 3
by pg (Canon) on Jan 11, 2004 at 22:05 UTC

    The leading 0 indicates that the numbers are octal, but 8 and 9 are not legal octal digits; octal digits are 0..7. So put 01 .. 09 in "" or ''.

    Also according to your comments, your print statement should be:

    print "$months{$outputdates[0]} $outputdates[1], $outputdates[2]\n";

    Update:

    In my original post, I only said to put 08 and 09 in quotes, as a direct answer to the illegal octal digit error the OP got. But CountZero convinced me that this could be misleading, so I changed it to 01 .. 09 for completeness.

    Thanks to CountZero for his pursuit of perfection.

      In order to keep the leading 0, you have to single quote all numbers which start with a zero or the hash will not work. We are working with strings here and not with numbers (octal or other).

      Update: As PG told me, single or double quotes are both OK.
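To see why the quotes matter even for 01 .. 07, here is a small sketch (the hash names are invented for the example). An unquoted 07 is a numeric literal, so the key that ends up in the hash is "7", and a later lookup with the string "07" misses:

```perl
use strict;
use warnings;

# 07 is an octal literal with the value 7, so its hash key becomes "7".
my %unquoted = ( 07 => "July" );

# '07' is a two-character string and is stored as-is.
my %quoted = ( '07' => "July" );

my @unquoted_keys = keys %unquoted;
my @quoted_keys   = keys %quoted;

print "unquoted key: $unquoted_keys[0]\n";
print "quoted key:   $quoted_keys[0]\n";
print "lookup of '07' in unquoted hash: ",
    ( exists $unquoted{'07'} ? "found" : "not found" ), "\n";
```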

      CountZero


Re: more date conversion happiness, part 3
by neuroball (Pilgrim) on Jan 12, 2004 at 01:23 UTC

    A few monks have already helped you with the octal problem, but nobody has taken a bite at your 5th regex problem yet.

    One of the reasons might be that it isn't a problem, as you will notice when you execute the following code snippet from the command line:

    perl -e 'my $date = "08 Apr 1984";$date =~ /(\d{1,2})\s([a-zA-Z]{3})\s(\d{4})/;print "$1:$2:$3\n";'

    The output of this code snippet is: 08:Apr:1984. So no problem exists with your 5th regex.

    You might want to (a) read up on capturing parentheses and backreferences (the parentheses () and the special variables $1, $2, $3, etc., that I used in the snippet) and (b) make your code verbose in a debugging sense.
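As a rough sketch of how the captures could replace the split in the 5th branch: the regex is the one from the script with parentheses added, and parse_dmy is a hypothetical helper name, not something from the original code.

```perl
use strict;
use warnings;

# parse_dmy is a hypothetical helper: it matches "08 apr 1984" style input
# and returns (month, day, year) taken from the capture groups, or an
# empty list when the pattern does not match.
sub parse_dmy {
    my ($date) = @_;
    if ($date =~ /(\d{1,2})\s([a-zA-Z]{3})\s(\d{4})/) {
        return ($2, $1, $3);
    }
    return;
}

my ($MM, $DD, $YYYY) = parse_dmy("08 apr 1984");
print "$MM:$DD:$YYYY\n";
```

Returning the pieces in month, day, year order keeps the call compatible with the order that output() expects in the script.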

    Debugging is the tracking down of errors. One of the easier ways to debug code is to print out each variable you use before and after you modify it. You also might want to see what your expressions (in the if, elsif, etc. statements) return.

    Once your print statements show values you didn't expect, you have a starting point for a close inspection.
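A sketch of what that kind of debugging output could look like, assuming a hard-coded test date instead of keyboard input and only two of the five patterns. The branch labels are made up for the example:

```perl
use strict;
use warnings;

my $date = lc "08 Apr 1984";
print STDERR "DEBUG: date after lc = '$date'\n";

# Record which pattern matched; the labels are invented for this sketch.
my $matched;
if    ($date =~ /[a-zA-Z]{3}\s\d{1,2}\s\d{4}/) { $matched = "mon d yyyy" }
elsif ($date =~ /\d{1,2}\s[a-zA-Z]{3}\s\d{4}/) { $matched = "d mon yyyy" }
else                                           { $matched = "none" }
print STDERR "DEBUG: branch taken = $matched\n";

# Show what split actually produced before using the pieces.
my @dateparts = split /\s/, $date;
print STDERR "DEBUG: dateparts = (@dateparts)\n";
```

Printing to STDERR keeps the debugging lines apart from the program's real output, so they are easy to strip out later.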

    /oliver/

Re: more date conversion happiness, part 3
by ysth (Canon) on Jan 12, 2004 at 03:11 UTC
    Your regexes have a problem. If you have something like $str =~ /\d/, that will match the first digit in $str. When you are validating or parsing, you usually want to provide anchors: zero-width assertions about what characteristics the string must have at that point in the regex. For instance, to parse a string with 0 or more digits followed by one or more letters, you would say:
    $str =~ /^\d*[a-z]+\z/i;
    where the ^ says that part of the regex can only match at the beginning of the string and \z says that can match only at the end of the string, so that no unexpected characters are allowed before or after the pattern specified.

    Without the anchors, you get /\d*[a-z]+/i which will match any string that has at least one letter somewhere in it, e.g. ";!$#a-+".

    (You will often see $ used instead of \z; that will match either at the end of the string or immediately before a newline character at the end of the string; sometimes handy when dealing with unchomped input, but usually not what is actually wanted.)
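A short sketch of the difference, using the pattern and the junk string from above. The variable names and the "123abc\n" test string are assumptions for the example:

```perl
use strict;
use warnings;

my $junk = ';!$#a-+';

# Without anchors the single letter "a" is enough for a match.
print "unanchored matches junk\n" if     $junk =~ /\d*[a-z]+/i;
print "anchored rejects junk\n"   unless $junk =~ /^\d*[a-z]+\z/i;

# $ tolerates one trailing newline; \z does not.
my $unchomped = "123abc\n";
print "\$ accepts trailing newline\n"  if     $unchomped =~ /^\d*[a-z]+$/i;
print "\\z rejects trailing newline\n" unless $unchomped =~ /^\d*[a-z]+\z/i;
```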

    The meaning of ^ and $ changes when the //m flag is used, see perlre for details.

      Thanks. Funny you mention the anchors...the first couple versions of the script had all the anchors in place, but I got several replies that said I didn't need them, or should take them out, so I did. In the case of this script (as far as the input it was written to handle), they appear to work either way, but indeed with different input they might not.
        It really depends on whether your task is to take specified input and validate and parse it, or look for any date-like thing in the input. Without the anchors, m:\d{2}/\d{2}/\d{2}: will quite happily match the "04/08/84" in "123/3/104/08/840/2".
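A quick sketch of that last case, with the capture parentheses and the $input variable added for illustration:

```perl
use strict;
use warnings;

my $input = "123/3/104/08/840/2";

# Unanchored: the pattern is free to match in the middle of the string.
if ($input =~ m:(\d{2}/\d{2}/\d{2}):) {
    print "unanchored found: $1\n";
}

# Anchored: the whole string has to be a date, so this input is rejected.
print "anchored: no match\n" unless $input =~ m:^\d{2}/\d{2}/\d{2}\z:;
```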