hash, Date:Calc use - seeking improvements

ww has asked for the wisdom of the Perl Monks concerning the following question:

I created (15 Mar 05) the following because -- despite the form of another's SOPW question, grep for last 45 days -- the problem looked like a good learning experience for /me.

This works with the test data supplied. Changes for differently structured data would be trivial, but...
Wise ones, would any care to suggest amendments that would improve the syntax or algorithm (without making it so concise and idiomatic as to be unintelligible to rookies)? I recognize the clumsy creation of the hash and subsequent deletion of the first element as one prime candidate (but so far neither docs nor Cookbook have insinuated anything better in my brain). Also, as the conversion of mday and non-conversion of month and year (at lines 34-36) demonstrate, I just haven't "got" something in the litt re Time::Local.

update: graff's point tells us that I just plain confused $mday with $mon (which is used in the sub). And BigLug offers a concise answer re creation of the hash.
Both replies offer just what I was hoping for! But it might clarify for others if s/improvements/critiques/ in the title
and if I make it explicit that I do indeed know and honor the wisdom of "use the module; don't re-invent the wheel." But I need to be better at the basics ... and -- based on experience with other self-teaching efforts -- doing self-imposed puzzles like this and heeding the critiques really works well for me.

So, thanks to BigLug and Graff (and to future responders) for helping me.

#!C:/Perl/bin -w
use strict;
use vars qw ( @old $old %data $sec $min $hour $mday $mon $year $etime 
+);
use Time::Local qw (timelocal);

# create hash to hold filename and $etime with a fake key  
# and value -- both to be deleted after the foreach on DATA

%data = ( "file", 0 );

@old = <DATA>;
foreach $old(@old) {
   if ( $old =~ /    # 1 of many ways. For clarity, not elegance
      ^              # start at beginning of line
      [MTWFS]        # Any one UpperCase first_letter_of_day_of_week (
+english)
      [a-z]{2}       # Any lowercase letter, 2 times exactly
      \s             # One whitespace char
      ([A-Za-z]{3})  # UC or lc in range A-Z or a-z, 3 times -- CAPTUR
+E month $1
      \s             # One whitespace char
      ([\d]{1,2})    # one or two digits - day_of_month -- capture day
+_of_month $2
      \s             # One whitespace char
      ([\d]{1,2}):   # Any dec dgt or :, 1 OR 2 times -- CAPTURE Hour 
+$3 - fol by :
      ([\d]{2}):     # Any dec dgt or :,  2 times -- CAPTURE Min $4- f
+ol by :
      ([\d]{2})\s    # Any dec dgt or :, 2 times -- CAPTURE Sec $5 - f
+ol by sp
      [ECMP][SD]T    # Eastern, Central...Std or Day...T 
      \s             # One whitespace
      (200\d)        # One whitespace fol by "200" and 1 more dgt CAPT
+URE Yr $6
      .* /x          # anything_except_\n to end_of_record,  end m//, 
+extended syntax
      )              # end condition
    {                # begin action block
         $sec = $5;
         $min = $4;
         $hour = $3;
         $mday = ( $2 - 1 );      # mday is in range 0-30 for timeloca
+l
         $mon = $1;               # needs to be in 0..11 form: &conver
+t_month
         $year = $6;              # ?! timelocal years are years - 190
+0

         &cvrt_mon_lcl($mon);

         # etime from $old form to Epoch seconds
         $etime = timelocal($sec, $min, $hour, $mday, $mon, $year);  

         chomp ($old);              # clean \n off $old. 
         $old =~ s/is : / /;        # shorten to DateTime:depth 

         # print "\$etime for \"$old\"|| \t$etime\n";   

         $data{$old}= $etime;  #  PUSH etime to the hash

   } else {
        print "no match \n";
   }
}
# NOW, delete the (fake) elements with which we initialized the hash
 
delete $data{"file"};

# NEXT convert current time to seconds since EPOCH

my $now = time();

# ASSESS the DATA elements for > or < 45 days old

 while (($old, $etime) = each %data) {

   if ( ($now - $etime) > (24*60*60*45) ) {      # 24hr*60min*60sec*45
+ days
      print " \"$old\" -  is MORE than 45 days old\n";
      print "\t It's roughly ".int(($now-$etime)/86400)." days old\n\n
+";


   } else {       # This  errs in favor of "LESS THAN" (no == test abo
+ve)
      print " \"$old\" -  is LESS than 45 days old\n";
      print "\t It's only about ".int(($now-$etime)/86400)." days old\
+n\n";
   }
}

exit;

##### sub cvrt_mon_lcl  -- FOR timelocal and localtime; gmtime is diff
+erent!

sub cvrt_mon_lcl {
    if ($mon eq "Dec") {
       $mon = 11;

    }elsif ($mon eq "Nov") {
       $mon = 10;
    }elsif ($mon eq "Oct") {
       $mon = 9;
    }elsif ($mon eq "Sep") {
       $mon = 8;
    }elsif ($mon eq "Aug") {
       $mon = 7;
    }elsif ($mon eq "Jul") {
       $mon = 6;
    }elsif ($mon eq "Jun") {
       $mon = 5;
    }elsif ($mon eq "May") {
       $mon = 4;
    }elsif ($mon eq "Apr") {
       $mon = 3;
    }elsif ($mon eq "Mar") {
       $mon = 2;
    }elsif ($mon eq "Feb") {
       $mon = 1;</readmore>
    }elsif ($mon eq "Jan") {
       $mon = 0;
    }
    return($mon);
}
[download]

__DATA__
Fri Dec 3 23:07:21 EST 2004 : Depth is : 234
Sat Dec 4 00:07:29 EST 2004 : Depth is : 123
Sun Dec 5 03:07:32 EST 2004 : Depth is : 144
Tue Feb 8 05:21:18 EST 2005 : Depth is : 11
Sat Feb 26 14:13:12 EST 2005 : Depth is : 4567
Fri Mar 11 17:09:38 EST 2005 : Depth is : 265
Sat Mar 12 07:07:27 EST 2005 : Depth is : 1654
[download]

Comment on hash, Date:Calc use - seeking improvements Select or Download Code

Replies are listed 'Best First'.
Re: hash, Date:Calc use - seeking improvements by BigLug (Chaplain) on Mar 16, 2005 at 00:09 UTC
If you're doing this just for the sake of the exercise, I guess I'll enourage you. However there are already many many date/time modules that are available to you that have all previously dealt with the problems of parsing datetime strings. Firstly, a personal preference: I hate seeing all variables declared at the top. Declare them as you use them. The second thing that strikes me in your code is your declaration of your hash. You don't need to assign any value at all. Something like `my %hash;` works perfectly. Or if you prefer: `my %hash = ();`. That way you won't have to delete the key later. Then there's no need to use `@old` as your `<DATA>` file handle can work inside the foreach: `foreach $old(<DATA>) { ... }` [download] Also your `cvrt_mon_lcl` subroutine would be better as a hash: `my %month_number; @month_number{qw/Jan Feb Mar Apr May ... Dec/} = (0 .. 11); print $month_number{Jan}; # 0` [download] Back to all those modules to select from, I'm a DateTime fan (and developer) so I'll give you my DateTime solution: use DateTime; use DateTime::Format::Strptime; my $parser = DateTime::Format::Strptime->new( pattern => '%a %b %d %T EST %Y', # see the docs for the patterns locale => 'en_US', # to get the month and day names to parse time_zone => 'America/New_York', ); my $now = DateTime->now(time_zone => 'America/New_York'); foreach (<DATA>) { my ($dt) = $_ =~ /^(.+?)\s:\s/; $dt = $parser->parse_datetime($dt); # turn the string into a dateti +me print $_ if (($dt < $now) and ($dt->add(days => 45) > $now)) } __DATA__ Fri Dec 3 23:07:21 EST 2004 : Depth is : 234 Sat Dec 4 00:07:29 EST 2004 : Depth is : 123 Sun Dec 5 03:07:32 EST 2004 : Depth is : 144 Tue Feb 8 05:21:18 EST 2005 : Depth is : 11 Sat Feb 26 14:13:12 EST 2005 : Depth is : 4567 Fri Mar 11 17:09:38 EST 2005 : Depth is : 265 Sat Mar 12 07:07:27 EST 2005 : Depth is : 1654 Sat Mar 26 07:07:27 EST 2005 : Depth is : 1654 [download] Some notes: The pattern could use %Z instead of 'EST' in which case it wouldn't need to specifically declare the time zone. However 'EST' is an ambiguous zone and so we can't use it. We add 45 days to the datetime and then check if it's greater than now rather than just subtracting it from now and checking that the difference is over 45 days because of the problems inherrent with date math ('daylight savings'/leap years/seconds etc.) I also check that it isn't a future date, but if this is log parsing that shouldn't be a problem If you only need day precision see the 'truncate' method in the DateTime documentation. DateTime is the only set of modules that properly handle time zones and daylight savings (as well as handling international locales). Cheers! Rick If this is a root node: Before responding, please ensure your clue bit is set. If this is a reply: This is a discussion group, not a helpdesk ... If the discussion happens to answer a question you've asked, that's incidental.	[reply] [d/l] [select]
Re: hash, Date:Calc use - seeking improvements by graff (Chancellor) on Mar 16, 2005 at 04:16 UTC
Also, as the conversion of mday and non-conversion of month and year (at lines 34-36) demonstrate, I just haven't "got" something in the litt re Time::Local. Well as it turns out, this little bit from your code does confirm that opinion: `#... $mday = ( $2 - 1 ); # mday is in range 0-30 for timelocal #...` [download] No. Actually, the "mday" value used by timelocal should range between 1 and 31, just like the "mday" value returned by localtime. (An interesting feature of Time::Local is that you can give it out-of-range values for some fields, and it will do the "right" thing; e.g. give it $mon=1 (Feb) and $mday=30, and it will give you epoch seconds for March 2 (or March 1, if it's a leap year). So the fact that you were passing "wrong" values for mday meant that you got unexpected retuen values, instead of some carp/croak/die behavior.) As for the "year" value, timelocal is quite clever and forgiving about that -- if you haven't read the details about this in "perldoc Time::Local", you should. Which leads me to the obscure detail in your initial regex that will need to be modified less than six years from now; surely you're not so young that you never heard of "Y2K bugs" -- you've got a "Y2K+10" bug. (But I know you'll be able to fix it in time. ;) I would just use a hash to get from month name to month number. Setting it up can be tedious, but less so than devoting a whole subroutine to the task. Here is probably the least offensive way, all things considered: `my @mon_names = qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/; my %mnth_str2digit = map { $mon_names[$_] => $_ } (0..$#mon_names); # or, if map is "unintelligible to rookies", use a for loop: my %mnth_str2dgt; for ( 0..$#mon_names ) { $mnth_str2digit{$mon_names[$_]} = $_; ]` [download] This way, you just use the 3-letter month name to do a hash look-up for the month number, rather than calling a sub that alters the value of a global (which is considered bad form). Apart from all that, I tend to agree with the first reply: if you're just doing this for the sake of practice with regexes and particular core modules, or if you really are in a situation where the presence of more powerful non-core modules simply can't be assured or controlled, that's fine -- knock yourself out. But if relevant non-core modules are present, or you have the ability to install them as needed, it can often be a mistake to avoid using them for reasons like "my task is simple enough that I don't need a module" or "figuring out how to use the module will take too long" or "it'll make my script too complicated/difficult for me or others to follow". Regardng the first reason, even if it's true now, it might not be true later (e.g. when you get new/different data next week); as for the others, they tend to be imaginary and contrary to fact. I happened to choose Date::Manip as a solution when I replied to grep for last 45 days -- it was just the first one that came to mind, and it was installed already, though I hadn't really used it at all. Starting cold, it didn't take more that 10 minutes to read enough of the man page and put together a 5-line script to solve the stated task, and I don't think the result was all that cryptic. (Okay, newbies would spend more time reading the man page and puzzling out how to use the module in a script, but still, they'd fare better that way than without the module.)	[reply] [d/l] [select]