rupesh has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I was wondering if anyone has tried this before:
I have a file, containing strings starting with a date.
Eg.
8-12-03,somename,someaddress....
8-13-03,someothername,someaddress....
I have today's date: 8-25-03. Based on this date, I'm trying to match a pattern for last week.
Hence, I would match $string =~ /^8\-[19-25]\-03/
but, if the date was 8-1-03, the pattern would be
$string =~ /^7\-[26-]\-03/ || $string =~ /^8\-[1-2]\-03/
The week starts on Monday and ends on a Sunday
I have 2 digits each for month, day and year
And I am doing the manipulation in win32
This seems a bit tricky, but any input would help.
Thanks!

Did you ever notice that when you blow in a dog's face, it gets mad at you but when you take him on a car ride,he sticks his head out the window and likes it?

Re: Week Algorithm
by edan (Curate) on Aug 25, 2003 at 06:50 UTC

    Look into the CPAN module Date::Calc.

    You can also look in the Q&A section under 'dates and times' for more handy ideas. As you can probably see by now, a regex is unlikely to take care of this problem for you.

    --
    3dan
Re: Week Algorithm
by esh (Pilgrim) on Aug 25, 2003 at 07:16 UTC
    I like this problem. Please take the following points as trying to be helpful, not as negative criticisms.

    1. You say that the week "starts on Monday and ends on a Sunday" yet you provide an example which looks like Tuesday (8-19-03) through Monday (8-25-03). If you can better define what "last week" means, it would be easier to figure out if code met the spec. For example, is "today" ever part of "last week" or is it always the Mon-Sun that is completely before today?

    2. You say you have "2 digits each for month, day and year" yet the examples you provide make it look like a single digit is accepted. Does the input you are trying to match have a leading zero for single digit months and days?

    3. I don't think /^8\-[19-25]\-03/ does what you want it to. The character class in the middle (surrounded by square brackets) matches a single character: "1", or "9" through "2", or "5". The "9" through "2" part is also going to be flagged as an invalid range, since it starts with something higher than it ends.

    4. You don't need to escape dashes (-) outside of character classes.

    5. If a regular expression is really the best approach, I think you are most likely going to end up with one that looks something like this:

    $string =~ /^(7-26-03|7-27-03|7-28-03|7-29-03|7-30-03|7-31-03|8-1-03|8-2-03)/
    This could be reduced by extrapolating the common portions, but you probably would not gain much and this format has the benefit that you can glance at it and know that it matches exactly the days you want.

    6. Oh, yeah, I'd use Date::Manip to do the date calculations, but that's just because it's the only date manipulation package I've learned, it's portable (completely Perl), and it includes the kitchen sink. Be warned, though, that it's a bit tricky to understand the interface when you're starting out. Examples and testing help a lot.
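To make points 3 to 5 concrete, here is a minimal sketch, not part of the original reply. It builds the broken pattern from a string so the compile error can be caught at run time, then writes "days 19 through 25" as an alternation. The variable names and the trailing comma in the pattern are assumptions based on the sample data.

```perl
use strict;
use warnings;

# Build the pattern from a string so the compile error happens at run time
# and can be caught; a literal /[19-25]/ would abort compilation instead.
my $broken   = '^8-[19-25]-03';
my $compiled = eval { qr/$broken/ };
my $error    = $@;
print "broken pattern rejected: ", ($compiled ? "no" : "yes"), "\n";

# Days 19 through 25 written as an alternation plus a real digit class.
# The trailing comma assumes the date is followed by a comma, as in the sample.
my $days_19_to_25 = qr/^8-(?:19|2[0-5])-03,/;

for my $line ('8-19-03,a,b', '8-25-03,c,d', '8-2-03,e,f', '8-26-03,g,h') {
    printf "%-12s %s\n", $line, ($line =~ $days_19_to_25 ? 'match' : 'no match');
}
```

This only covers a range inside one month; once the range crosses a month boundary, a generated list of dates is easier to get right than a hand-written class.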

    -- Eric Hammond

      Hi,
      Based on your points, and because I've been trying to simplify how I handle the problem, I want to make some modifications:

      1. "Last week" means the current day plus the 6 days before it, so there are always 7 days.
      2. If the day or month has only 1 digit (1-9), there are no leading zeros. The year has a leading zero.

      Thanks for your feedback. I'll try Date::Manip. I had a look at Date::Calc, as suggested by 3dan, but it seems complicated too, and not a quick solution, although it has a lot of features. The tricky part is getting the output from the module into a regular expression.


        Here is a lightly-tested solution using POSIX::mktime and localtime calculations based on your latest requirements. I make some assumptions about your data, so you may have to modify the splits to match your data correctly.

        use POSIX qw/mktime/;

        my ($dy, $mo, $yr) = (localtime)[3 .. 5];
        $mo += 1;
        $yr = sprintf("%02d", $yr % 100);
        my $today = join '-', $mo, $dy, $yr;
        my $today_in_epoch = get_epoch_time($today);
        my $week_ago = $today_in_epoch - (60 * 60 * 24 * 6); # 6 days

        print "today is: $today\n";

        while (<DATA>) {
            my ($date, undef) = split /,/;
            my $date_in_epoch = get_epoch_time($date);
            if ($date_in_epoch >= $week_ago && $date_in_epoch <= $today_in_epoch) {
                print "$date is in the last 6 days!!\n";
            }
        }

        sub get_epoch_time {
            my $date = shift;
            my ($mo, $day, $year) = split /-/, $date;
            # notice we assume the year is 20XX
            return POSIX::mktime( 0, 0, 0, $day, $mo - 1, 100 + $year );
        }

        __DATA__
        8-12-03,somename,someaddress....
        8-13-03,someothername,someaddress....
        8-20-03,foo,bar
        8-18-03,blah,blah
        8-19-03,blah,blah
        08-25-03,blah,blah
        8-30-03,future,date

        HTH

        --
        3dan
Re: Week Algorithm
by blokhead (Monsignor) on Aug 25, 2003 at 07:35 UTC
    In your example, 7-26-03 (the beginning of your example regex range) is a Saturday, so I don't really understand if it's relevant that weeks for you go from Monday to Sunday. Anyway, it's not that hard to generate a list of valid dates and then combine those into a regex string that you can plop right into your matching code. Here's a little bit of code to get you started:
    use Time::Local 'timelocal_nocheck';

    my ($m, $d, $y) = (8, 1, 2003);

    ## list dates between 0 and 6 days ago:
    my @dates;
    for my $days_ago (0 .. 6) {
        my @date_info = localtime timelocal_nocheck(0, 0, 0, $d-$days_ago, $m-1, $y-1900);
        push @dates, sprintf("%s-%s-%s", $date_info[4]+1, $date_info[3], $date_info[5]+1900);
    }

    ## convert list of dates to a regex
    my $regex = join "|", map quotemeta, @dates;
    print "$regex\n";

    ## now you can do stuff like;
    ## if ($line =~ /^($regex)/) {
    ##     print "this happened last week ($1)\n";
    ## }

    __END__
    8\-1\-2003|7\-31\-2003|7\-30\-2003|7\-29\-2003|7\-28\-2003|7\-27\-2003|7\-26\-2003
    This seems to output pretty much what you wanted, at least for 8-1-2003. If you're not familiar with Time::Local's timelocal_nocheck function, you should really take a look at it to see what the heck is going on here. And if you really need these ranges to be aligned to your Mon-Sun weeks, you can calculate how many days to back up based on what day of the week today is (info you can get from localtime).
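For the Mon-Sun alignment mentioned above, here is a rough sketch of one way to do it, assuming "last week" means the most recent complete Monday-to-Sunday week before the given date. The helper name previous_week_dates, the use of noon instead of midnight, and the M-D-YY output format (no leading zeros on month and day, two-digit year) are my assumptions, not something from the code above.

```perl
use strict;
use warnings;
use Time::Local 'timelocal_nocheck';

# Hypothetical helper: returns the seven M-D-YY strings for the last
# complete Monday..Sunday week before the given date.
sub previous_week_dates {
    my ($m, $d, $y) = @_;    # e.g. (8, 25, 2003)

    # Noon is used (an assumption) to stay clear of daylight-saving edges.
    my $wday = (localtime timelocal_nocheck(0, 0, 12, $d, $m - 1, $y - 1900))[6];

    # localtime gives Sun=0 .. Sat=6; shift so that Mon=0 .. Sun=6.
    my $since_monday = ($wday + 6) % 7;

    my @dates;
    for my $offset (0 .. 6) {
        # Back up to this week's Monday, then one more week, then walk forward.
        my $day = $d - $since_monday - 7 + $offset;
        my @t = localtime timelocal_nocheck(0, 0, 12, $day, $m - 1, $y - 1900);
        push @dates, sprintf '%d-%d-%02d', $t[4] + 1, $t[3], $t[5] % 100;
    }
    return @dates;
}

my @week  = previous_week_dates(8, 1, 2003);
my $regex = join '|', map quotemeta, @week;
print "$regex\n";
```

The resulting $regex can be used the same way as in the commented-out match above, for example /^(?:$regex),/ if the date is always followed by a comma.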

    Unlike the other repliers, I prefer using Time::Local's timelocal_nocheck for date calculations whenever I can, because Time::Local is a standard module, and much much less overhead than Date::Calc or Date::Manip. It sometimes takes a little bit more work on your end to use Time::Local for date calculations. But I don't think you will find a pre-packaged solution for your problem anywhere, so elbow grease can't be avoided here. In this case, just subtracting days is a piece of cake for timelocal_nocheck. Rarely do I ever need to do computations that are whacked-out enough (or thousands of years away from the present) to require the heavy-duty modules.
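A small illustration of why subtracting days is easy with timelocal_nocheck: it accepts out-of-range day numbers and rolls them into the previous month. This is a sketch; the helper name normalized_date and the choice of noon rather than midnight are assumptions for the example.

```perl
use strict;
use warnings;
use Time::Local 'timelocal_nocheck';

# Hypothetical helper: turn a possibly out-of-range (month, day, year)
# into a real calendar date string, M-D-YYYY.
sub normalized_date {
    my ($m, $d, $y) = @_;
    my @t = localtime timelocal_nocheck(0, 0, 12, $d, $m - 1, $y - 1900);
    return sprintf '%d-%d-%d', $t[4] + 1, $t[3], $t[5] + 1900;
}

# Day 0 of August is treated as July 31, day -1 as July 30.
print normalized_date(8, $_, 2003), "\n" for 1, 0, -1;
```

The documentation for Time::Local notes that this trick is meant for days; it does not work for out-of-range months.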

    blokhead

Re: Week Algorithm
by fglock (Vicar) on Aug 25, 2003 at 16:38 UTC

    Tested under WinNT:

    use strict;
    use Date::Tie;

    tie my %dt, 'Date::Tie';
    my @a;
    for (1..7) {
        push @a, join('-', $dt{year}, $dt{month}, $dt{day});
        $dt{day}--;
    }
    my $re = join('|', @a);

    for ( qw( 2003-08-25 2003-08-20 2003-08-27 2003-08-15 ) ) {
        print "$_ is ";
        print "not " unless /$re/;
        print "in last 6 days\n";
    }

    # 2003-08-25 is in last 6 days
    # 2003-08-20 is in last 6 days
    # 2003-08-27 is not in last 6 days
    # 2003-08-15 is not in last 6 days

    update: "full version":

    use strict;
    use Date::Tie;

    tie my %dt, 'Date::Tie';
    my @a;
    for (1..7) {
        push @a, join('-',
            ( $dt{day} < 10 ? "(?:0)?".(0+$dt{day}) : $dt{day} ),
            ( $dt{month} < 10 ? "(?:0)?".(0+$dt{month}) : $dt{month} ),
            "(?:".substr($dt{year}, 0, 2).")?".substr($dt{year}, 2, 2) ,
        );
        $dt{day}--;
    }
    my $re = join('|', @a);
    print "RE: $re\n";

    for ( qw( 25-08-2003 25-8-03 20-08-03 27-08-2003 15-8-03 ) ) {
        print "$_ is ";
        print "not " unless /$re/;
        print "in last 6 days\n";
    }

    # 25-08-2003 is in last 6 days
    # 25-8-03 is in last 6 days
    # 20-08-03 is in last 6 days
    # 27-08-2003 is not in last 6 days
    # 15-8-03 is not in last 6 days