Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Finding gaps in date ranges

by Ovid (Cardinal)
on May 10, 2001 at 23:52 UTC ( [id://79539]=perlquestion: print w/replies, xml ) Need Help??

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I've searched through CPAN for an appropriate module, but to no avail. Assuming that I have a hash whose keys are in YYYY-MM format, I need to find the gaps in the date ranges and return those gaps. I've written some code that appears to work, but it seems like overkill. Boundary conditions are a nightmare and the logic gets a bit funky in places.

I'm posting the code, in case an alternative doesn't exist, but I'm hoping to find a simpler solution. I think my solution is as simple as I can reasonably make it, but a module that's been tested more robustly would be much appreciated.

Side note: &find_gaps should return a reference to an array containing all dates that should be there, but aren't.

use strict; use warnings; use Data::Dumper; my @temp = ( [ qw( testthis 2000-4 2000-2 2000-1 2000-12 2001-1 2002-1 +2 1999-12 ) ], # bad [ qw( testthis 1999-1 1999-2 1999-3 1999-4 1999-5 1999-6 +1999-7 1999-8 1999-9 1999-10 1999-11 1999-12 2001-3 ) ], # bad [ qw( 1999-1 1999-2 1999-3 1999-4 1999-5 1999-6 1999-7 19 +99-8 1999-9 1999-10 1999-11 1999-12 2000-1 ) ], # good [ qw( 1998-12 1999-1 ) ], # good [ qw( 2004-1 ) ], # good [ qw( 2004-12 ) ], # good [ qw( 2004-12 2004-11 ) ], # good [ qw( testthis 2004-12 2004-9 2004-11 ) ] ); # bad foreach ( @temp ) { my %dates = map { $_, '' } @$_; if ( exists $dates{ 'testthis' } ) { delete $dates{ 'testthis' }; print Dumper( \%dates ); my ( $missing ) = find_gaps( \%dates ); print Dumper( $missing ), '-' x 20 ,"\n"; } } # # Above is my test code. Below is the code in question # sub find_gaps { my $dates = shift; my %year; # Create hash with years as keys and values being a list with # first month and last month in that year range foreach my $date ( keys %$dates ) { my $this_year = substr $date, 0, 4; next if exists $year{ $this_year }; $year{ $this_year } = get_range( $this_year, \%$dates ); } my @years = sort keys %year; my $gap = 0; my ( @missing, @present ); # Year span is greater than number of years if ( scalar @years > 1 and ( ( $years[ -1 ] - $years[ 0 ] ) ) != ( + scalar @years ) - 1 ) { for my $year ( $years[ 0 ] .. $years[ -1 ] ) { if ( ! grep { /$year/ } @years ) { for ( 1 .. 12 ) { push @missing, "$year-$_"; } # next month } } # next year } # Gap in months? for my $year_index ( 0 .. $#years ) { my $curr_year = $years[ $year_index ]; my $first_month = $year{ $curr_year }->[ 0 ]; my $last_month = $year{ $curr_year }->[ 1 ]; next if $year_index == 0 and $first_month == 12; next if $year_index == $#years and $first_month == 1 and ! $la +st_month; my ( $first, $last ); # first year, but more than one year if ( $year_index == 0 and $#years != 0 ) { ( $first, $last ) = ( $first_month, 12 ); # first year # first and only year } elsif ( $year_index == 0 ) { ( $first, $last ) = ( $first_month, $last_month ? $last_mo +nth : $first_month ); #last year } elsif ( $year_index == $#years ) { ( $first, $last ) = ( 1, $last_month ? $last_month : $firs +t_month ); # last year # intervening years } else { ( $first, $last ) = ( 1, 12 ); } for my $month ( $first .. $last ) { if ( ! exists $dates->{ $curr_year."-".$month } ) { push @missing, $curr_year."-".$month; } } # next $month } # next $year_index return \@missing; } sub get_range { my ( $year, $dates ) = @_; my @months; # If the %date has the same year, grab the month @months = sort { $a <=> $b } map { substr( $_, 5 ) } grep { substr( $_, 0, 4 ) == $year } keys %$dates; # Return range if more than one month, else return the month $#months > 0 ? [ $months[ 0 ], $months[ -1 ] ] : [ $months[ 0 ], 0 + ]; }

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Replies are listed 'Best First'.
Re: Finding gaps in date ranges
by MeowChow (Vicar) on May 11, 2001 at 00:51 UTC
    I would approach this problem with a conversion, from date format, to a more easily dealt-with numeric format, and vice-versa:
    sub find_gaps { my ($pdate, @dates) = sort {$a <=> $b} map { my ($y,$m) = split /-/; $y*12 + $m - 1 +} keys %{+pop}; my @skips; foreach (@dates) { push @skips, $pdate+1 .. $_-1; $pdate = $_; } [ map { int ($_/12) . "-" . (1 + $_%12) } @skips ]; }
    UPDATE: couldn't help golfing things a bit...
       MeowChow                                   
                   s aamecha.s a..a\u$&owag.print
Re (tilly) 1: Finding gaps in date ranges
by tilly (Archbishop) on May 11, 2001 at 00:43 UTC
    An old tactic when faced with a representation that has a lot of boundary cases. Convert to and from a representation that is easier to work with, like so:
    use strict; use Carp; # Takes a list of months in yyyymm format, returns the gaps in sorted +order. sub find_gaps { my @dates = sort {$a <=> $b} map {yyyy_mm2mon_num($_)} @_; my $last_mon_count = shift @dates; my @gaps; foreach my $mon_count (@dates) { foreach my $missed (($last_mon_count + 1)..($mon_count-1)) { push @gaps, mon_num2yyyy_mm($missed); } $last_mon_count = $mon_count; } return @gaps; } # Converts a count of months back into yyyymm format sub mon_num2yyyy_mm { my $mon_num = shift; use integer; my $yyyy = $mon_num / 12; my $mm = 1 + $mon_num % 12; # Sanity check unless (1900 < $yyyy and $yyyy < 2200) { carp("Month number '$mon_num' gives a year of $yyyy which seems st +range"); } return "$yyyy-$mm"; } # Converts yyyymm format into a count of months from year 0. (Which d +idn't # exist, shoot me. It is a convenient base-point for calculations.) sub yyyy_mm2mon_num { my $date = shift; if ($date =~ /(\d{4}).*?(\d\d?)/) { return 12*$1 + $2 - 1; } else { confess("Date '$date' is not in yyyymm format?"); } }
      The reference to @_ in find_gaps should actually be keys %{ $_[0] }. From there, I ran it with the following dates (first line of dates from my test suite):
      2000-4 2000-2 2000-1 2000-12 2001-1 2002-12 1999-12
      The output:
      $VAR1 = '2000-3';
      It found the first gap but not the rest.

      Cheers,
      Ovid

      Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

        Ick, that API looks ugly. For the sake of having a clean API I would stick with my current function and call keys on the way into the function, not inside of it.

        That said, you must have changed more than one thing in your test run. When I run it, here is what I get:

        # The test code: print map "$_\n", find_gaps( qw(2000-4 2000-2 2000-1 2000-12 2001-1 2002-12 1999-12) ); # The output 2000-3 2000-5 2000-6 2000-7 2000-8 2000-9 2000-10 2000-11 2001-2 2001-3 2001-4 2001-5 2001-6 2001-7 2001-8 2001-9 2001-10 2001-11 2001-12 2002-1 2002-2 2002-3 2002-4 2002-5 2002-6 2002-7 2002-8 2002-9 2002-10 2002-11
Re: Finding gaps in date ranges
by Anonymous Monk on May 11, 2001 at 01:11 UTC

    I'm a big fan of peanos axioms. So I started with the idea: if it weren't dates we were dealing with, but natural numbers, then I would just sort the numbers, and compute the gaps with one simple loop.

    I decided to convert you date-format to something that can be sorted numerically (YYYYMM, with month in two digits, like "02"), and I made up a function "nextmonth" for incrementing the date.

    sub nextmonth { my($year, $month); $year = substr($_[0], 0, 4); $month = substr($_[0], 4, 2); $month++; while ($month > 12) { $year++; $month-=12; } return $year . sprintf("%02d", $month); } sub find_gaps { my $dates = shift; my(@missing, @keys, $last); # convert your date-format to mine @keys = map { s/-(.)$/0$1/; s/-//; $_ } keys %$dates; # my format can be sorted numerically @keys = sort @keys; # in the loop we need two variables: # $last for the date before the current one, # and $date for the current date $last = shift(@keys); foreach my $date ( @keys ) { while (1) { $last = nextmonth($last); last if $last eq $date; push(@missing, $last); } $last = $date; } # convert my date-format back to yours @missing = map { s/0(.)$/$1/; s/^(....)/$1-/; $_ } @missing; return \@missing; }

    share and enjoy

      boohoo, the above article is mine, but I posted it anonymously by mistake. oh higher powers, if it please you, transfer the XPs of the article to this humble scribe.

      I'm a big fan of peanos axioms. So I started with the idea: if it weren't dates we were dealing with, but natural numbers, then I would just sort the numbers, and compute the gaps with one simple loop. I decided to convert you date-format to something that can be sorted numerically (YYYYMM, with month in two digits, like "02"), and I made up a function "nextmonth" for incrementing the date.

      sub nextmonth { my($year, $month); $year = substr($_[0], 0, 4); $month = substr($_[0], 4, 2); $month++; while ($month > 12) { $year++; $month-=12; } return $year . sprintf("%02d", $month); } sub find_gaps { my $dates = shift; my(@missing, @keys, $last); # convert your date-format to mine @keys = map { s/-(.)$/0$1/; s/-//; $_ } keys %$dates; # my format can be sorted numerically @keys = sort @keys; # in the loop we need two variables: # $last for the date before the current one, # and $date for the current date $last = shift(@keys); foreach my $date ( @keys ) { while (1) { $last = nextmonth($last); last if $last eq $date; push(@missing, $last); } $last = $date; } # convert my date-format back to yours @missing = map { s/0(.)$/$1/; s/^(....)/$1-/; $_ } @missing; return \@missing; }

      share and enjoy

      --
      Brigitte    'I never met a chocolate I didnt like'    Jellinek
      http://www.horus.com/~bjelli/         http://perlwelt.horus.at
Re: Finding gaps in date ranges
by cLive ;-) (Prior) on May 11, 2001 at 00:52 UTC

    This sort of thing?

    #!/usr/bin/perl use strict; my @temp = qw(2000-4 2000-2 2000-1 2000-12 2001-1 2002-12 1999-12); my @dates = (sort { my @a_date = split '-', $a; my @b_date = split '-', $b; $a_date[0] <=> $b_date[0] || $a_date[1] <=> $b_date[1]; } @temp); my @start_date = split '-', shift @dates; my @end_date = split '-', pop @dates; # use mod 12 to make job easy my $start_count = $start_date[0]*12 + $start_date[1]-1; my $end_count = $end_date[0]*12 + $end_date[1]-1; # hash of dates in this time my %date_hash; # build hash for ($start_count..$end_count) { # here's the sneaky trick :) my $year = int($_/12); my $month = $_%12 + 1; $date_hash{$year}{$month}++; } # now delete elements that exist for (@temp) { /(\d+)-(\d+)/; delete $date_hash{$1}{$2}; } # print missing print "Missing dates are\n\n"; my $count=1; foreach my $year (sort {$a <=> $b} keys %date_hash) { for (sort {$a <=> $b} keys %{$date_hash{$year}}) { printf("%4d-%2d ",$year,$_); print "\n" unless $count%4; $count++; } }
    cLive ;-)
Re: Finding gaps in date ranges
by suaveant (Parson) on May 11, 2001 at 00:30 UTC
    What about something like...
    my $error = 0; my @tmp = sort datesort qw( 2000-4 2000-2 2000-1 2000-12 2001-1 2002-1 +2 1999-12 ); my($year,$month) = split '-', $tmp[0]; for(my $i = 1; $i < @tmp; $i++) { if(++$month == 13) { $month = 1; $year++; } if($tmp[$i] ne "$year-$month") { print "ERROR! gap at '$year-$month', '$tmp[$i]' found!\n"; $error = 1; last; } } print "Success\n" unless $error; sub datesort { my($a1,$a2) = split '-', $a; my($b1,$b2) = split '-', $b; $a1 <=> $b1 or $a2 <=> $b2; }
    Tested on the bad and good arrays, you should be able to modify it for your needs easily enough...

    it would be shorter if your dates were zero padded.

    Update on afterthought I should say what it does... it sorts the date array (could be an array of hash keys), it then starts at the first, or earliest one, it goes through the array adding one to the month and wrapping the year and setting month to 1 if the month hits 13. it goes till it finds a date it didnt expect, or gets to the end of the array sucessfully. Pretty simple, needed the special sort since months were not zero padded.
                    - Ant

      The output from your program is:

      ERROR! gap at '2000-3', '2000-4' found!

      However, gaps exist at 2000-5 through 2000-11 and 2001-2 through 2002-11, yet were not reported. Did I misunderstand something about your code or how to use it?

      Cheers,
      Ovid

      Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

        my $error = 0; my @gaps; my @tmp = sort datesort qw( 2000-4 2000-12 2001-1 2002-12 1999-12 ); my($year,$month) = split '-', $tmp[0]; for(my $i = 1; $i < @tmp; $i++) { my $gap = 0; my @range; do { $range[1] = "$year-$month"; if(++$month == 13) { $month = 1; $year++; } if($tmp[$i] ne "$year-$month") { $range[0] = "$year-$month" unless $gap; $gap++; $error = 1; } if($gap > 100) { print "ERROR! infinite loop\n"; last; } } until($tmp[$i] eq "$year-$month"); if($gap) { push @range, "$year-$month"; push @gaps, \@range; } } if($error) { for(@gaps) { print "Gap from $_->[0] to $_->[1]\n"; } } print "Success\n" unless $error; sub datesort { my($a1,$a2) = split '-', $a; my($b1,$b2) = split '-', $b; $a1 <=> $b1 or $a2 <=> $b2; }
        This stores the ranges...
                        - Ant
        Yeah, just realized that... easy fix... one second...
                        - Ant
Re: Finding gaps in date ranges
by Sifmole (Chaplain) on May 11, 2001 at 01:05 UTC
    I guess I will throw mine in as well...
    Caveat, your response list will have 0 padded months, but that would be easy enough to correct.... Matter of fact I will, just give me a minute... :)
    UpdateRemoved 0 padding in response list.
    use strict; use Data::Dumper; my @temp = ( [ qw( testthis 2000-4 2000-2 2000-1 2000-12 2001-1 2002-1 +2 1999-12 ) ], # bad [ qw( testthis 1999-1 1999-2 1999-3 1999-4 1999-5 1999-6 +1999-7 1999-8 1999-9 1999-10 1999-11 1999-12 2001-3 ) ], # bad [ qw( 1999-1 1999-2 1999-3 1999-4 1999-5 1999-6 1999-7 19 +99-8 1999-9 1999-10 1999-11 1999-12 2000-1 ) ], # good [ qw( 1998-12 1999-1 ) ], # good [ qw( 2004-1 ) ], # good [ qw( 2004-12 ) ], # good [ qw( 2004-12 2004-11 ) ], # good [ qw( testthis 2004-12 2004-9 2004-11 ) ] ); # bad foreach ( @temp ) { my %dates = map { s/-(\d)$/-0$1/; $_, '' } @$_; if ( exists $dates{ 'testthis' } ) { delete $dates{ 'testthis' }; print Dumper( \%dates ); my ( $missing ) = find_gaps( \%dates ); print Dumper( $missing ), '-' x 20 ,"\n"; } } sub find_gaps { my $dates = shift; my @date_list = sort keys %$dates; my $c_date = shift @date_list; inc_date(\$c_date); my @gaps = map { trap_gap(\$c_date, $_); } @date_list; s/-0(\d)$/-$1/ for @gaps; return \@gaps; } sub inc_date { my $date = shift; my ($y, $d) = split('-', $$date); if ($d == 12) { $y++; $d = 01; } else { $d++; } $d = "0$d" if $d < 10; $$date = "${y}-${d}"; } sub trap_gap { my ($c, $t) = @_; my @res = (); if ($$c eq $t) { inc_date($c); return (); } while ($$c ne $t) { push(@res, $$c); inc_date($c); } inc_date($c); return @res; }
Re: Finding gaps in date ranges
by perlmonkey (Hermit) on May 11, 2001 at 05:47 UTC
    Instead of trying to find out what you are missing, it is a lot easier (and faster) to just exclude what you have.
    sub find_gaps { my $dates = shift; my @dates = sort{ $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } map{[substr($_, 0, 4), substr($_, 5)]} keys %$dates; my( $minY,$minM,$maxY,$maxM ) = (@{$dates[0]}, @{$dates[-1]}); my @gaps; for my $y ($minY .. $maxY) { exists $dates->{"$y-$_"} ? next : push @gaps, "$y-$_" for ($minY == $y ? $minM : 1) .. ($maxY == $y ? $maxM : 12); } return \@gaps; }
    This code sorts to finds the min and max of the date range, then loops over the entire range in order looking to see if we have that date already. If we dont have the date then keep it to return.
(dws)Re: Finding gaps in date ranges
by dws (Chancellor) on May 11, 2001 at 02:54 UTC
    Here's a quick-and-dirty way to find gaps by counting the number of dates you know, then advancing from the first one, recording gaps until you've seen all the known dates there are to see. It assumes no leading zeros in months, and handles the edge cases of being handed an empty list, or a list with one date in it. Note the bruce-force determination of the initial date in the range.
    sub find_gaps { my $dates = shift; my $ndates = scalar keys %$dates; my @missing = (); return \@missing if $ndates == 0; # Determine the first date in the range my $year = substr((sort keys %$dates)[0],0,4); my $month = 1; while ( ! exists $$dates{ "$year-$month" } ) { do {$month = 1; ++$year} if ++$month > 12; } --$ndates; # Look for gaps until we've seen each date in the range while ( $ndates ) { do { $month = 1; ++$year } if ++$month > 12; if ( exists $$dates{"$year-$month"} ) { --$ndates; } else { push @missing, "$year-$month"; } } return \@missing; }
Re: Finding gaps in date ranges
by mr.nick (Chaplain) on May 11, 2001 at 08:02 UTC
    I call mine Abusing substr. The idea: convert it to zero padded month and then use substr to walk the date range from the first to the last, pushing onto the array ones you don't have. I hate having to convert it back to YYYY-M+ for testing.

    I am absolutely baffled by MeowChow solutions. What the heck does +pop do?

    #!/usr/bin/perl -w use strict; sub find_gaps { my $dates=shift; my @dates=sort map { sprintf "%04d-%02d",split/-/ } keys %$dates; my ($c,$l)=($dates[0],$dates[$#dates]); my @gaps; while ($l cmp $c) { my $r=sprintf("%d-%d",split(/-/,$c)); push @gaps,$r unless defined $dates->{$r}; substr($c,5,2)++; if (substr($c,5,2)==13) { substr($c,5,2)='01'; substr($c,0,4)++; } } \@gaps; } my %dates=map { $_,'' } qw ( 2000-4 2000-2 2000-1 2000-12 2001-1 2002- +12 1999-12 ); my $gaps=find_gaps \%dates; print "dates\n",join("\n",sort keys %dates),"\n"; print "gaps\n",join("\n",@$gaps),"\n";
      I am absolutely baffled by MeowChow solutions. What the heck does +pop do?
      Sorry {g} I should have put my golf clubs away :) The +pop simply removes the first argument from @_, which is the hash reference, and the %{...} around it dereference the hash ref. The + is there because the dereference by the enclosing %{...} is otherwise ambiguous. Without it, Perl would assume you are just referencing a hash variable named %pop, since the characters between the braces would all be [a-zA-Z0-9_ ]. Hmm... I hope that made some semblence of sense.
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
Re: Finding gaps in date ranges
by Anonymous Monk on May 11, 2001 at 19:31 UTC
    # assuming all dates are in %dates use Time::Local; my %missed = (); my ($mon, $yr) = split /-/, (sort keys %dates)[0]; my $from = timelocal(1, 0, 0, 1, $mon-1, $yr); my ($mon, $yr) = split /-/, (sort keys %dates)[-1]; my $to = timelocal(59, 59, 23, 1, $mon-1, $yr); while ($from < $to){ @date = localtime($from); $date[4]++; $date[5] += 1900; # well, you know... # add to %missed if not in %dates $missed{"$date[5]-$date[4]"}++ unless exists $dates{"$date[5]-$date[4]"}; $from += 60*60*24; # add one day } print "$_\n" for keys %missed;
    --perlplexer
Re: Finding gaps in date ranges
by birdbrane (Chaplain) on May 11, 2001 at 19:54 UTC
    How about Date::Calc? This allows you to input two different dates and get the difference. I used it for a "Baby Pool" for before my daughter was born (family members could bet on date, weight and gender. Here is a snippet of what I did:

    use Date::Calc qw(Delta_Days); $day_diff{$name} = abs(Delta_Days(2000, $actual_month, $ actual_day, 2000, $guess_month, $guess_day));

    From what I remember, the module was pretty flexible w/ how the date could be input.

    Side Note: It appears that there are disagreeing opinions about the module. Test it and see how it fits for you...

    bb

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://79539]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (6)
As of 2024-03-29 12:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found