in reply to Multiple date ranges

Convert your dates to days since your membership started. Date::Manip is excellent at parsing dates in pretty much any known format (though the docs are a little incomprehensible, it's worth persevering with them). Its functions ParseDate() and DateCalc() will do all the work for you.

The basic idea is to build up a string with one char per day: '1' when they were a member and '0' when not. Testing for membership on a particular day then just involves calculating the offset for the day in question and using substr to test whether there's a '1' there. For a range of days, calculate the offset and length, substr the string and use a simple m// to see if there is a '1' anywhere in the substring. That's it really.
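To make that concrete, here is a minimal sketch of the string-per-member idea using plain day offsets and no date parsing. The member names, the spell data and the helper names member_on and member_between are made up for illustration; ranges are treated as inclusive at both ends.

```perl
use strict;
use warnings;

# Hypothetical data: day offsets (0 = base date) for each membership spell,
# given as [first_day, last_day] pairs, both inclusive.
my $total_days = 30;
my %spells = (
    Fred   => [ [ 2, 5 ], [ 20, 29 ] ],
    Barney => [ [ 10, 15 ] ],
);

# Build one string per member: '1' on member days, '0' otherwise.
my %members;
for my $name ( keys %spells ) {
    my $days = '0' x $total_days;
    for my $spell ( @{ $spells{$name} } ) {
        my ( $first, $last ) = @$spell;
        my $length = $last - $first + 1;
        substr( $days, $first, $length ) = '1' x $length;
    }
    $members{$name} = $days;
}

# Single day: look at one character.
sub member_on {
    my ( $name, $day ) = @_;
    return substr( $members{$name}, $day, 1 ) eq '1' ? 1 : 0;
}

# Range of days (inclusive): any '1' in the substring counts.
sub member_between {
    my ( $name, $first, $last ) = @_;
    return substr( $members{$name}, $first, $last - $first + 1 ) =~ /1/ ? 1 : 0;
}

print "Fred on day 3:      ", ( member_on( 'Fred', 3 )             ? 'yes' : 'no' ), "\n";
print "Fred days 6-19:     ", ( member_between( 'Fred', 6, 19 )    ? 'yes' : 'no' ), "\n";
print "Barney days 14-25:  ", ( member_between( 'Barney', 14, 25 ) ? 'yes' : 'no' ), "\n";
```

Note that the third argument to substr is a length, so an inclusive range needs last - first + 1 characters.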

If the length of the strings times the number of members (e.g. 10 years * 365 days * 4500 members = approx. 16MB) is considered too much, then you could trade substr for vec and use bits instead of bytes. A little more complex, but only 2MB.
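A rough sketch of the bit-per-day variant, to show the size difference for one member. The day numbers are arbitrary, and the ten-year span ignores leap days just as the estimate above does.

```perl
use strict;
use warnings;

my $total_days = 10 * 365;    # ten years, ignoring leap days for the estimate

# Byte-per-day version, built only for the size comparison.
my $as_bytes = '0' x $total_days;

# Bit-per-day version: writing to the last bit makes Perl extend the
# string with zero bytes up to the full length.
my $as_bits = '';
vec( $as_bits, $total_days - 1, 1 ) = 0;

# Mark days 100..199 as member days. Unlike substr, vec sets one
# element at a time, hence the loop.
vec( $as_bits, $_, 1 ) = 1 for 100 .. 199;

print "bytes per member (substr): ", length($as_bytes), "\n";
print "bytes per member (vec):    ", length($as_bits),  "\n";
print "member on day 150? ", ( vec( $as_bits, 150, 1 ) ? 'yes' : 'no' ), "\n";
print "member on day 200? ", ( vec( $as_bits, 200, 1 ) ? 'yes' : 'no' ), "\n";
```

Multiply the two lengths by 4500 members and you get roughly the 16MB and 2MB figures.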

I've heard that Date::Calc is almost as good at parsing strange date formats as Date::Manip and runs much faster because it's in C. I haven't had occasion to try it.

A bit of sample code to demonstrate the idea. It kind of works but has bugs. It might get you started with the date stuff, though.

#! perl -slw
use strict;
use Data::Dumper;
use Date::Manip;

my $days_since_open = 365*3; # 3 years
my $base_date = ParseDate( '1st January 2000' );

my %members;

while(<DATA>) {
    my @stuff = split/\|/;
    my $member = shift@stuff;
    $members{$member}= '0' x ($days_since_open);

    # print $member;
    while (@stuff > 1) {
        my $date1 = ParseDate( shift@stuff );
        my $start = Delta_Format( DateCalc( $base_date, $date1), 0, '%dh' );
        my $date2 = ParseDate( shift@stuff );
        my $length = Delta_Format( DateCalc($date1, $date2), 0, '%dh');
        substr($members{$member}, $start, $length) = '1' x $length;
    }
    if (@stuff) {
        my $last = ParseDate( shift@stuff );
        my $last_offset = DateCalc($base_date, $last );
        my $current = Delta_Format( $last_offset, 0, '%dh');
        substr($members{$member}, $current) = '1' x ($days_since_open - $current);
    }
}

my $start = do{{
    print 'Start date?: ';
    warn "Bad date, try again", redo unless ParseDate(scalar <STDIN>);
}};
my $end = do{ print 'End date?: ';ParseDate(scalar <STDIN>); };

#print "$start, $end";
my $start_days = Delta_Format( DateCalc( $base_date, $start ), 0, '%dh' );
my $end_days = Delta_Format( DateCalc( $base_date, $end ), 0, '%dh' ) if $end;

#print "$start_days, $end_days";

for my $member (keys %members) {
    print $member, ' was a member ',
        $end ? 'between ' : 'on ', $start,
        $end ? ' and '.$end : '',
        if substr($members{$member}, $start_days, $end_days||1) =~ m[1];
}

=pod Output

C:\test>232212
Start date?: 1st April 2000
End date?: ^Z
Use of uninitialized value in string eq at e:/Perl/site/lib/Date/Manip.pm line 4155, <STDIN> line 1.
Fred was a member on 2000040100:00:00
Barney was a member on 2000040100:00:00
Wilma was a member on 2000040100:00:00

C:\test>232212
Start date?: 1st may 2000
End date?: 1st december 2002
Fred was a member between 2000050100:00:00 and 2002120100:00:00
Barney was a member between 2000050100:00:00 and 2002120100:00:00
Betty was a member between 2000050100:00:00 and 2002120100:00:00
Wilma was a member between 2000050100:00:00 and 2002120100:00:00

C:\test>232212
Start date?: 1st may 2000
End date?: 1st december 2001
Fred was a member between 2000050100:00:00 and 2001120100:00:00
Barney was a member between 2000050100:00:00 and 2001120100:00:00
Wilma was a member between 2000050100:00:00 and 2001120100:00:00

C:\test>

=cut

__DATA__
Fred| 2000-march 31 | 2000/april/7 | 19/jan/2002 | 24th february 2002 | 16sep2002
Barney| 15 february 2000 | Feb 15th 2002 | 1 dec 2002
Wilma| 01apr2000| 2002/19sep
Betty| 01DEC2002
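One of the bugs, as far as I can tell: the final test passes $end_days (an offset from the base date) as the third argument to substr, which expects a length. That would explain why the third run reports Fred for May 2000 to December 2001 even though none of his date ranges in the data fall inside that window. A small sketch of the difference, using a made-up 20-day history instead of real dates:

```perl
use strict;
use warnings;

# Hypothetical 20-day history: member only on days 15..17.
my $days = '0' x 20;
substr( $days, 15, 3 ) = '1' x 3;

my ( $start_days, $end_days ) = ( 6, 10 );    # query: days 6..10 inclusive

# As posted: the end offset is used as the length, so days 6..15 are scanned.
my $as_posted = substr( $days, $start_days, $end_days ) =~ /1/ ? 1 : 0;

# With the length derived from both offsets, only days 6..10 are scanned.
my $length    = $end_days - $start_days + 1;
my $corrected = substr( $days, $start_days, $length ) =~ /1/ ? 1 : 0;

print "as posted: $as_posted, corrected: $corrected\n";
```

In the posted code the equivalent change would be to pass something like $end_days - $start_days + 1 when an end date was given, and 1 otherwise. I haven't rerun the original with that change.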

Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Re: Multiple date ranges
by AndyH (Sexton) on Feb 04, 2003 at 09:02 UTC

    Thanks for this. Nice oblique approach. One question: is there anything in your method that is going to be upset by the fact that the value of "BaseDate" is going to be way back in 1949? (I'm thinking Unix epoch stuff here, not just the number of days since then.)

      As far as I can tell, because Date::Manip uses its own/ISO representation internally for dates, times and deltas, it should (Gregorian vs. Julian vs. other calendar systems aside) do the right thing for pretty much any set of dates that you want to throw at it. I haven't personally verified this, but the module's been around long enough that any such faux pas would probably have shown up by now.

      As for the code that I posted (besides being bug-ridden), it's just long strings of 1's and 0's, so there shouldn't be any problems, as you're only representing days. However, it may not accurately account for leap days, as you can't represent the extra fraction of a day per year (roughly 0.2425) in a bit or byte. If you need the kind of accuracy that would account for this, the method may not be useful to you. People joining or leaving on Feb 29th, or counts of active members over ranges that start or finish on Feb 29th, could be miscounted. I can't see an easy fix for this.
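One way to check the leap-day worry, as a sketch only: if the offsets come from calendar arithmetic rather than from multiplying years by 365, Feb 29th simply gets its own slot in the string. The example below swaps in the core Time::Local module instead of Date::Manip, and days_between is a made-up helper name.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Whole days from one calendar date to another, computed in UTC so that
# daylight-saving shifts cannot affect the result.
sub days_between {
    my ( $y1, $m1, $d1, $y2, $m2, $d2 ) = @_;
    my $t1 = timegm( 0, 0, 0, $d1, $m1 - 1, $y1 );
    my $t2 = timegm( 0, 0, 0, $d2, $m2 - 1, $y2 );
    return ( $t2 - $t1 ) / 86_400;
}

print "29 Feb 2000 offset: ", days_between( 2000, 1, 1, 2000, 2, 29 ), "\n";
print "1 Mar 2000 offset:  ", days_between( 2000, 1, 1, 2000, 3, 1 ),  "\n";
print "1 Jan 2003 offset:  ", days_between( 2000, 1, 1, 2003, 1, 1 ),  "\n";
```

With a base date of 1st January 2000, Feb 29th lands on offset 59 and 1st January 2003 on offset 1096, so a string sized as 365*3 = 1095 would be one day short for that span.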

      The only other comment is that with a range of 50+ years and 4500+ members, the storage requirements rise to close to 80MB. Moving to a bit per day instead of a byte per day would reduce this to about 10MB, which might be worth the effort. Unfortunately, vec isn't a direct substitute for substr, so the code gets more complicated.
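To give a feel for the extra complexity, here is a sketch of a range test on a bit vector. The data and the sub names are invented for the example. The first version walks the bits with vec; the second expands the vector with unpack 'b*' (which lists bits in the same order vec uses) and reuses the substr/regex test.

```perl
use strict;
use warnings;

# Hypothetical member: one bit per day, member on days 40..49 of a 100-day span.
my $bits = '';
vec( $bits, 99, 1 ) = 0;                 # extend to 100 bits (13 bytes)
vec( $bits, $_, 1 ) = 1 for 40 .. 49;

# Range test, inclusive: stop at the first set bit.
sub member_between {
    my ( $vector, $first, $last ) = @_;
    for my $day ( $first .. $last ) {
        return 1 if vec( $vector, $day, 1 );
    }
    return 0;
}

# Alternative: expand the bits to a '0'/'1' string and test as before.
# This temporarily costs a byte per day again, but only for one member.
sub member_between_unpacked {
    my ( $vector, $first, $last ) = @_;
    my $as_string = unpack 'b*', $vector;
    return substr( $as_string, $first, $last - $first + 1 ) =~ /1/ ? 1 : 0;
}

for my $range ( [ 30, 39 ], [ 45, 60 ], [ 50, 99 ] ) {
    my ( $first, $last ) = @$range;
    print "days $first-$last: loop=", member_between( $bits, $first, $last ),
        " unpacked=", member_between_unpacked( $bits, $first, $last ), "\n";
}
```

Which of the two is faster will depend on the range lengths involved; I haven't benchmarked them.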

