gulden has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I need your help finding the best algorithm for my task. The task is simple: read lines from a file and check, for each line, whether its "Interval Date" contains any of a set of hours (the interval will never be longer than 24 hours).

The file is in the form:

Start_Time | End Time | TEXT
2009-07-22 08:00:00|2009-07-22 08:00:00|blablalblabla
2009-07-22 01:00:00|2009-07-22 01:00:00|blablalblabla
2009-07-22 08:00:00|2009-07-22 21:00:00|blablalblabla
2009-07-22 23:00:00|2009-07-23 00:00:00|blablalblabla
2009-07-22 23:00:00|2009-07-23 02:00:00|blablalblabla

The "pseudo code" should be something like this:

my @hours = (1, 11);  # Hours to check
open(FILE, "file.txt") or die "$!";
<FILE>;  # Skip first line
while (<FILE>) {  # Correction after [ack] comment
    chomp;
    my ($start_date, $stop_date, $text) = split /\|/;
    print "\nInterval $start_date - $stop_date\n";
    foreach my $hour (@hours) {
        if ( ($start_date to $stop_date) contains $hour ) {  # pseudo code
            print "Hour $hour: Match\n";
        } else {
            print "Hour $hour:Not Match\n";
        }
    }
}
close(FILE);

The output for the sample file should be (corrected after graff comments):

Interval 2009-07-22 08:00:00 - 2009-07-22 08:00:00
Hour 1 :Not Match
Hour 11:Not Match

Interval 2009-07-22 01:00:00 - 2009-07-22 01:00:00
Hour 1 :Match
Hour 11:Not Match

Interval 2009-07-22 08:00:00 - 2009-07-22 21:00:00
Hour 1 :Not Match
Hour 11:Match

Interval 2009-07-22 23:00:00 - 2009-07-23 00:00:00
Hour 1 :Not Match
Hour 11:Not Match

Interval 2009-07-22 23:00:00 - 2009-07-23 02:00:00
Hour 1 :Match
Hour 11:Not Match

Any help will be appreciated.


«A contentious debate is always associated with a lack of valid arguments.»

Replies are listed 'Best First'.
Re: Check if Date interval contains Hour X
by ikegami (Patriarch) on Jul 23, 2009 at 18:58 UTC
    use strict;
    use warnings;

    use DateTime qw( );
    use DateTime::Format::Strptime qw( );

    # Returns true if range ($dt_s, $dt_e)
    # spans any part of any of the @hours.
    sub hour_in_range {
        my ($dt_s, $dt_e, @hours) = @_;

        # Find the floor.
        ( $dt_s = $dt_s->clone() )
            ->truncate( to => 'hour' );

        for my $hour (@hours) {
            ( my $dt = $dt_s->clone() )
                ->set_hour($hour);
            $dt->add( days => 1 ) if $dt < $dt_s;
            return 1 if $dt <= $dt_e;
        }

        return 0;
    }

    {
        my @hours = (1, 11);  # Hours to check

        my $parser = DateTime::Format::Strptime->new(
            pattern   => '%Y-%m-%d %H:%M:%S',
            time_zone => 'local',
        );

        <DATA>;  # Skip first line
        while (<DATA>) {
            chomp;
            my ($ts_s, $ts_e, $text) = split /\|/;
            my $dt_s = $parser->parse_datetime( $ts_s );
            my $dt_e = $parser->parse_datetime( $ts_e );
            if (hour_in_range($dt_s, $dt_e, @hours)) {
                print "Match\n";
            } else {
                print "Not Match\n";
            }
        }
    }

    __DATA__
    Start_Time |End Time |TEXT
    2009-07-22 08:00:00|2009-07-22 08:00:00|blablalblabla
    2009-07-22 01:00:00|2009-07-22 01:00:00|blablalblabla
    2009-07-22 08:00:00|2009-07-22 21:00:00|blablalblabla
    2009-07-22 23:00:00|2009-07-23 00:00:00|blablalblabla
    2009-07-22 23:00:00|2009-07-23 02:00:00|blablalblabla

      Another (maybe simpler) way of truncating:

      sub hour_in_range {
          my ($dt_s, $dt_e, @hours) = @_;

          for my $hour (@hours) {
              my $dt = $dt_s->clone()
                  ->truncate( to => 'day' )
                  ->add( hours => $hour );
              $dt->add( days => 1 ) if $dt < $dt_s;
              return 1 if $dt <= $dt_e;
          }

          return 0;
      }

        That introduced a bug. hour_in_range no longer performs as documented. Test case:
        2010-10-10 01:59:59|2010-10-10 01:59:59|blablalblabla
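
To see why that test case separates the two versions, here is a minimal sketch that re-expresses both in plain epoch seconds, using the core Time::Local module instead of DateTime. The names `to_epoch`, `candidate`, `match_floored` and `match_unfloored` are made up for this illustration, and all timestamps are treated as UTC, which is an assumption (the posted parser uses the local time zone).

```perl
use strict;
use warnings;
use Time::Local qw( timegm );

# Parse 'YYYY-MM-DD HH:MM:SS' as UTC epoch seconds (UTC is an assumption).
sub to_epoch {
    my ( $year, $mon, $day, $hour, $min, $sec ) = split /\D/, shift;
    return timegm( $sec, $min, $hour, $day, $mon - 1, $year );
}

# Candidate instant: midnight of the start day plus $hour hours,
# pushed to the next day if it falls before $lower_bound.
sub candidate {
    my ( $start, $lower_bound, $hour ) = @_;
    my $t = $start - ( $start % 86400 ) + $hour * 3600;
    $t += 86400 if $t < $lower_bound;
    return $t;
}

# Like the first version: compare against the start floored to the hour.
sub match_floored {
    my ( $start, $end, $hour ) = @_;
    my $floor = $start - ( $start % 3600 );
    return candidate( $start, $floor, $hour ) <= $end ? 1 : 0;
}

# Like the second version: compare against the raw, unfloored start.
sub match_unfloored {
    my ( $start, $end, $hour ) = @_;
    return candidate( $start, $start, $hour ) <= $end ? 1 : 0;
}

my $start = to_epoch('2010-10-10 01:59:59');
my $end   = to_epoch('2010-10-10 01:59:59');
printf "floored: %d, unfloored: %d\n",
    match_floored( $start, $end, 1 ), match_unfloored( $start, $end, 1 );
```

With the start floored to 01:00:00, the candidate 01:00:00 is not earlier than the floor, so the interval counts as touching hour 1. Without the floor, 01:00:00 is earlier than 01:59:59, the candidate is moved to the next day, and the match is lost.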
Re: Check if Date interval contains Hour X
by alexm (Chaplain) on Jul 23, 2009 at 18:42 UTC
      I thought you were on to something until I realized the real problem is finding the right date to go with the hour. Using ::Span makes things a lot more complicated since you now need to make 4 DateTime objects for every hour.

        The very first thing that came to my mind after reading the OP was the method DateTime::Span::contains. However, yours is the right approach since that method only works for sets that are fully inside, as the manual says.
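
The difference between "fully inside" and "touches any part of" can be sketched without DateTime::Span at all. What follows is only an illustration on plain numeric intervals; `contains` and `overlaps` are hypothetical helper names here, not the DateTime::Span methods.

```perl
use strict;
use warnings;

# Intervals are [start, end] pairs of plain numbers, closed at both ends.

# True only if $inner lies entirely within $outer.
sub contains {
    my ( $outer, $inner ) = @_;
    my $inside = $outer->[0] <= $inner->[0] && $inner->[1] <= $outer->[1];
    return $inside ? 1 : 0;
}

# True if the two intervals share at least one point.
sub overlaps {
    my ( $first, $second ) = @_;
    my $touching = $first->[0] <= $second->[1] && $second->[0] <= $first->[1];
    return $touching ? 1 : 0;
}

# Seconds since midnight: a span from 01:30:00 to 01:45:00,
# and hour 1 taken as 01:00:00 through 01:59:59.
my $span   = [ 5400, 6300 ];
my $hour_1 = [ 3600, 7199 ];

printf "contains: %d, overlaps: %d\n",
    contains( $span, $hour_1 ), overlaps( $span, $hour_1 );
```

The span lies within hour 1 but does not contain all of it, so a containment test fails where an overlap test succeeds.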

        Thanks!

Re: Check if Date interval contains Hour X
by graff (Chancellor) on Jul 24, 2009 at 03:31 UTC
    Um... according to your pseudocode, there should be 10 lines of output, because you seem to want to output one line for each of your two "hour" values tested against each of the 5 lines of input data. And in that regard, wouldn't it be helpful for the output to mention which of the "hour" values was being tested each time, and what time span it was being tested against?

    As for the overall approach, I'm with ikegami (with some minor alterations): in order to make this workable in a general way, what you really want is a subroutine that takes three args: a targeted hour, and the start and end date/time values to test. A little sanity checking on the data would be worthwhile as well, and in case it counts for anything, there are some short-cuts you can take advantage of...

    #!/usr/bin/env perl
    use strict;
    use Date::Calc qw/Date_to_Time/;

    sub hour_in_span {
        my ( $hr, $bgn, $end ) = @_;

        return unless ( $hr =~ /^\d{1,2}$/ and $hr >= 0 and $hr <= 23 );
        my $hr2 = sprintf( "%02d", $hr );  # make sure to use 2 digits

        for ( $bgn, $end ) {
            return unless ( /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$/ );
        }
        return unless $bgn le $end;  # require that args be in correct sequence

        # easiest case: start == end == hour of interest
        if ( $bgn eq $end ) {
            return ( $bgn =~ / $hr2:/ );
        }

        # next easiest: span from start to end >= one full day
        # so it must include hour of interest
        my $ep_bgn = Date_to_Time( split /\D/, $bgn );
        my $ep_end = Date_to_Time( split /\D/, $end );
        return 1 if (( $ep_end - $ep_bgn ) / ( 24 * 60 * 60 ) >= 1 );

        # hardest case:
        # -- plug hour of interest into each endpoint of the span
        #    and see if either resulting time stamp falls within the span
        ( my $test_bgn = $bgn ) =~ s/ \d{2}:/ $hr2:/;
        ( my $test_end = $end ) =~ s/ \d{2}:/ $hr2:/;
        my $test_ep_bgn = Date_to_Time( split /\D/, $test_bgn );
        my $test_ep_end = Date_to_Time( split /\D/, $test_end );
        return (( $ep_bgn <= $test_ep_bgn and $ep_end >= $test_ep_bgn ) or
                ( $ep_bgn <= $test_ep_end and $ep_end >= $test_ep_end ));
    }

    ## End of algorithm
    ## -- from here on down, we're just testing it

    my @hours = ( 1, 11 );

    while (<DATA>) {
        next unless ( /^\d{4}-/ );
        my ( $bgn, $end ) = split /\|/;
        for my $hr ( @hours ) {
            my $in = hour_in_span( $hr, $bgn, $end );
            if ( !defined( $in )) {
                print "$bgn -- $end / $hr : bad data\n";
            }
            elsif ( $in ) {
                print "$bgn -- $end / contains $hr\n";
            }
            else {
                print "$bgn -- $end / does NOT contain $hr\n";
            }
        }
    }

    __DATA__
    Start_Time | End Time | TEXT
    2009-07-22 08:00:00|2009-07-22 08:00:00|blablalblabla
    2009-07-22 01:00:00|2009-07-22 01:00:00|blablalblabla
    2009-07-22 08:00:00|2009-07-22 21:00:00|blablalblabla
    2009-07-22 23:00:00|2009-07-23 00:00:00|blablalblabla
    2009-07-22 23:00:00|2009-07-23 02:00:00|blablalblabla

    (BTW: you could relax the constraint on having the subroutine args in a specified order. So long as the "targeted hour" arg is in the right place, the other two args are interchangeable; just use the lower value as $bgn and the higher one as $end.)
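
A minimal sketch of that relaxation, assuming both timestamps are already in the 'YYYY-MM-DD HH:MM:SS' form validated above. The helper name `ordered_span` is hypothetical and not part of the posted code.

```perl
use strict;
use warnings;

# Return the two timestamps in ascending order.
# 'YYYY-MM-DD HH:MM:SS' strings sort chronologically as plain strings,
# so a string comparison (le) is enough here.
sub ordered_span {
    my ( $t1, $t2 ) = @_;
    return $t1 le $t2 ? ( $t1, $t2 ) : ( $t2, $t1 );
}

my ( $bgn, $end ) = ordered_span( '2009-07-23 02:00:00', '2009-07-22 23:00:00' );
print "$bgn -- $end\n";
```

Calling something like this at the top of the subroutine would take the place of the early return on out-of-order arguments.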

      «Um... according to your pseudocode, there should be 10 lines of output, because you seem to want to output one line for each of your two "hour" values tested against each of the 5 lines of input data. And in that regard, wouldn't it be helpful for the output to mention which of the "hour" values was being tested each time, and what time span it was being tested against?»
      u r right.
Re: Check if Date interval contains Hour X
by ack (Deacon) on Jul 24, 2009 at 04:46 UTC

    I'm curious, you open the file to filehandle FILE but then, after you've skipped the first line, you read from filehandle LINE in your while (<LINE>) block.

    Is that a typo? Should the while statement be reading from FILE?

    ack Albuquerque, NM
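
One way to make this kind of handle mix-up harder to write is a lexical filehandle, since under `use strict` a misspelled `$fh` fails at compile time. The following is only a sketch: the sub name `read_intervals` is invented for this example, and the sample data is read from an in-memory string rather than from file.txt so that it runs on its own.

```perl
use strict;
use warnings;

# Sample data held in a string so the sketch runs without a real file;
# with a file on disk you would pass its path instead of \$sample.
my $sample = <<'END';
Start_Time | End Time | TEXT
2009-07-22 08:00:00|2009-07-22 21:00:00|blablalblabla
2009-07-22 23:00:00|2009-07-23 02:00:00|blablalblabla
END

sub read_intervals {
    my ($source) = @_;
    open( my $fh, '<', $source ) or die "Cannot open input: $!";
    my $header = <$fh>;    # skip the first line, using the same handle
    my @rows;
    while ( my $line = <$fh> ) {    # read from the handle that was opened
        chomp $line;
        my ( $start, $stop, $text ) = split /\|/, $line;
        push @rows, [ $start, $stop, $text ];
    }
    close($fh);
    return @rows;
}

my @rows = read_intervals( \$sample );
print scalar(@rows), " data lines read\n";
```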
Re: Check if Date interval contains Hour X
by gulden (Monk) on Jul 24, 2009 at 10:08 UTC
    Thank u all, for the excellent code/comments that were made.