chrisjej has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering if anyone has a good algorithm to work out an average start time that handles times over midnight. Assume the job runs once every 24 hours and will usually start within a 6 hour window.

Obviously you could just sum seconds since midnight / number entries, which would work well if your times were:

11:00 and 13:00 where it would give the plausible 12:00

However...

23:00 and 01:00 would also give the answer 12:00 whereas 00:00 is desirable.

If, instead, you calculated this second example on seconds since 12:00 - you would get the desired answer of 00:00. But then the first example would also give you 00:00.

I'm thinking you could do it by doing a first pass to generate a histogram and then derive a good base time from that.

But I was hoping someone might already have implemented or know of a solution.
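For what it's worth, the histogram idea above might be sketched roughly as follows. This is only one possible reading of it: the helper name `average_via_histogram` is made up, the histogram is hourly, the "good base time" is assumed to be the hour just after the longest run of empty hours, and the input list is assumed to be non-empty.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): average HH:MM times by
# first finding the longest empty stretch in an hourly histogram and
# then measuring every time as an offset from the end of that stretch.
sub average_via_histogram {
    my @minutes = map { my ($h, $m) = split /:/; 60 * $h + $m } @_;

    # First pass: count entries per hour of day.
    my @hist = (0) x 24;
    $hist[ int($_ / 60) ]++ for @minutes;

    # Find the longest run of empty hours, treating the day as circular.
    my ($best_start, $best_len) = (0, 0);
    for my $start (0 .. 23) {
        my $len = 0;
        $len++ while $len < 24 && $hist[ ($start + $len) % 24 ] == 0;
        ($best_start, $best_len) = ($start, $len) if $len > $best_len;
    }

    # Assumed base time: the first occupied hour after that gap.
    my $base = 60 * (($best_start + $best_len) % 24);

    # Second pass: average the offsets from the base, then wrap to 24h.
    my $sum = 0;
    $sum += ($_ - $base) % 1440 for @minutes;
    my $avg = int($base + $sum / @minutes) % 1440;

    return sprintf '%02d:%02d', int($avg / 60), $avg % 60;
}

print average_via_histogram(qw( 11:00 13:00 )), "\n";   # 12:00
print average_via_histogram(qw( 23:00 01:00 )), "\n";   # 00:00
```

With a 6 hour window there is always an empty stretch of many hours, so the base should land just before the cluster; if the times were spread over the whole day the choice of base would become arbitrary.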

Replies are listed 'Best First'.
Re: Average start time handling midnight
by perldigious (Priest) on Jul 21, 2016 at 12:55 UTC

    Hi chrisjej,

    See if the following helps. http://mathforum.org/library/drmath/view/63173.html

    Doing a simple "floating point hours since midnight" conversion like he shows there may be a simple way to accomplish your goal.

    EDIT:

    As hippo pointed out, a better defined spec may help the Monks help you out. I tried the following that seems to work (though I haven't done extensive testing by any means), but it presumes that you are always only averaging two times (like you show), and that they are always in chronological order with the earliest time first. Those are some pretty big assumptions.

    use strict;
    use warnings;

    my @times1 = qw( 11:00 13:00 );
    my @times2 = qw( 23:00 1:00 );
    my @times3 = qw( 10:30 13:00 );
    my @times4 = qw( 19:40 1:20 );

    print "times1 average: " . average_start_time(@times1) . "\n";
    print "times2 average: " . average_start_time(@times2) . "\n";
    print "times3 average: " . average_start_time(@times3) . "\n";
    print "times4 average: " . average_start_time(@times4) . "\n";

    sub average_start_time
    {
        my ($hours1, $minutes1) = split /:/, $_[0];
        my ($hours2, $minutes2) = split /:/, $_[1];
        my $decimal_hours1 = $hours1 + $minutes1/60;
        my $decimal_hours2 = $hours2 + $minutes2/60;
        my $average_decimal_hours = ($decimal_hours1+$decimal_hours2)/2;
        $average_decimal_hours -= 12 if ($decimal_hours1 > $decimal_hours2);
        $average_decimal_hours += 24 if ($average_decimal_hours < 0);
        return int($average_decimal_hours) . ":" . sprintf("%02d", ($average_decimal_hours - int($average_decimal_hours))*60);
    }

    I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
    I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious
Re: Average start time handling midnight
by choroba (Cardinal) on Jul 21, 2016 at 14:11 UTC
    I'm not sure I fully understood your requirements, and whether I covered all the possibilities, but the following seems to give reasonable results to me:

    #!/usr/bin/perl
    use warnings;
    use strict;

    sub avg_time {
        my ($s1, $s2) = @_;
        my ($t1, $t2) = map 60 * (split /:/)[0] + (split /:/)[1], $s1, $s2;
        my $avg = ($t1 + $t2) / 2;

        # The times are closer to each other via midnight than via noon.
        $avg += 12 * 60 if abs($t1 - $t2) > 12 * 60;

        # Don't report hours > 23.
        $avg %= 24 * 60;

        return join ':', map sprintf('%02d', $_), int $avg / 60, $avg % 60
    }

    use Test::More;

    is avg_time('11:00', '13:00'), '12:00';
    is avg_time('23:00', '01:00'), '00:00';

    is avg_time('10:59', '13:01'), '12:00';
    is avg_time('22:59', '01:01'), '00:00';

    is avg_time('03:00', '21:00'), '00:00';

    is avg_time('10:20', '22:10'), '16:15';
    is avg_time('10:10', '22:20'), '04:15';

    done_testing();

    Unlike GotToBTru, I don't think 7:00 - 9:00 should give different results to 9:00 - 7:00 (the "next day" tests with both times AM or PM). The other tests mentioned there pass.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      I'm curious about the rationale for choosing to average the shorter of the two possible sequences? Being events in time, there is definitely an order to them.

      But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Re: Average start time handling midnight
by salva (Canon) on Jul 21, 2016 at 15:32 UTC
    You can calculate the average and the variance for the 24h intervals centered around every hour (0, 1, 2,... 23) and pick the center which minimizes the variance:
    use POSIX qw(fmod);

    my @data = (1, 1, 1, 0, 1, 2, 3, 4, 21, 22, 20, 22, 23, 20, 22);

    sub circular_average {
        my ($pivot, $round, @data) = @_;
        my $sum = 0;
        my $sum2 = 0;
        my $displacement = 1.5 * $round - $pivot;
        for my $data (@data) {
            my $displaced = fmod($data + $displacement, $round);
            # printf "%2i->%3i,%2i ", $data, $displaced, $displaced - $displacement;
            $sum += $displaced;
            $sum2 += $displaced * $displaced;
        }
        # print "\n";
        my $inv_n = 1.0 / @data;
        my $avg = fmod($inv_n * $sum + 0.5 * $round + $pivot, $round);
        wantarray ? ($avg, $inv_n * $sum2 - $inv_n * $inv_n * $sum * $sum) : $avg;
    }

    my ($best_avg, $best_s2);
    for my $time (0..23) {
        my ($avg, $s2) = circular_average $time, 24, @data;
        if (not defined $best_s2 or $best_s2 > $s2) {
            $best_avg = $avg;
            $best_s2 = $s2;
        }
    }

    printf "avg: %.1f, s2: %.1f\n", $best_avg, $best_s2;
    Then, you can even iterate a few times centering the interval around the best average ($best_avg) found:
    for (0..5) {
        ($best_avg, $best_s2) = circular_average $best_avg, 24, @data;
        printf "avg: %.1f, s2: %.1f\n", $best_avg, $best_s2;
    }
Re: Average start time handling midnight
by kcott (Archbishop) on Jul 21, 2016 at 16:51 UTC

    G'day chrisjej,

    Here's another way to do it using the builtin modules Time::Piece and Time::Seconds.

    I've shown averages truncated to whole minutes (as per your OP) and also averages that haven't been truncated (i.e. shows seconds in the output).

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Time::Piece;
    use Time::Seconds;
    use Test::More;

    my @ranges = (
        ['11:00', '13:00', '12:00', '12:00:00'],
        ['23:00', '01:00', '00:00', '00:00:00'],
        ['23:30', '00:30', '00:00', '00:00:00'],
        ['12:00', '12:00', '12:00', '12:00:00'],
        ['00:00', '00:00', '00:00', '00:00:00'],
        ['00:00', '00:02', '00:01', '00:01:00'],
        ['00:00', '00:01', '00:00', '00:00:30'],
        ['23:58', '00:00', '23:59', '23:59:00'],
        ['23:59', '00:00', '23:59', '23:59:30'],
    );

    plan tests => @ranges * 2;

    test_average($_) for @ranges;

    sub test_average {
        my ($range) = @_;

        my $t0 = get_tp($range->[0]);
        my $t1 = get_tp($range->[1]);
        $t1 += ONE_DAY if $t1 < $t0;

        my $avg = ($t1->epoch + $t0->epoch) / 2;

        ok(
            Time::Piece->strptime($avg, '%s')->strftime('%H:%M') eq $range->[2],
            "$range->[0] -> $range->[1]: Avg = $range->[2] [HH:MM]"
        );
        ok(
            Time::Piece->strptime($avg, '%s')->strftime('%H:%M:%S') eq $range->[3],
            "$range->[0] -> $range->[1]: Avg = $range->[3] [HH:MM:SS]"
        );
    }

    sub get_tp { Time::Piece->strptime(shift, '%H:%M') }

    Output:

    1..18
    ok 1 - 11:00 -> 13:00: Avg = 12:00 [HH:MM]
    ok 2 - 11:00 -> 13:00: Avg = 12:00:00 [HH:MM:SS]
    ok 3 - 23:00 -> 01:00: Avg = 00:00 [HH:MM]
    ok 4 - 23:00 -> 01:00: Avg = 00:00:00 [HH:MM:SS]
    ok 5 - 23:30 -> 00:30: Avg = 00:00 [HH:MM]
    ok 6 - 23:30 -> 00:30: Avg = 00:00:00 [HH:MM:SS]
    ok 7 - 12:00 -> 12:00: Avg = 12:00 [HH:MM]
    ok 8 - 12:00 -> 12:00: Avg = 12:00:00 [HH:MM:SS]
    ok 9 - 00:00 -> 00:00: Avg = 00:00 [HH:MM]
    ok 10 - 00:00 -> 00:00: Avg = 00:00:00 [HH:MM:SS]
    ok 11 - 00:00 -> 00:02: Avg = 00:01 [HH:MM]
    ok 12 - 00:00 -> 00:02: Avg = 00:01:00 [HH:MM:SS]
    ok 13 - 00:00 -> 00:01: Avg = 00:00 [HH:MM]
    ok 14 - 00:00 -> 00:01: Avg = 00:00:30 [HH:MM:SS]
    ok 15 - 23:58 -> 00:00: Avg = 23:59 [HH:MM]
    ok 16 - 23:58 -> 00:00: Avg = 23:59:00 [HH:MM:SS]
    ok 17 - 23:59 -> 00:00: Avg = 23:59 [HH:MM]
    ok 18 - 23:59 -> 00:00: Avg = 23:59:30 [HH:MM:SS]

    — Ken

Re: Average start time handling midnight
by BillKSmith (Monsignor) on Jul 21, 2016 at 15:21 UTC
    There does not seem to be a general solution to this problem. You did not mention how many readings you must average. A 'solution' which works well for two readings may fail for three or more. The article https://en.wikipedia.org/wiki/Mean_of_circular_quantities suggests an algorithm that seems to always do what we want. Too bad it is so complex.
    Bill

      Maybe I'm being too simplistic, but given the OP's restrictions, "usually start within a 6 hour window" and "handles times over midnight", I think it's doable. If the 6 hours ended just after midnight, you would need to handle times from 18:01 - 00:01; if it started just before midnight, you would need to handle 23:59 - 5:59. For simplicity's sake, I'd make my window 18:00 - 5:59. Thus,

      $n = 0;
      while ( ... ) {
          $hour += 24 if( $hour < 6 );
          next if( $hour < 18 ); # ignore times outside the window
          $sum += 3600*$hour + 60*$min + $sec;
          $avg = $sum / ++$n;
      }

      For the $hour < 18 rule: if the time is before 6am, its $hour would have already been adjusted to somewhere in the range 24 to 29, so it wouldn't trigger the <18 condition; if the time was after 6pm, it wouldn't trigger the condition either; otherwise it will trigger the condition, and the time will be ignored.

      If OP wants a narrower or wider 'accept-range', just adjust the two comparisons appropriately.

      Other options would be to clamp times between 6am and noon to 05:59:59, and between noon and 6pm to 18:00:00, which will not ignore points, but manipulate them to fall within the "normal window".
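      A rough sketch of that clamping variant, assuming HH:MM:SS input (seconds optional) and the same 18:00:00 to 05:59:59 window as above. The name `clamped_average` is made up for illustration, and the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: average times of day, pulling anything outside the
# assumed 18:00:00 .. 05:59:59 window onto the nearest edge of the window.
sub clamped_average {
    my ($sum, $n) = (0, 0);
    for my $time (@_) {
        my ($hour, $min, $sec) = split /:/, $time;
        my $s = 3600 * $hour + 60 * $min + ($sec // 0);

        if    ($s <  6 * 3600) { $s += 86400 }                # 00:00 .. 05:59:59 counts as "next day"
        elsif ($s < 12 * 3600) { $s = 86400 + 6 * 3600 - 1 }  # 06:00 .. 11:59:59 clamps to 05:59:59
        elsif ($s < 18 * 3600) { $s = 18 * 3600 }             # 12:00 .. 17:59:59 clamps to 18:00:00

        $sum += $s;
        $n++;
    }
    return unless $n;    # nothing to average

    my $avg = int($sum / $n) % 86400;
    return sprintf '%02d:%02d:%02d', int($avg / 3600), int($avg / 60) % 60, $avg % 60;
}

print clamped_average('23:00:00', '01:00:00'), "\n";   # 00:00:00
print clamped_average('13:00:00', '19:00:00'), "\n";   # 18:30:00 (13:00 clamped to 18:00)
```

      Clamping keeps every sample in the count, at the cost of pulling the mean toward the window edge whenever an outlier shows up.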

      I understand there are issues in the general case, but with the restrictions given, I think things are well-defined enough, and this algorithm should give a mean closer to midnight than to noon, which seems to be what the OP wants.

      Not that complicated, really, although it does seem a bit silly to do trigonometry where simple arithmetic would suffice.

      #! /usr/bin/perl -wl

      my @times = qw(
          17:00 19:00 11:00 13:00 23:00 01:00 10:30 13:00 19:40 01:20 16:00 02:00
      );

      # keep angles in range -pi .. pi
      sub _PI () { 2 * atan2(1, 0) }
      sub _TC () { 24 * 60 / (2 * _PI) }

      sub time2angle {
          map { my ($h, $m) = split /:/; _PI - (60 * $h + $m) / _TC } @_
      }

      sub angle2time {
          map { my $m = int _TC * (_PI - $_); sprintf "%02d:%02d", $m / 60, $m % 60 } @_
      }

      use List::Util qw( sum );

      sub circ_avg {
          atan2 sum(map sin, @_), sum(map cos, @_)
      }

      for (; @times > 1; shift @times) {
          my @t = @times[0, 1];
          print "@t => @{[angle2time circ_avg time2angle @t]}";
      }

      Thank you for the wikipedia link.

        When averaging two times, yes, it's silly. But when averaging more than two, it comes out with a very different result.

        ... # your code thru the definition of sub circ_avg;
        print "@times";
        print " => circle vs line";
        print " => @{[angle2time circ_avg time2angle @times]} vs @{[angle2time( sum( time2angle @times ) / @times ) ]}";
        for (; @times > 1; shift @times) {
            my @t = @times[0, 1];
            print "@t => @{[angle2time circ_avg time2angle @t]} vs @{[angle2time( sum( time2angle @t ) / @t ) ]}";
        }
        __END__
        __OUTPUT__
        17:00 19:00 11:00 13:00 23:00 01:00 10:30 13:00 19:40 01:20 16:00 02:00
         => circle vs line
         => 17:46 vs 12:12
        17:00 19:00 => 18:00 vs 18:00
        19:00 11:00 => 15:00 vs 15:00
        11:00 13:00 => 12:00 vs 12:00
        13:00 23:00 => 18:00 vs 18:00
        23:00 01:00 => 00:00 vs 12:00
        01:00 10:30 => 05:45 vs 05:45
        10:30 13:00 => 11:45 vs 11:45
        13:00 19:40 => 16:20 vs 16:20
        19:40 01:20 => 22:30 vs 10:30
        01:20 16:00 => 20:40 vs 08:40
        16:00 02:00 => 21:00 vs 09:00

        For the pairs of times, the simple average mostly matches the circular one (except where midnight falls between time1 and time2), but the two averages give very different results for the entire list. Previously, I had compared my own implementation of the angular average (yours is better) to the results of salva's solution of finding the center with minimum variance. For salva's example list (or a list of random times), that solution gave means very similar to the angular average, and both were very different from the arithmetic mean of the same data.

      Perhaps equivalent:

      Convert the time to radians. Treat all times as unit vectors in polar coordinates, with the time_in_radians as the angle. Convert the polar (1, time_in_radians) to Cartesian (x,y). Sum all of the xs and ys separately (optionally averaging them, again separately), and convert the result back to polar. The average time is the angle. The averaged magnitude might be interesting also.
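      A minimal sketch of those steps, in case it helps. The sub name `vector_mean_time` is invented for illustration, midnight is mapped to angle 0, the result is rounded to the nearest minute, and the list is assumed to be non-empty.

```perl
use strict;
use warnings;

use constant TWO_PI => 8 * atan2(1, 1);

# Hypothetical helper: returns (mean time as HH:MM, averaged magnitude 0..1).
sub vector_mean_time {
    my ($x, $y) = (0, 0);
    for my $time (@_) {
        my ($h, $m) = split /:/, $time;
        my $angle = TWO_PI * (60 * $h + $m) / 1440;   # time of day in radians
        $x += cos $angle;                             # polar (1, angle) -> Cartesian
        $y += sin $angle;
    }

    my $n = @_;
    $x /= $n;                                         # average the components
    $y /= $n;

    my $r     = sqrt($x * $x + $y * $y);              # averaged magnitude
    my $angle = atan2($y, $x);                        # back to polar, -pi .. pi
    $angle += TWO_PI if $angle < 0;

    my $min = int($angle / TWO_PI * 1440 + 0.5) % 1440;
    return (sprintf('%02d:%02d', int($min / 60), $min % 60), $r);
}

my ($avg, $r) = vector_mean_time(qw( 23:00 01:00 ));
printf "%s (magnitude %.3f)\n", $avg, $r;             # 00:00 (magnitude 0.966)
```

      The magnitude works as a rough measure of spread: it is close to 1 when the times cluster tightly and drops toward 0 when they are scattered around the clock (06:00 and 18:00 cancel out completely, so the angle is then meaningless).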

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: Average start time handling midnight
by SimonPratt (Friar) on Jul 21, 2016 at 14:54 UTC

    An obvious approach is to convert times to seconds, find the midpoint, adjust and return, like so:

    use 5.16.2;
    use POSIX 'strftime';

    my @test = (['11:00', '13:00'], ['23:00', '01:00'], ['10:30', '13:00'], ['19:40', '01:20']);

    foreach my $tst (@test) {
        print "Testing '$tst->[0]', '$tst->[1]': ";
        say getmid($tst->[0], $tst->[1]);
    }

    sub getmid($$) {
        my ($start, $end) = @_;
        $start =~ /^(\d{2}):(\d{2})$/ or die "invalid start date provided: '$start', must be in format hh:mm\n";
        my $startseconds = $1 * 3600 + $2 *60;
        $end =~ /^(\d{2}):(\d{2})$/ or die "invalid end date provided: '$end', must be in format hh:mm\n";
        my $endseconds = $1 * 3600 + $2 * 60;
        my $range;
        if ($endseconds < $startseconds) { $range = 86400 - $startseconds + $endseconds }
        else { $range = $endseconds - $startseconds }
        my $average = $startseconds + int($range / 2);
        return strftime("%H:%M", gmtime($average));
    }

Re: Average start time handling midnight
by hippo (Archbishop) on Jul 21, 2016 at 12:34 UTC

    Subtract 24h from all times which are >12:00. Then calculate the mean, which is an offset from midnight. If the mean is < 0, add 24h back onto it.
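    Roughly, in code (a sketch only; `mean_around_midnight` is a made-up name, times are assumed to be HH:MM strings, and the list is assumed to be non-empty):

```perl
use strict;
use warnings;

# Hypothetical helper: mean of HH:MM times, measured as an offset from midnight.
sub mean_around_midnight {
    my $sum = 0;
    for my $time (@_) {
        my ($h, $m) = split /:/, $time;
        my $minutes = 60 * $h + $m;
        $minutes -= 1440 if $minutes > 720;   # after 12:00: count as minutes before midnight
        $sum += $minutes;
    }
    my $mean = $sum / @_;
    $mean += 1440 if $mean < 0;               # negative offset: add 24h back
    return sprintf '%02d:%02d', int($mean / 60), $mean % 60;
}

print mean_around_midnight(qw( 23:00 01:00 )), "\n";   # 00:00
print mean_around_midnight(qw( 22:00 23:30 )), "\n";   # 22:45
```

    This only makes sense when the start window sits around midnight; for times on either side of noon, such as 11:00 and 13:00, it gives 00:00.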

      Your algorithm has trouble with the first example, 11am and 1pm. The average of 11 and -11 is 0.

      Update (and updated again - no need to test for negative average (and updated again to allow minutes along with hours)):

      use strict;
      use warnings;
      use Test::Simple tests => 8;

      ok(&avg('01:05','03:13') eq '02:09', 'AM Only');
      ok(&avg('20:43','22:45') eq '21:44', 'PM Only');
      ok(&avg('09:00','13:00') eq '11:00', 'AM to PM');
      ok(&avg('15:12','01:52') eq '20:32', 'PM to AM next day');
      ok(&avg('09:10','07:08') eq '20:09', 'AM to AM next day');
      ok(&avg('15:02','13:30') eq '02:16', 'PM to PM next day');
      ok(&avg('11:00','13:00') eq '12:00', 'OP Example 1');
      ok(&avg('23:00','01:00') eq '00:00', 'OP Example 2');

      sub avg {
          my ($x,$y) = @_;
          $x = ttoi($x);
          $y = ttoi($y);
          if ($y < $x) { $y += (24 * 60); }
          return itot((($x + $y)/2) % (24 * 60));
      }

      sub ttoi {
          my $t = shift;
          my ($h,$m) = split /:/,$t;
          return $h * 60 + $m;
      }

      sub itot {
          my $i=shift;
          my $h = $i / 60;
          my $m = $i % 60;
          return sprintf "%02d:%02d",$h,$m;
      }
      But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

        I had rather taken that first example to be an illustration of what chrisjej didn't want. Perhaps we need a better-defined spec?

Re: Average start time handling midnight
by gandolf989 (Scribe) on Jul 21, 2016 at 19:08 UTC
    There are date functions that will just give you the elapsed time. They should even be able to account for different timezones. http://search.cpan.org/~drolsky/DateTime1.34/lib/DateTime.pm#Datetime_Subtraction Another option is to convert the dates to unix time and subtract, then figure out how many hours, minutes and seconds elapsed. This is a great place to just use a library rather than build something new.

      It would indeed be a trivial exercise if the date information were available as well as the time of day. However, there's no indication at all in the OP that the date information is present in the data set and that is why it has garnered responses which are as varied as they are many.

      BTW, the URL you included gives a 404, perhaps you meant http://search.cpan.org/~drolsky/DateTime-1.34/lib/DateTime.pm#Datetime_Subtraction?