Re: Average start time handling midnight
by perldigious (Priest) on Jul 21, 2016 at 12:55 UTC
|
Hi chrisjej,
See if the following helps. http://mathforum.org/library/drmath/view/63173.html
Doing a "floating point hours since midnight" conversion like he shows there may be a simple way to accomplish your goal.
EDIT:
As hippo pointed out, a better defined spec may help the Monks help you out. I tried the following that seems to work (though I haven't done extensive testing by any means), but it presumes that you are always only averaging two times (like you show), and that they are always in chronological order with the earliest time first. Those are some pretty big assumptions.
use strict;
use warnings;
my @times1 = qw( 11:00 13:00 );
my @times2 = qw( 23:00 1:00 );
my @times3 = qw( 10:30 13:00 );
my @times4 = qw( 19:40 1:20 );
print "times1 average: " . average_start_time(@times1) . "\n";
print "times2 average: " . average_start_time(@times2) . "\n";
print "times3 average: " . average_start_time(@times3) . "\n";
print "times4 average: " . average_start_time(@times4) . "\n";
sub average_start_time
{
my ($hours1, $minutes1) = split /:/, $_[0];
my ($hours2, $minutes2) = split /:/, $_[1];
my $decimal_hours1 = $hours1 + $minutes1/60;
my $decimal_hours2 = $hours2 + $minutes2/60;
my $average_decimal_hours = ($decimal_hours1+$decimal_hours2)/2;
$average_decimal_hours -= 12 if ($decimal_hours1 > $decimal_hours2);
$average_decimal_hours += 24 if ($average_decimal_hours < 0);
return int($average_decimal_hours) . ":" . sprintf("%02d", ($average_decimal_hours - int($average_decimal_hours))*60);
}
I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious
| [reply] [d/l] |
Re: Average start time handling midnight
by choroba (Cardinal) on Jul 21, 2016 at 14:11 UTC
|
I'm not sure I fully understood your requirements, and whether I covered all the possibilities, but the following seems to give reasonable results to me:
#!/usr/bin/perl
use warnings;
use strict;
sub avg_time {
my ($s1, $s2) = @_;
my ($t1, $t2) = map 60 * (split /:/)[0] + (split /:/)[1], $s1, $s2;
my $avg = ($t1 + $t2) / 2;
# The times are closer to each other via midnight than via noon.
$avg += 12 * 60 if abs($t1 - $t2) > 12 * 60;
# Don't report hours > 23.
$avg %= 24 * 60;
return join ':', map sprintf('%02d', $_), int $avg / 60, $avg % 60
}
use Test::More;
is avg_time('11:00', '13:00'), '12:00';
is avg_time('23:00', '01:00'), '00:00';
is avg_time('10:59', '13:01'), '12:00';
is avg_time('22:59', '01:01'), '00:00';
is avg_time('03:00', '21:00'), '00:00';
is avg_time('10:20', '22:10'), '16:15';
is avg_time('10:10', '22:20'), '04:15';
done_testing();
Unlike GotToBTru, I don't think 7:00 - 9:00 should give different results to 9:00 - 7:00 (the "next day" tests with both times AM or PM). The other tests mentioned there pass.
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
| [reply] [d/l] [select] |
Re: Average start time handling midnight
by salva (Canon) on Jul 21, 2016 at 15:32 UTC
|
You can calculate the average and the variance for the 24h intervals centered around every hour (0, 1, 2,... 23) and pick the center which minimizes the variance:
use POSIX qw(fmod);
my @data = (1, 1, 1, 0, 1, 2, 3, 4, 21, 22, 20, 22, 23, 20, 22);
sub circular_average {
my ($pivot, $round, @data) = @_;
my $sum = 0;
my $sum2 = 0;
my $displacement = 1.5 * $round - $pivot;
for my $data (@data) {
my $displaced = fmod($data + $displacement, $round);
# printf "%2i->%3i,%2i ", $data, $displaced, $displaced - $displacement;
$sum += $displaced;
$sum2 += $displaced * $displaced;
}
# print "\n";
my $inv_n = 1.0 / @data;
my $avg = fmod($inv_n * $sum + 0.5 * $round + $pivot, $round);
wantarray ? ($avg, $inv_n * $sum2 - $inv_n * $inv_n * $sum * $sum) : $avg;
}
my ($best_avg, $best_s2);
for my $time (0..23) {
my ($avg, $s2) = circular_average $time, 24, @data;
if (not defined $best_s2 or $best_s2 > $s2) {
$best_avg = $avg;
$best_s2 = $s2;
}
}
printf "avg: %.1f, s2: %.1f\n", $best_avg, $best_s2;
Then, you can even iterate a few times centering the interval around the best average ($best_avg) found:
for (0..5) {
($best_avg, $best_s2) = circular_average $best_avg, 24, @data;
printf "avg: %.1f, s2: %.1f\n", $best_avg, $best_s2;
}
| [reply] [d/l] [select] |
Re: Average start time handling midnight
by kcott (Archbishop) on Jul 21, 2016 at 16:51 UTC
|
G'day chrisjej,
Here's another way to do it using the core modules Time::Piece and Time::Seconds.
I've shown averages truncated to whole minutes (as per your OP) and also averages that haven't been truncated (i.e. with seconds shown in the output).
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;
use Test::More;
my @ranges = (
['11:00', '13:00', '12:00', '12:00:00'],
['23:00', '01:00', '00:00', '00:00:00'],
['23:30', '00:30', '00:00', '00:00:00'],
['12:00', '12:00', '12:00', '12:00:00'],
['00:00', '00:00', '00:00', '00:00:00'],
['00:00', '00:02', '00:01', '00:01:00'],
['00:00', '00:01', '00:00', '00:00:30'],
['23:58', '00:00', '23:59', '23:59:00'],
['23:59', '00:00', '23:59', '23:59:30'],
);
plan tests => @ranges * 2;
test_average($_) for @ranges;
sub test_average {
my ($range) = @_;
my $t0 = get_tp($range->[0]);
my $t1 = get_tp($range->[1]);
$t1 += ONE_DAY if $t1 < $t0;
my $avg = ($t1->epoch + $t0->epoch) / 2;
ok(
Time::Piece->strptime($avg, '%s')->strftime('%H:%M') eq $range->[2],
"$range->[0] -> $range->[1]: Avg = $range->[2] [HH:MM]"
);
ok(
Time::Piece->strptime($avg, '%s')->strftime('%H:%M:%S') eq $range->[3],
"$range->[0] -> $range->[1]: Avg = $range->[3] [HH:MM:SS]"
);
}
sub get_tp { Time::Piece->strptime(shift, '%H:%M') }
Output:
1..18
ok 1 - 11:00 -> 13:00: Avg = 12:00 [HH:MM]
ok 2 - 11:00 -> 13:00: Avg = 12:00:00 [HH:MM:SS]
ok 3 - 23:00 -> 01:00: Avg = 00:00 [HH:MM]
ok 4 - 23:00 -> 01:00: Avg = 00:00:00 [HH:MM:SS]
ok 5 - 23:30 -> 00:30: Avg = 00:00 [HH:MM]
ok 6 - 23:30 -> 00:30: Avg = 00:00:00 [HH:MM:SS]
ok 7 - 12:00 -> 12:00: Avg = 12:00 [HH:MM]
ok 8 - 12:00 -> 12:00: Avg = 12:00:00 [HH:MM:SS]
ok 9 - 00:00 -> 00:00: Avg = 00:00 [HH:MM]
ok 10 - 00:00 -> 00:00: Avg = 00:00:00 [HH:MM:SS]
ok 11 - 00:00 -> 00:02: Avg = 00:01 [HH:MM]
ok 12 - 00:00 -> 00:02: Avg = 00:01:00 [HH:MM:SS]
ok 13 - 00:00 -> 00:01: Avg = 00:00 [HH:MM]
ok 14 - 00:00 -> 00:01: Avg = 00:00:30 [HH:MM:SS]
ok 15 - 23:58 -> 00:00: Avg = 23:59 [HH:MM]
ok 16 - 23:58 -> 00:00: Avg = 23:59:00 [HH:MM:SS]
ok 17 - 23:59 -> 00:00: Avg = 23:59 [HH:MM]
ok 18 - 23:59 -> 00:00: Avg = 23:59:30 [HH:MM:SS]
| [reply] [d/l] [select] |
Re: Average start time handling midnight
by BillKSmith (Monsignor) on Jul 21, 2016 at 15:21 UTC
|
There does not seem to be a general solution to this problem. You did not mention how many readings you must average. A 'solution' which works well for two readings may fail for three or more. The article https://en.wikipedia.org/wiki/Mean_of_circular_quantities suggests an algorithm that seems to always do what we want. Too bad it is so complex.
| [reply] |
|
|
Maybe I'm being too simplistic, but given the OP's restrictions, "usually start within a 6 hour window" and "handles times over midnight", I think it's doable. If the 6 hours ended just after midnight, you would need to handle times from 18:01 - 00:01; if it started just before midnight, you would need to handle 23:59 - 5:59. For simplicity's sake, I'd make my window 18:00 - 5:59. Thus,
$n = 0;
while ( ... ) {
$hour += 24 if( $hour < 6 );
next if( $hour < 18 ); # ignore times outside the window
$sum += 3600*$hour + 60*$min + $sec;
$avg = $sum / ++$n;
}
For the $hour < 18 rule: if the time is before 6am, its $hour will already have been adjusted to somewhere in the range 24 to 29, so it won't trigger the <18 condition; if the time is after 6pm, it won't trigger the condition either; otherwise, it will trigger the condition and the time will be ignored.
If OP wants a narrower or wider 'accept-range', just adjust the two comparisons appropriately.
Other options would be to clamp times between 6am and noon to 05:59:59, and between noon and 6pm to 18:00:00, which will not ignore points, but manipulate them to fall within the "normal window".
I understand there are issues in the general case, but with the restrictions given, I think things are well-defined enough, and this algorithm should give a mean closer to midnight than to noon, which seems to be what the OP wants.
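The loop above is pseudocode, so here is a minimal runnable sketch of the same windowing idea. The sub name windowed_average and the HH:MM input format (no seconds) are my own assumptions for illustration, not something from the OP; the 18:00 - 05:59 window is the one described above.

```perl
use strict;
use warnings;

# Hypothetical helper: average HH:MM times that fall inside an
# 18:00 - 05:59 window. Times before 06:00 are treated as belonging
# to the following day; times from 06:00 to 17:59 are ignored.
sub windowed_average {
    my @times = @_;
    my ($sum, $n) = (0, 0);
    for my $time (@times) {
        my ($hour, $min) = split /:/, $time;
        $hour += 24 if $hour < 6;     # after midnight: shift into 24..29
        next        if $hour < 18;    # outside the window: skip it
        $sum += 60 * $hour + $min;    # minutes since midnight of day one
        $n++;
    }
    return unless $n;                 # nothing fell inside the window
    my $avg = int($sum / $n) % (24 * 60);   # wrap back into a single day
    return sprintf '%02d:%02d', int($avg / 60), $avg % 60;
}

print windowed_average(qw( 23:00 01:00 )), "\n";               # 00:00
print windowed_average(qw( 22:00 23:30 00:30 12:00 )), "\n";   # 23:20
```

In the second call, 12:00 is outside the window and is dropped, so only the three night-time values contribute to the mean.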
|
|
#! /usr/bin/perl -wl
my @times = qw( 17:00 19:00 11:00 13:00 23:00 01:00
10:30 13:00 19:40 01:20 16:00 02:00 );
# keep angles in range -pi .. pi
sub _PI () { 2 * atan2(1, 0) }
sub _TC () { 24 * 60 / (2 * _PI) }
sub time2angle {
map {
my ($h, $m) = split /:/;
_PI - (60 * $h + $m) / _TC
} @_
}
sub angle2time {
map {
my $m = int _TC * (_PI - $_);
sprintf "%02d:%02d", $m / 60, $m % 60
} @_
}
use List::Util qw( sum );
sub circ_avg { atan2 sum(map sin, @_), sum(map cos, @_) }
for (; @times > 1; shift @times) {
my @t = @times[0, 1];
print "@t => @{[angle2time circ_avg time2angle @t]}";
}
Thank you for the Wikipedia link.
|
|
When averaging two times, yes, it's silly. But when averaging more than two, it comes out with a very different result.
... # your code thru the definition of sub circ_avg;
print "@times";
print " => circle vs line";
print " => @{[angle2time circ_avg time2angle @times]} vs @{[angle2time( sum( time2angle @times ) / @times ) ]}";
for (; @times > 1; shift @times) {
my @t = @times[0, 1];
print "@t => @{[angle2time circ_avg time2angle @t]} vs @{[angle2time( sum( time2angle @t ) / @t ) ]}";
}
__END__
__OUTPUT__
17:00 19:00 11:00 13:00 23:00 01:00 10:30 13:00 19:40 01:20 16:00 02:00
=> circle vs line
=> 17:46 vs 12:12
17:00 19:00 => 18:00 vs 18:00
19:00 11:00 => 15:00 vs 15:00
11:00 13:00 => 12:00 vs 12:00
13:00 23:00 => 18:00 vs 18:00
23:00 01:00 => 00:00 vs 12:00
01:00 10:30 => 05:45 vs 05:45
10:30 13:00 => 11:45 vs 11:45
13:00 19:40 => 16:20 vs 16:20
19:40 01:20 => 22:30 vs 10:30
01:20 16:00 => 20:40 vs 08:40
16:00 02:00 => 21:00 vs 09:00
For the pairs of times, the simple average mostly matches the circular one (except where midnight falls between time1 and time2), but the two averages differ widely for the entire list. Previously, I had compared my own implementation of the angular average (yours is better) with salva's solution of finding the center with minimum variance. For salva's example list (or a list of random times), that solution gave means very similar to the angular average, and very different from the arithmetic mean of the same data.
|
|
Perhaps equivalent:
Convert the time to radians. Treat all times as unit vectors in polar coordinates, with the time_in_radians as the angle. Convert the polar (1, time_in_radians) to Cartesian (x,y). Sum all of the xs and ys separately (optionally averaging them, again separately), and convert the result back to polar. The average time is the angle. The averaged magnitude might be interesting also.
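A minimal sketch of those steps, in case it helps. The sub name vector_average and the HH:MM input format are assumptions of mine; it returns both the mean time and the averaged magnitude, which is close to 1 when the times are tightly clustered and close to 0 when they are spread around the clock.

```perl
use strict;
use warnings;

use constant PI              => 4 * atan2(1, 1);
use constant MINUTES_PER_DAY => 24 * 60;

# Hypothetical helper: returns (mean time as HH:MM, mean vector length 0..1).
sub vector_average {
    my @times = @_;
    my ($x, $y) = (0, 0);
    for my $time (@times) {
        my ($h, $m) = split /:/, $time;
        my $angle = 2 * PI * (60 * $h + $m) / MINUTES_PER_DAY;
        $x += cos $angle;                  # polar (1, angle) -> Cartesian
        $y += sin $angle;
    }
    $x /= @times;                          # average x and y separately
    $y /= @times;
    my $angle = atan2 $y, $x;              # back to polar: -pi .. pi
    $angle += 2 * PI if $angle < 0;        # normalise to 0 .. 2*pi
    # Round to the nearest minute, then wrap 24:00 back to 00:00.
    my $minutes = int($angle * MINUTES_PER_DAY / (2 * PI) + 0.5) % MINUTES_PER_DAY;
    my $length  = sqrt($x * $x + $y * $y);
    return (sprintf('%02d:%02d', int($minutes / 60), $minutes % 60), $length);
}

my ($time, $length) = vector_average(qw( 23:00 01:00 ));
printf "%s (magnitude %.3f)\n", $time, $length;   # 00:00 (magnitude 0.966)
```

When the magnitude is near zero (for example 06:00 and 18:00), the angle is essentially arbitrary, so the "average time" should not be trusted in that case.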
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] |
Re: Average start time handling midnight
by SimonPratt (Friar) on Jul 21, 2016 at 14:54 UTC
|
An obvious approach is to convert times to seconds, find the midpoint, adjust and return, like so:
use 5.16.2;
use POSIX 'strftime';
my @test = (['11:00', '13:00'], ['23:00', '01:00'], ['10:30', '13:00'], ['19:40', '01:20']);
foreach my $tst (@test) {
print "Testing '$tst->[0]', '$tst->[1]': ";
say getmid($tst->[0], $tst->[1]);
}
sub getmid($$) {
my ($start, $end) = @_;
$start =~ /^(\d{2}):(\d{2})$/ or die "invalid start date provided: '$start', must be in format hh:mm\n";
my $startseconds = $1 * 3600 + $2 *60;
$end =~ /^(\d{2}):(\d{2})$/ or die "invalid end date provided: '$end', must be in format hh:mm\n";
my $endseconds = $1 * 3600 + $2 * 60;
my $range;
if ($endseconds < $startseconds) { $range = 86400 - $startseconds + $endseconds }
else                             { $range = $endseconds - $startseconds }
my $average = $startseconds + int($range / 2);
return strftime("%H:%M", gmtime($average));
}
| [reply] [d/l] |
Re: Average start time handling midnight
by hippo (Archbishop) on Jul 21, 2016 at 12:34 UTC
|
Subtract 24h from all times which are >12:00. Then calculate the mean, which is an offset from midnight. If the mean is < 0, add 24h back onto it.
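A rough sketch of that rule, working in minutes rather than hours. The sub name midnight_offset_average and the HH:MM input format are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the steps described above, in minutes.
sub midnight_offset_average {
    my @times = @_;
    my $sum = 0;
    for my $time (@times) {
        my ($h, $m) = split /:/, $time;
        my $minutes = 60 * $h + $m;
        # Afternoon/evening times become negative offsets from midnight.
        $minutes -= 24 * 60 if $minutes > 12 * 60;
        $sum += $minutes;
    }
    my $mean = $sum / @times;
    $mean += 24 * 60 if $mean < 0;         # back onto the 24h clock
    $mean = int $mean;                     # truncate to whole minutes
    return sprintf '%02d:%02d', int($mean / 60), $mean % 60;
}

print midnight_offset_average(qw( 23:00 01:00 )), "\n";   # 00:00
print midnight_offset_average(qw( 22:00 23:00 )), "\n";   # 22:30
```

Note that times on either side of noon pull in opposite directions under this rule: 11:00 and 13:00 become +660 and -660 minutes and average to 00:00, so it only suits data clustered around midnight.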
| [reply] |
|
|
Your algorithm has trouble with the first example, 11am and 1pm: 1pm becomes -11, and the average of 11 and -11 is 0.
Update (updated again: no need to test for a negative average; and updated once more to allow minutes along with hours):
use strict;
use warnings;
use Test::Simple tests => 8;
ok(&avg('01:05','03:13') eq '02:09', 'AM Only');
ok(&avg('20:43','22:45') eq '21:44', 'PM Only');
ok(&avg('09:00','13:00') eq '11:00', 'AM to PM');
ok(&avg('15:12','01:52') eq '20:32', 'PM to AM next day');
ok(&avg('09:10','07:08') eq '20:09', 'AM to AM next day');
ok(&avg('15:02','13:30') eq '02:16', 'PM to PM next day');
ok(&avg('11:00','13:00') eq '12:00', 'OP Example 1');
ok(&avg('23:00','01:00') eq '00:00', 'OP Example 2');
sub avg {
my ($x,$y) = @_;
$x = ttoi($x);
$y = ttoi($y);
if ($y < $x) {
$y += (24 * 60);
}
return itot((($x + $y)/2) % (24 * 60));
}
sub ttoi {
my $t = shift;
my ($h,$m) = split /:/,$t;
return $h * 60 + $m;
}
sub itot {
my $i=shift;
my $h = $i / 60;
my $m = $i % 60;
return sprintf "%02d:%02d",$h,$m;
}
But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)
| [reply] [d/l] |
Re: Average start time handling midnight
by gandolf989 (Scribe) on Jul 21, 2016 at 19:08 UTC
|
There are date functions that will just give you the elapsed time. They should even be able to account for different timezones.
http://search.cpan.org/~drolsky/DateTime1.34/lib/DateTime.pm#Datetime_Subtraction
Another option is to convert the dates to unix time and subtract, then figure out how many hours, minutes and seconds elapsed. This is a great place to just use a library rather than build something new.
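As a rough sketch of the unix-time route, assuming full timestamps are available: the 'YYYY-MM-DD HH:MM' input format and the sub name elapsed_and_midpoint are made up for illustration, and this uses the core Time::Piece module rather than DateTime.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: takes two 'YYYY-MM-DD HH:MM' strings (assumed format)
# and returns (elapsed seconds, midpoint as HH:MM). Both are treated as UTC.
sub elapsed_and_midpoint {
    my ($start, $end) = map { Time::Piece->strptime($_, '%Y-%m-%d %H:%M') } @_;
    my $elapsed = $end->epoch - $start->epoch;              # seconds between the two
    my $mid     = gmtime($start->epoch + $elapsed / 2);     # Time::Piece object
    return ($elapsed, $mid->strftime('%H:%M'));
}

my ($elapsed, $mid) = elapsed_and_midpoint('2016-07-20 23:00', '2016-07-21 01:00');
printf "elapsed: %dh %02dm, midpoint: %s\n",
    int($elapsed / 3600), int($elapsed % 3600 / 60), $mid;
# elapsed: 2h 00m, midpoint: 00:00
```

With the date present, the midnight crossing needs no special handling, since the second timestamp is simply later than the first.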
|
|
It would indeed be a trivial exercise if the date information were available as well as the time of day. However, there's no indication at all in the OP that the date information is present in the data set and that is why it has garnered responses which are as varied as they are many.
BTW, the URL you included gives a 404. Perhaps you meant http://search.cpan.org/~drolsky/DateTime-1.34/lib/DateTime.pm#Datetime_Subtraction?
| [reply] |