in reply to Re^4: search through hash for date in a range
in thread search through hash for date in a range

OK, I was thinking about using an 86,400-element array, one element for each second of the day (ranges crossing days would need some sort of sliding window and offset, I guess). Anyway, something to investigate perhaps.

#!/usr/bin/perl
use warnings;
use strict;

my @range = ();
while (<DATA>) {
    chomp;
    if (/^R:(.*)/) {
        my ($i, undef, $s, undef, $e) = split /[, ]/, $1;
        $s = sec($s);
        $e = sec($e);
        for ($s..$e) {
            $range[$_] = $i;
        }
    }
    else {
        my (undef, $t) = split /-/, $_;
        my $rangeid = $range[ sec($t) ];
        if (defined $rangeid) {
            print "Found range: $rangeid for $t\n";
        }
        else {
            print "No range found for $t\n";
        }
    }
}

sub sec {
    my ($h, $m, $s) = split /:/, shift;
    return $h*60*60 + $m*60 + $s
}
__DATA__
R:1,2018-03-06 14:20:00,2018-03-06 14:30:00
R:2,2018-03-06 13:00:00,2018-03-06 13:40:00
R:3,2018-03-06 13:45:00,2018-03-06 13:50:00
D:03/06/2018-14:29:41
D:03/06/2018-13:33:38
D:03/06/2018-13:54:47
D:03/06/2018-12:53:34
D:03/06/2018-13:29:19
D:03/06/2018-12:52:47
D:03/06/2018-14:21:51
D:03/06/2018-13:49:20
D:03/06/2018-13:36:18
D:03/06/2018-13:44:25
poj
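
For the case mentioned above where a range crosses midnight, one possible approach (a sketch only, not something from the thread) is to split the range into two spans and wrap around the end of the array. This assumes every range is shorter than 24 hours and that the date part is ignored, as in the script above. The helper name mark_range and the sample times are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant DAY => 24 * 60 * 60;    # 86,400 seconds in a day

# Convert "HH:MM:SS" to seconds since midnight (same idea as sec() above).
sub sec {
    my ($h, $m, $s) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

# Hypothetical helper: mark every second of a range with its id.
# If the end time is earlier than the start time, assume the range
# crosses midnight and wrap around the end of the array.
sub mark_range {
    my ($range, $id, $start, $end) = @_;
    my @spans = $start <= $end
        ? ([$start, $end])
        : ([$start, DAY - 1], [0, $end]);
    for my $span (@spans) {
        $range->[$_] = $id for $span->[0] .. $span->[1];
    }
    return;
}

my @range;
mark_range(\@range, 4, sec('23:50:00'), sec('00:10:00'));    # made-up range

for my $t ('23:55:00', '00:05:00', '12:00:00') {
    my $id = $range[ sec($t) ];
    print defined $id
        ? "Found range: $id for $t\n"
        : "No range found for $t\n";
}
```

With the made-up range above, the first two times fall inside range 4 and the third is reported as not found. Ranges longer than a day would still need real dates, which is where an epoch-based index becomes the simpler choice.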

Re^6: search through hash for date in a range
by bfdi533 (Friar) on Mar 06, 2018 at 20:13 UTC

    I took a hybrid approach and am using a bigger array but with the same idea. No need to worry about the sliding window here.

    I query the DB first for the min/max dates and store them as epoch values. In this sample, that is represented by the "X" row. (The values on the "X" line are actual dates from my DB.)

    Then I calculate the epoch for each date in the "R" range rows and subtract the minimum from it to get the array index.

    Great suggestion. Seems pretty workable.

    #!/usr/bin/env perl
    use Date::Manip::Date;
    use Time::Piece;
    use warnings;
    use strict;

    $|++;

    my @vkeys;
    my $dmd = new Date::Manip::Date;
    my $cmp_dt = new Date::Manip::Date;
    my %ranges_dt;
    my @range;
    my $tn;
    my $tx;

    while (<DATA>) {
        chomp;
        if (s/^(\w+)://) {
            my $cat = $1;
            if ($cat eq "X") {
                my ($n, $x) = split ',';
                $tn = Time::Piece->strptime($n,"%Y-%m-%d %H:%M:%S")->epoch;
                $tx = Time::Piece->strptime($x,"%Y-%m-%d %H:%M:%S")->epoch;
            } elsif ($cat eq "R") {
                my ($i, $s, $e) = split ',';
                my $ts = Time::Piece->strptime($s,"%Y-%m-%d %H:%M:%S")->epoch - $tn;
                my $te = Time::Piece->strptime($e,"%Y-%m-%d %H:%M:%S")->epoch - $tn;
                for ($ts..$te) {
                    $range[$_] = $i;
                }
            } else {
                my $cd = Time::Piece->strptime($_,"%m/%d/%Y-%H:%M:%S")->epoch - $tn;
                my $rangeid = $range[$cd];
                if (!defined $rangeid) {
                    print "No range found for $_\n";
                } else {
                    print "Found range: $rangeid for $_\n";
                }
            }
        }
    }
    __DATA__
    X:2018-02-15 22:49:41,2018-12-13 15:59:59
    R:1,2018-03-06 14:20:00,2018-03-06 14:30:00
    R:2,2018-03-06 13:00:00,2018-03-06 13:40:00
    R:3,2018-03-06 13:45:00,2018-03-06 13:50:00
    D:03/06/2018-14:29:41
    D:03/06/2018-13:33:38
    D:03/06/2018-13:54:47
    D:03/06/2018-12:53:34
    D:03/06/2018-13:29:19
    D:03/06/2018-12:52:47
    D:03/06/2018-14:21:51
    D:03/06/2018-13:49:20
    D:03/06/2018-13:36:18
    D:03/06/2018-13:44:25
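
One thing to watch with an epoch-offset index: a timestamp earlier than the minimum gives a negative index, and Perl counts negative array indices from the end, so the lookup could silently return the wrong range. A minimal sketch of a guard follows. The helper name lookup_range and the sample window are hypothetical and not part of the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: return the range id for an epoch value, or undef
# when the value falls outside the window covered by the array.
sub lookup_range {
    my ($range, $min_epoch, $epoch) = @_;
    my $idx = $epoch - $min_epoch;
    return if $idx < 0 || $idx > $#{$range};    # outside the window
    return $range->[$idx];
}

my $fmt = '%Y-%m-%d %H:%M:%S';

# Assumed minimum for this sketch: the start of the only range.
my $tn = Time::Piece->strptime('2018-03-06 13:00:00', $fmt)->epoch;

my @range;
my $ts = Time::Piece->strptime('2018-03-06 13:00:00', $fmt)->epoch - $tn;
my $te = Time::Piece->strptime('2018-03-06 13:40:00', $fmt)->epoch - $tn;
$range[$_] = 2 for $ts .. $te;

for my $stamp ('2018-03-06 13:33:38', '2018-03-06 12:59:59') {
    my $epoch = Time::Piece->strptime($stamp, $fmt)->epoch;
    my $id    = lookup_range(\@range, $tn, $epoch);
    print defined $id
        ? "Found range: $id for $stamp\n"
        : "No range found for $stamp\n";
}
```

Without the guard, the second timestamp would index element -1, which is the last element of the array and belongs to range 2. If the minimum really does come from the same DB as the data, the guard should never fire, but it is cheap insurance.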

      Trying this out on my actual code shows a HUGE time improvement, roughly 20x per 1,000-line batch in the samples below:

      Results from range lookup with Time::Piece and subroutine

      Line: 37000 : 119 seconds : tps: 8.40336134453782
      Line: 38000 : 115 seconds : tps: 8.69565217391304
      Line: 39000 : 121 seconds : tps: 8.26446280991735
      Line: 40000 : 120 seconds : tps: 8.33333333333333
      Line: 41000 : 114 seconds : tps: 8.7719298245614
      Line: 42000 : 139 seconds : tps: 7.19424460431655
      Line: 43000 : 126 seconds : tps: 7.93650793650794
      Line: 44000 : 122 seconds : tps: 8.19672131147541
      Line: 45000 : 177 seconds : tps: 5.64971751412429
      Line: 46000 : 161 seconds : tps: 6.2111801242236

      Results with array (seconds) lookup

      Line: 37000 : 6 seconds : tps: 166.666666666667
      Line: 38000 : 6 seconds : tps: 166.666666666667
      Line: 39000 : 7 seconds : tps: 142.857142857143
      Line: 40000 : 6 seconds : tps: 166.666666666667
      Line: 41000 : 5 seconds : tps: 200
      Line: 42000 : 7 seconds : tps: 142.857142857143
      Line: 43000 : 7 seconds : tps: 142.857142857143
      Line: 44000 : 6 seconds : tps: 166.666666666667
      Line: 45000 : 7 seconds : tps: 142.857142857143
      Line: 46000 : 7 seconds : tps: 142.857142857143
Re^6: search through hash for date in a range
by bfdi533 (Friar) on Mar 06, 2018 at 19:36 UTC

    That is a great idea! I will play around with that to see what I can do with it ...