Re^4: search through hash for date in a range

Yes, sorry; the date ranges can overlap each other. The search sub just returns when it finds the first match of a date in any range.

My example should have shown this.
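Here is a minimal sketch of the kind of first-match search described above. It assumes the ranges are kept in load order in an array of [id, start, end] entries, with times converted to epoch seconds via Time::Piece. The names `find_range` and `to_epoch` and the sample ranges are hypothetical, not the actual search sub. Because it scans ranges until it gets a hit, each lookup is O(n) in the number of ranges. With overlapping ranges, the order of the list decides which id is returned.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Parse "YYYY-MM-DD HH:MM:SS" into epoch seconds (class-method strptime treats it as UTC).
sub to_epoch {
    my ($ts) = @_;
    return Time::Piece->strptime($ts, '%Y-%m-%d %H:%M:%S')->epoch;
}

# Hypothetical overlapping ranges [id, start, end], in the order they were loaded.
my @ranges = map { [ $_->[0], to_epoch($_->[1]), to_epoch($_->[2]) ] } (
    [ 1, '2018-03-06 06:00:00', '2018-03-06 07:00:00' ],
    [ 2, '2018-03-06 06:00:00', '2018-03-06 08:00:00' ],
    [ 3, '2018-03-06 06:00:00', '2018-03-06 10:00:00' ],
);

# Linear scan: return the id of the FIRST range containing $t (inclusive bounds),
# or undef if no range contains it.
sub find_range {
    my ($t) = @_;
    for my $r (@ranges) {
        return $r->[0] if $t >= $r->[1] && $t <= $r->[2];
    }
    return;
}

for my $ts ('2018-03-06 06:30:00', '2018-03-06 09:15:00', '2018-03-06 11:00:00') {
    my $id = find_range(to_epoch($ts));
    print defined $id ? "Found range: $id for $ts\n" : "No range found for $ts\n";
}
```

Run as is, it reports range 1 for 06:30 (ranges 1, 2 and 3 all contain it, but 1 is listed first), range 3 for 09:15, and no range for 11:00.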

The ranges are more like this:

2018-03-05 06:00:00 -> 2018-03-06 01:00:00
2018-03-05 06:00:00 -> 2018-03-06 02:00:00
2018-03-05 06:00:00 -> 2018-03-06 03:00:00
2018-03-05 06:00:00 -> 2018-03-06 04:00:00
2018-03-05 06:00:00 -> 2018-03-06 05:00:00
2018-03-05 06:00:00 -> 2018-03-06 06:00:00

2018-03-06 06:00:00 -> 2018-03-06 07:00:00
2018-03-06 06:00:00 -> 2018-03-06 08:00:00

2018-03-06 06:00:00 -> 2018-03-06 10:00:00
2018-03-06 06:00:00 -> 2018-03-06 11:00:00
2018-03-06 06:00:00 -> 2018-03-06 12:00:00
2018-03-06 06:00:00 -> 2018-03-06 13:00:00

Given the above set of ranges (notice that 06:00 -> 09:00 is missing, which happens often), the operational assumption is that we are "normally" processing hour 13 (06:00 -> 14:00), but that is no guarantee. (Around 6:30 am the log is rotated, so the range starts over.)

If my next load is from 08:00 -> 09:00, then I can skip loading the dates that are already loaded in the 06:00 -> 10:00 data set.
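The skip check above can be sketched as a containment test. This assumes the already-loaded ranges are available as [start, end] pairs of zero-padded "YYYY-MM-DD HH:MM:SS" strings. Strings in that format sort chronologically, so plain string comparison is enough. The names `@loaded` and `already_loaded` are hypothetical. The sketch only checks whether a new load fits inside a single existing range. It does not check whether several ranges together cover the load.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Zero-padded "YYYY-MM-DD HH:MM:SS" strings sort chronologically,
# so string comparison (ge/le) gives the right ordering.
my @loaded = (   # hypothetical already-loaded ranges: [start, end]
    [ '2018-03-06 06:00:00', '2018-03-06 08:00:00' ],
    [ '2018-03-06 06:00:00', '2018-03-06 10:00:00' ],
);

# True if [$start, $end] lies entirely inside one already-loaded range.
sub already_loaded {
    my ($start, $end) = @_;
    for my $r (@loaded) {
        return 1 if $start ge $r->[0] && $end le $r->[1];
    }
    return 0;
}

for my $load ([ '2018-03-06 08:00:00', '2018-03-06 09:00:00' ],
              [ '2018-03-06 09:00:00', '2018-03-06 11:00:00' ]) {
    my ($s, $e) = @$load;
    print already_loaded($s, $e) ? "skip $s -> $e\n" : "load $s -> $e\n";
}
```

Here 08:00 -> 09:00 is skipped because it sits inside 06:00 -> 10:00. The 09:00 -> 11:00 load is not skipped, because it runs past every loaded range.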

I know, complicated ...


Re^5: search through hash for date in a range by poj (Abbot) on Mar 06, 2018 at 19:15 UTC
Ok, I was thinking about using an 86,400-element array, one element for each second of the day (ranges crossing days would need some sort of sliding window and offset, I guess). Anyway, something to investigate perhaps.

#!/usr/bin/perl
use warnings;
use strict;

my @range = ();
while (<DATA>) {
    chomp;
    if (/^R:(.*)/) {
        # "R:id,date start,date end" -> split on commas and spaces
        my ($i, undef, $s, undef, $e) = split /[, ]/, $1;
        $s = sec($s);
        $e = sec($e);
        for ($s .. $e) {
            $range[$_] = $i;
        }
    }
    else {
        # "D:mm/dd/yyyy-HH:MM:SS" -> keep only the time part
        my (undef, $t) = split /-/, $_;
        my $rangeid = $range[ sec($t) ];
        if (defined $rangeid) {
            print "Found range: $rangeid for $t\n";
        }
        else {
            print "No range found for $t\n";
        }
    }
}

# Convert "HH:MM:SS" to seconds since midnight
sub sec {
    my ($h, $m, $s) = split /:/, shift;
    return $h * 60 * 60 + $m * 60 + $s;
}

__DATA__
R:1,2018-03-06 14:20:00,2018-03-06 14:30:00
R:2,2018-03-06 13:00:00,2018-03-06 13:40:00
R:3,2018-03-06 13:45:00,2018-03-06 13:50:00
D:03/06/2018-14:29:41
D:03/06/2018-13:33:38
D:03/06/2018-13:54:47
D:03/06/2018-12:53:34
D:03/06/2018-13:29:19
D:03/06/2018-12:52:47
D:03/06/2018-14:21:51
D:03/06/2018-13:49:20
D:03/06/2018-13:36:18
D:03/06/2018-13:44:25

poj
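One difference from the original search sub is worth noting. `$range[$_] = $i` overwrites earlier ids, so where ranges overlap, the range loaded *last* wins, not the first match. If first-match behavior matters, the defined-or assignment `//=` (Perl 5.10+) only fills seconds that are still undef. A minimal sketch, assuming the same "HH:MM:SS" within-one-day setup. `add_range` is a hypothetical helper, and the two overlapping sample ranges are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Seconds since midnight for "HH:MM:SS".
sub sec {
    my ($h, $m, $s) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

my @range;   # one slot per second of the day; holds a range id or undef

# Mark every second of [start, end] with $id, but only where no earlier
# range has claimed that second already (//= assigns only when undef).
sub add_range {
    my ($id, $start, $end) = @_;
    $range[$_] //= $id for sec($start) .. sec($end);
}

add_range(1, '13:00:00', '13:40:00');
add_range(2, '13:30:00', '13:50:00');   # overlaps range 1 from 13:30 to 13:40

for my $t ('13:35:00', '13:45:00', '12:59:59') {
    my $id = $range[ sec($t) ];
    print defined $id ? "Found range: $id for $t\n" : "No range found for $t\n";
}
```

With `//=`, 13:35 reports range 1 (first loaded). With plain `=`, it would report range 2.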
Re^6: search through hash for date in a range by bfdi533 (Friar) on Mar 06, 2018 at 20:13 UTC
I took a hybrid approach and am using a bigger array, but with the same idea. There is no need to worry about the sliding window here. I query the DB first for the min/max dates and store these as epoch seconds. In this sample, that is represented by the "X" row (the values on the "X" line are actual dates from my DB). Then I calculate the epoch for each date in the "R" (range) rows and subtract the minimum from it to get the array index.

Great suggestion. Seems pretty workable.

#!/usr/bin/env perl
use warnings;
use strict;
use Time::Piece;

$|++;    # autoflush STDOUT

my @range;
my $tn;    # epoch of the minimum DB date, used as the array offset
my $tx;    # epoch of the maximum DB date
while (<DATA>) {
    chomp;
    if (s/^(\w+)://) {
        my $cat = $1;
        if ($cat eq "X") {
            my ($n, $x) = split ',';
            $tn = Time::Piece->strptime($n, "%Y-%m-%d %H:%M:%S")->epoch;
            $tx = Time::Piece->strptime($x, "%Y-%m-%d %H:%M:%S")->epoch;
        }
        elsif ($cat eq "R") {
            my ($i, $s, $e) = split ',';
            my $ts = Time::Piece->strptime($s, "%Y-%m-%d %H:%M:%S")->epoch - $tn;
            my $te = Time::Piece->strptime($e, "%Y-%m-%d %H:%M:%S")->epoch - $tn;
            for ($ts .. $te) {
                $range[$_] = $i;
            }
        }
        else {
            # "D" rows: mm/dd/yyyy-HH:MM:SS, offset by the same minimum
            my $cd = Time::Piece->strptime($_, "%m/%d/%Y-%H:%M:%S")->epoch - $tn;
            my $rangeid = $range[$cd];
            if (!defined $rangeid) {
                print "No range found for $_\n";
            }
            else {
                print "Found range: $rangeid for $_\n";
            }
        }
    }
}

__DATA__
X:2018-02-15 22:49:41,2018-12-13 15:59:59
R:1,2018-03-06 14:20:00,2018-03-06 14:30:00
R:2,2018-03-06 13:00:00,2018-03-06 13:40:00
R:3,2018-03-06 13:45:00,2018-03-06 13:50:00
D:03/06/2018-14:29:41
D:03/06/2018-13:33:38
D:03/06/2018-13:54:47
D:03/06/2018-12:53:34
D:03/06/2018-13:29:19
D:03/06/2018-12:52:47
D:03/06/2018-14:21:51
D:03/06/2018-13:49:20
D:03/06/2018-13:36:18
D:03/06/2018-13:44:25
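The offset version has one edge case worth guarding against. If a "D" date ever falls before the "X" minimum, `$cd` goes negative. Perl's negative array indexing then counts back from the end of `@range`, which can return a range id that does not actually cover that date. Here is a minimal sketch of a bounds-checked lookup, assuming the same date formats and the minimum from the sample "X" row. `add_range` and `lookup` are hypothetical helper names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

my $fmt = '%Y-%m-%d %H:%M:%S';
# Base epoch: assumed to be the minimum date from the DB (the "X" row).
my $tn = Time::Piece->strptime('2018-02-15 22:49:41', $fmt)->epoch;
my @range;

# Fill one slot per second of the range, indexed by offset from $tn.
sub add_range {
    my ($id, $s, $e) = @_;
    my $ts = Time::Piece->strptime($s, $fmt)->epoch - $tn;
    my $te = Time::Piece->strptime($e, $fmt)->epoch - $tn;
    $range[$_] = $id for $ts .. $te;
}

# Look up a "D"-style date; reject offsets outside 0..$#range, because a
# negative index would otherwise count back from the end of @range.
sub lookup {
    my ($date) = @_;
    my $i = Time::Piece->strptime($date, '%m/%d/%Y-%H:%M:%S')->epoch - $tn;
    return if $i < 0 || $i > $#range;
    return $range[$i];
}

add_range(1, '2018-03-06 14:20:00', '2018-03-06 14:30:00');

for my $d ('03/06/2018-14:29:41', '02/01/2018-00:00:00') {
    my $id = lookup($d);
    print defined $id ? "Found range: $id for $d\n" : "No range found for $d\n";
}
```

Without the `$i < 0` check, a date just before the minimum would index slots near the end of `@range`. In this sample, those slots belong to range 1.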
Re^7: search through hash for date in a range by bfdi533 (Friar) on Mar 06, 2018 at 20:16 UTC
Trying this out on my actual code shows a HUGE time improvement:

Results from range lookup with Time::Piece and subroutine:

Line: 37000 : 119 seconds : tps: 8.40336134453782
Line: 38000 : 115 seconds : tps: 8.69565217391304
Line: 39000 : 121 seconds : tps: 8.26446280991735
Line: 40000 : 120 seconds : tps: 8.33333333333333
Line: 41000 : 114 seconds : tps: 8.7719298245614
Line: 42000 : 139 seconds : tps: 7.19424460431655
Line: 43000 : 126 seconds : tps: 7.93650793650794
Line: 44000 : 122 seconds : tps: 8.19672131147541
Line: 45000 : 177 seconds : tps: 5.64971751412429
Line: 46000 : 161 seconds : tps: 6.2111801242236

Results with array (seconds) lookup:

Line: 37000 : 6 seconds : tps: 166.666666666667
Line: 38000 : 6 seconds : tps: 166.666666666667
Line: 39000 : 7 seconds : tps: 142.857142857143
Line: 40000 : 6 seconds : tps: 166.666666666667
Line: 41000 : 5 seconds : tps: 200
Line: 42000 : 7 seconds : tps: 142.857142857143
Line: 43000 : 7 seconds : tps: 142.857142857143
Line: 44000 : 6 seconds : tps: 166.666666666667
Line: 45000 : 7 seconds : tps: 142.857142857143
Line: 46000 : 7 seconds : tps: 142.857142857143
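The tps figures appear to be 1000 lines divided by the seconds spent on each 1000-line block. The Line counter advances by 1000 per row, and 1000/119 ≈ 8.403 matches the first row. Summing the ten blocks shown gives the overall speedup:

```latex
\text{tps} = \frac{1000}{t}, \qquad \frac{1000}{119} \approx 8.40, \qquad \frac{1000}{6} \approx 166.67

\frac{\sum t_{\text{subroutine}}}{\sum t_{\text{array}}}
  = \frac{119+115+121+120+114+139+126+122+177+161}{6+6+7+6+5+7+7+6+7+7}
  = \frac{1314}{64} \approx 20.5
```

So over these ten blocks, the array lookup is roughly 20 times faster overall: about 156 lines per second versus about 7.6.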
Re^6: search through hash for date in a range by bfdi533 (Friar) on Mar 06, 2018 at 19:36 UTC
That is a great idea! I will play around with that to see what I can do with it ...