
Can the ranges overlap each other? In the three shown as examples, they don't.

poj

Re^4: search through hash for date in a range
by bfdi533 (Friar) on Mar 06, 2018 at 19:07 UTC

    Yes, sorry; the date ranges can overlap each other. The search sub just returns when it finds the first match of a date in any range.

    My example should have shown this.

    The ranges are more like this:

    2018-03-05 06:00:00 -> 2018-03-06 01:00:00
    2018-03-05 06:00:00 -> 2018-03-06 02:00:00
    2018-03-05 06:00:00 -> 2018-03-06 03:00:00
    2018-03-05 06:00:00 -> 2018-03-06 04:00:00
    2018-03-05 06:00:00 -> 2018-03-06 05:00:00
    2018-03-05 06:00:00 -> 2018-03-06 06:00:00
    2018-03-06 06:00:00 -> 2018-03-06 07:00:00
    2018-03-06 06:00:00 -> 2018-03-06 08:00:00
    2018-03-06 06:00:00 -> 2018-03-06 10:00:00
    2018-03-06 06:00:00 -> 2018-03-06 11:00:00
    2018-03-06 06:00:00 -> 2018-03-06 12:00:00
    2018-03-06 06:00:00 -> 2018-03-06 13:00:00
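The "first match wins" search over overlapping ranges like these can be sketched as below. This is a minimal sketch, not the actual search sub: it assumes the ranges are kept in load order in a plain array of [start, end] string pairs (the `@ranges` layout and the `first_match` name are hypothetical). Because the timestamps are fixed-width `YYYY-MM-DD HH:MM:SS` strings, string comparison (`ge`/`le`) orders them chronologically without converting to epoch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: ranges in load order, each [start, end] inclusive.
# Fixed-width "YYYY-MM-DD HH:MM:SS" strings sort chronologically, so
# plain string comparison is enough here.
my @ranges = (
    [ '2018-03-05 06:00:00', '2018-03-06 01:00:00' ],
    [ '2018-03-05 06:00:00', '2018-03-06 02:00:00' ],
    [ '2018-03-06 06:00:00', '2018-03-06 07:00:00' ],
    [ '2018-03-06 06:00:00', '2018-03-06 10:00:00' ],
);

# Return the index of the first range containing $t, or undef if none does.
# Overlapping ranges are fine: the earliest-listed match is returned.
sub first_match {
    my ($t, @r) = @_;
    for my $i (0 .. $#r) {
        my ($s, $e) = @{ $r[$i] };
        return $i if $t ge $s && $t le $e;
    }
    return;
}

for my $t ('2018-03-06 00:30:00', '2018-03-06 08:15:00', '2018-03-06 05:00:00') {
    my $i = first_match($t, @ranges);
    if (defined $i) {
        print "$t -> range $i\n";
    }
    else {
        print "$t -> no range\n";
    }
}
```

Here 00:30 falls in both of the first two ranges, and range 0 is reported because it is checked first; 05:00 on the 6th falls in the gap between the ranges and reports no range. The cost is a linear scan per lookup.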

    Given the above set of ranges (notice that 06:00 -> 09:00 is missing, which happens often), the operational assumption is that we are "normally" processing hour 13 (06:00 -> 14:00), but that is no guarantee. (Around 6:30 am the log is rotated, so the range starts over.)

    If my next load is from 08:00 -> 09:00, then I can skip loading the dates that are already loaded in the 06:00 -> 10:00 data set.
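The skip check in that example (08:00 -> 09:00 already covered by 06:00 -> 10:00) amounts to testing whether the new window lies inside one already-loaded window. A minimal sketch, assuming the loaded windows are kept as [start, end] string pairs; `@loaded` and `already_loaded` are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of windows already loaded, [start, end] inclusive,
# as fixed-width "YYYY-MM-DD HH:MM:SS" strings.
my @loaded = (
    [ '2018-03-06 06:00:00', '2018-03-06 10:00:00' ],
);

# True if [$s, $e] lies entirely inside a single already-loaded window.
sub already_loaded {
    my ($s, $e, @done) = @_;
    for my $w (@done) {
        return 1 if $s ge $w->[0] && $e le $w->[1];
    }
    return 0;
}

my ($from, $to) = ('2018-03-06 08:00:00', '2018-03-06 09:00:00');
if (already_loaded($from, $to, @loaded)) {
    print "skip $from -> $to\n";
}
else {
    print "load $from -> $to\n";
}
```

This only detects containment in one window; a load covered by the union of two adjacent windows would still be reported as needing a load.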

    I know, complicated ...

      OK, I was thinking about using an 86,400-element array, one element for each second of the day (ranges crossing days would need some sort of sliding window and offset, I guess). Anyway, something to investigate perhaps.

      #!/usr/bin/perl
      use warnings;
      use strict;

      # One slot per second of the day; each slot holds the id of the
      # range covering that second (later ranges overwrite earlier ones).
      my @range = ();
      while (<DATA>) {
          chomp;
          if (/^R:(.*)/) {
              # R:id,date start,date end -> mark every second from start to end
              my ($i, undef, $s, undef, $e) = split /[, ]/, $1;
              $s = sec($s);
              $e = sec($e);
              for ($s .. $e) {
                  $range[$_] = $i;
              }
          }
          else {
              # D:mm/dd/yyyy-hh:mm:ss -> look up the time of day
              my (undef, $t) = split /-/, $_;
              my $rangeid = $range[ sec($t) ];
              if (defined $rangeid) {
                  print "Found range: $rangeid for $t\n";
              }
              else {
                  print "No range found for $t\n";
              }
          }
      }

      # Convert hh:mm:ss to seconds since midnight
      sub sec {
          my ($h, $m, $s) = split /:/, shift;
          return $h*60*60 + $m*60 + $s;
      }

      __DATA__
      R:1,2018-03-06 14:20:00,2018-03-06 14:30:00
      R:2,2018-03-06 13:00:00,2018-03-06 13:40:00
      R:3,2018-03-06 13:45:00,2018-03-06 13:50:00
      D:03/06/2018-14:29:41
      D:03/06/2018-13:33:38
      D:03/06/2018-13:54:47
      D:03/06/2018-12:53:34
      D:03/06/2018-13:29:19
      D:03/06/2018-12:52:47
      D:03/06/2018-14:21:51
      D:03/06/2018-13:49:20
      D:03/06/2018-13:36:18
      D:03/06/2018-13:44:25
      poj

        I took a hybrid approach and am using a bigger array but with the same idea. No need to worry about the sliding window here.

        I first query the DB for the min/max dates and store them as epoch seconds. In this sample, that is represented by the "X" row (the values on the "X" line are actual dates from my DB).

        Then I calculate the epoch for each date in the "R" ranges and subtract the minimum from it to get the array index.
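As a worked instance of that offset arithmetic, here is a minimal sketch using the "X" minimum and the start of range 1 from the sample data; `epoch_of` is a hypothetical helper name. It also shows the scale involved: the index is the number of seconds since the "X" minimum, so the array grows with that time span, not with the number of ranges.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Parse "YYYY-MM-DD HH:MM:SS" into epoch seconds. Time::Piece->strptime
# treats the string as UTC, which is fine here because only the
# difference between two parsed values is used.
sub epoch_of {
    my ($str) = @_;
    return Time::Piece->strptime($str, '%Y-%m-%d %H:%M:%S')->epoch;
}

my $min = epoch_of('2018-02-15 22:49:41');   # the "X" minimum from the DB
my $idx = epoch_of('2018-03-06 14:20:00') - $min;
print "array index: $idx\n";   # 1611019
```

So even the start of range 1 already sits at index 1,611,019 (18 days, 15 hours, 30 minutes and 19 seconds after the minimum).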

        Great suggestion. Seems pretty workable.

        #!/usr/bin/env perl
        use Time::Piece;
        use warnings;
        use strict;

        $|++;
        my @range;    # index: seconds since $tn; value: range id
        my $tn;       # epoch of the earliest date in the DB ("X" row)
        my $tx;       # epoch of the latest date in the DB ("X" row)

        while (<DATA>) {
            chomp;
            if (s/^(\w+)://) {
                my $cat = $1;
                if ($cat eq "X") {
                    # X:min,max -> base offset for the array index
                    my ($n, $x) = split ',';
                    $tn = Time::Piece->strptime($n, "%Y-%m-%d %H:%M:%S")->epoch;
                    $tx = Time::Piece->strptime($x, "%Y-%m-%d %H:%M:%S")->epoch;
                }
                elsif ($cat eq "R") {
                    # R:id,start,end -> mark every second of the range
                    my ($i, $s, $e) = split ',';
                    my $ts = Time::Piece->strptime($s, "%Y-%m-%d %H:%M:%S")->epoch - $tn;
                    my $te = Time::Piece->strptime($e, "%Y-%m-%d %H:%M:%S")->epoch - $tn;
                    for ($ts .. $te) {
                        $range[$_] = $i;
                    }
                }
                else {
                    # D:mm/dd/yyyy-hh:mm:ss -> look up the range id
                    my $cd = Time::Piece->strptime($_, "%m/%d/%Y-%H:%M:%S")->epoch - $tn;
                    # A negative offset would silently index from the end of the array
                    my $rangeid = $cd >= 0 ? $range[$cd] : undef;
                    if (!defined $rangeid) {
                        print "No range found for $_\n";
                    }
                    else {
                        print "Found range: $rangeid for $_\n";
                    }
                }
            }
        }

        __DATA__
        X:2018-02-15 22:49:41,2018-12-13 15:59:59
        R:1,2018-03-06 14:20:00,2018-03-06 14:30:00
        R:2,2018-03-06 13:00:00,2018-03-06 13:40:00
        R:3,2018-03-06 13:45:00,2018-03-06 13:50:00
        D:03/06/2018-14:29:41
        D:03/06/2018-13:33:38
        D:03/06/2018-13:54:47
        D:03/06/2018-12:53:34
        D:03/06/2018-13:29:19
        D:03/06/2018-12:52:47
        D:03/06/2018-14:21:51
        D:03/06/2018-13:49:20
        D:03/06/2018-13:36:18
        D:03/06/2018-13:44:25

        That is a great idea! I will play around with that to see what I can do with it ...