gnu@perl has asked for the wisdom of the Perl Monks concerning the following question:

I have a target number (ctime of a file) and a list of epoch times. I need to find the epoch time from the list that is closest to the target number. It does not matter if the epoch from the list is <,= or > than the target number.

What I have come up with so far is this:

my %tmphash; my $ctime = (stat($key))[10]; print "IN MULTIPLES \$ctime is $ctime\n"; for (@returns) { print "Working Multiple match: $_\n"; $tmphash{abs($ctime - $_)} = $_; } print "MULTIPLE MATCH best match = ". $tmphash{(reverse(sort(keys(%tmphash))))[0]}."\n";
Does anyone have any comments, suggestions or better ideas on how to do this?

Replies are listed 'Best First'.
Re: Find a number in a list closest to a target
by eric256 (Parson) on Jul 25, 2003 at 14:08 UTC

    You could just loop threw comparing the times. No need for hashes and sorting and keys then.

    my $BestTime; my $BestOffBy; my $ctime = (stat($key))[10]; print "IN MULTIPLES \$ctime is $ctime\n"; for (@returns) { print "Working Multiple match: $_\n"; my $temp = abs($ctime - $_); if ($temp < $BestOffBy) { $BestOffBy = $temp; $BestTime = $_ } } print "$BestTime is only $BestOffBy away from $ctime.";

    I'm not sure this is any better or faster but eliminating the reverse, sort, and keys of the hash could be important if you are checking alot of times.


    ___________
    Eric Hodges
      Yeah, much better. I will be doing this A LOT! Good catch.

        If you will only look up one value against the list, or if the list will vary between lookups, then the linear search will be the best solution.

        But if your list remains static, and you need to compare more than one value against it, it wouldn't take many linear searches before the cost of sorting the static list and performing a binary chop to locate the nearest two values and then determining the minimum offset between the two would be amortised and begin to pay dividends.

        If you need code for this, speak up:)


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      One note though, I had to initialize $BestOffBy = $ctime or the 'if' statement was never entered.
Re: Find a number in a list closest to a target
by broquaint (Abbot) on Jul 25, 2003 at 14:14 UTC
    Keeping track of the difference and the closest time might be easier e.g
    my $ctime = (stat($key))[10]; my($diff, $closest) = $ctime; for(@returns) { $closest = $_ and $diff = abs($ctime - $_) if abs($ctime - $_) < $diff; } print "closest match is - $closest\n";
    The code is untested, but it should give you something to work with, and saves having to keep a hash around of all the differences and times.
    HTH

    _________
    broquaint

Re: Find a number in a list closest to a target
by TomDLux (Vicar) on Jul 25, 2003 at 14:17 UTC

    You say you only want the one which is closest. So instead of using spacec saving data you don't need, and undergoing the expense of a sort, why not keep track of the closest as you go along. It's much the same as determining the minimum of a list.

    $delta = time(); // has to be less than now - 1970 $minval = $min; for ( @returns ) { my $val = abs( $ctime - $_ ); if ( $val < $delta ) { $delta = $val; $minval = $_; } } print "MULTIPLE MATCH best match = $minval";
    update: I see I'll have to learn to type faster!

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Find a number in a list closest to a target
by Limbic~Region (Chancellor) on Jul 25, 2003 at 14:15 UTC
    gnu@perl,
    Unfortunately I don't have the time to go find links to all the references I want, but what you want to do is:

  • Sort the epoch times
  • Perform a variation on a binary search

    The variation on the binary search is to also keep track of the offset and not just is it higher or lower. This is proven to be the fastest algorithm for this specific type of search.

    The only trick will be if two items in the list are equidistant from the target number and you would prefer one over the other. In that case you will have to do more work.

    If you would like additional information and can wait until tonight (EST) - drop me a note.

    Cheers - L~R

      This may or may not be a wholly useful thing to do, depending on some stuff that he didn't state in his problem. It wasn't clear from his description whether or not he would be repeatedly doing searches on the same list, or if he'd be getting a fresh list every time. If he uses the naive method of traversing the whole n element array, this will yield him O(n) for one search, and O(m*n) were he to perform this search m times. With your strategy, in the case of a single search he'd get O(n*lg(n)) (unnecessary overhead because we're doing pre-processing then throwing it away), but in the case of m searches we'd have O(m*lg(n)) + O(n*lg(n)) = O((m+n)*lg(n)) which we can simplify further if we know the relative sizes of m and n. Thus, your idea is fantastic if he meant he'd be repeatedly searching on the same list, but he wasn't clear whether that was the case, so it might turn out to be an investment of pre-computation in a problem that doesn't in fact exist.
Re: Find a number in a list closest to a target
by dragonchild (Archbishop) on Jul 25, 2003 at 15:34 UTC
    All of the above suggestions are good. There are a few optimizations you can do if you can add a constraint or two on your list of epoch times.

    If you can require that the list be sorted, you can do something like:

    my $best = undef; for (@returns) { my $diff = abs($ctime - $_); unless (defined $best) { $best = [$_, $diff]; next; } if ($diff < $best) { $best = [$_, $diff]; next; } if ($diff > $best) { last; } } print "Best match is $best->[0] at $best->[1] away from $ctime\n";
    The presumption here is that $best->[0] will be getting smaller and smaller, then start getting larger. If your list is sorted, then you can stop looking once the smallest difference is found - you won't be beating it.

    Now, the above algorithm is a naive search, or linear search. You can look at doing a binary search, but it becomes somewhat complicated. If you have more than, say, 1000 items and that list is completely static and you look up into it more than, say, 100 times, then I'd look at doing a cached binary search. (The numbers are somewhat arbitrary. However, linear searches are actually faster than binary searches for extremely small sets, like under 30 elements. Also, it's harder to maintain a cached binary search than a early-terminating linear search, which may be a consideration.)

    There are several methods to cache. One of them could be noting where several marker points are. For example, if you were doing a search across the numbers 1 to 100, you'd note where 10, 20, 30, 40, 50, etc would be. This is especially important if your array is sparsely populated in some areas, but densely populated in others. This would allow you to find out which subset of the search area you need to focus in. Depending on how large your search area is, you might have second-level markers, possibly only in dense subsections.

    The goal here is that if you have a large number of elements to check against, you want to reduce the number of comparisons you make. This allows you to reduce your search time. But, remember that for smaller sets (under 100 elements), most search algorithms tend to run within acceptable times, regardless if they're O(n) or O(log n ** 0.1). (Big-O of the tenth root of the log of n ... very very fast!) At those sizes, your bottlenecks start being I/O and your comprehension, not the search algorithm being used.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Find a number in a list closest to a target
by ihb (Deacon) on Jul 25, 2003 at 23:25 UTC

    Just because &reduce is pretty and TMTOWTDI:

    use List::Util 'reduce'; my $closest = reduce { abs($a - $ctime) < abs($b - $ctime) ? $a : $b + } @epochs; print $closest;

    or even

    my $closest = reduce { $a->[1] < $b->[1] ? $a : $b } map [ $_ => abs($ctime - $_) ], @epochs; print $closest->[0];

    Not that I'd expect the second to be more efficient in this particular case. :-) (In fact, I benchmarked the regular for loops in the previous replies with and without a temporary $diff variable and it seemed like it was more efficient to just redo the math (on ActivePerl 5.8.0 build 806). But please verify that yourself, as the benchmark was hasty and inexhaustive.)

    ihb
Re: Find a number in a list closest to a target
by gnu@perl (Pilgrim) on Jul 25, 2003 at 17:04 UTC
    Note to all:
    Since so many people have mentioned it, the list of epoch times varies. The code example I posted is in a loop which receives a new reference time and list of epoch times.
      How large is your list of epoch times? Is it under 100 or over 100? If it's under 100, a linear search would be best, in my opinion, because it's easiest to code up and maintain. If this is speed-critical, where every millisecond counts, then you'll probably want to do something a little more complex.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

        Relatively small, probably under 10 items. That is why I was looking for another way to do it. The hash and sorting was too much overhead for the amount of data. Although I will be processing about 100,000 files daily the data per file (epoch times) will most likely never exceed 5 or 6 items.