Trace On has asked for the wisdom of the Perl Monks concerning the following question:

OK. I know that CPAN has everything. Buuuuut.

Is there a way to find similar entries in hashes for datasets like (5,7,2,5,1)?

My problem is not that I could not code some loops to compare all entries of a hash with each other. (Though I am pretty sure that I could not do it efficiently.)

The algorithm I am looking for should have its own opinion about the similarity. It should regard (3,9,4,2,1) as more similar than (100,100,100,100,100). How it measures the similarity is not too important. I would accept any algorithm, since it would be way more creative than my best.

Re: Hashes: Find similar entries!
by Corion (Patriarch) on Dec 17, 2015 at 08:08 UTC

    Why is (3,9,4,2,1) more similar than (100,100,100,100,100)?

    Do you mean (3,9,4,2,1) is more similar to (5,7,2,5,1) than (100,100,100,100,100) to (5,7,2,5,1)?

    If you want some measure of similarity between two ordered sets, I would interpret them as vectors and look at the cosine similarity between them. This gives you a measure of the "direction" in which the two vectors point, so you will likely also want to compare the lengths of the two vectors.
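A minimal Perl sketch of the cosine-similarity idea, assuming both datasets are equal-length lists of numbers and neither is all zeros. The subs `dot`, `norm` and `cosine_similarity` are hypothetical helpers written for this illustration, not a CPAN API; only `sum0` comes from the core module List::Util.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Dot product of two equal-length array refs.
sub dot { my ($u, $v) = @_; return sum0 map { $u->[$_] * $v->[$_] } 0 .. $#$u }

# Euclidean length (norm) of a vector.
sub norm { my ($u) = @_; return sqrt dot($u, $u) }

# Cosine of the angle between two non-zero vectors: 1 means same direction.
sub cosine_similarity {
    my ($u, $v) = @_;
    return dot($u, $v) / (norm($u) * norm($v));
}

my $ref = [5, 7, 2, 5, 1];
for my $cand ([3, 9, 4, 2, 1], [100, 100, 100, 100, 100]) {
    printf "(%s): cosine %.3f, length %.1f vs %.1f\n",
        join(',', @$cand), cosine_similarity($ref, $cand), norm($cand), norm($ref);
}
```

For the thread's data the two cosines come out close (about 0.90 for (3,9,4,2,1) and 0.88 for (100,100,100,100,100)), while the length of the second candidate is more than twenty times that of (5,7,2,5,1), which is why comparing the lengths as well matters here.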

    An alternative, if your sets aren't really vectors, would be to sort them by size and compute the squared difference between each pair of numbers in the same position.
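One way to read the "sort, then compare position by position" idea, as a sketch assuming equal-length lists of numbers; `sorted_sq_diff` is a hypothetical name, and summing the squared differences into one score is an assumption about how to combine them.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Treat each list as unordered: sort both numerically, then sum the squared
# differences of the values that land in the same position.
# Assumes both lists have the same number of elements.
sub sorted_sq_diff {
    my ($u, $v) = @_;
    my @su = sort { $a <=> $b } @$u;
    my @sv = sort { $a <=> $b } @$v;
    return sum0 map { ($su[$_] - $sv[$_]) ** 2 } 0 .. $#su;
}

my $ref = [5, 7, 2, 5, 1];
for my $cand ([1, 5, 7, 5, 2], [3, 9, 4, 2, 1]) {
    printf "(%s): %g\n", join(',', @$cand), sorted_sq_diff($ref, $cand);
}
```

Here the reordered copy (1,5,7,5,2) scores 0 and (3,9,4,2,1) scores 9; smaller means more similar.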

    If your sets are ordered but the cosine similarity doesn't report equality when you want it to, an alternative would be to simply calculate the squared difference between corresponding elements of the two sets.
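A sketch of the element-by-element variant, where order matters, assuming equal-length lists; `sq_diff` is a hypothetical name and summing the per-element squares into one score is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Sum of squared differences between elements at the same index.
# Order matters here; assumes equal-length lists.
sub sq_diff {
    my ($u, $v) = @_;
    return sum0 map { ($u->[$_] - $v->[$_]) ** 2 } 0 .. $#$u;
}

my $ref = [5, 7, 2, 5, 1];
for my $cand ([3, 9, 4, 2, 1], [100, 100, 100, 100, 100]) {
    printf "(%s): %g\n", join(',', @$cand), sq_diff($ref, $cand);
}
```

This gives 21 for (3,9,4,2,1) and 46104 for (100,100,100,100,100), so unlike the cosine it separates the two candidates clearly.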

      Distance! That's it! Thank you, guys!
Re: Hashes: Find similar entries!
by hdb (Monsignor) on Dec 17, 2015 at 08:10 UTC

    In a similar line of thought to Corion's, I would rather consider distance than similarity. You could use the square root of the sum of the squared differences between the entries of your lists (a.k.a. the Euclidean distance) and then call those entries similar that are close with respect to that measure of distance.
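A sketch of the Euclidean-distance approach applied to the original question of comparing every entry of a hash with every other one. The hash `%data`, its keys and the cut-off `$threshold` are made-up illustrations; a sensible threshold depends on the actual data.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Euclidean distance: square root of the summed squared differences.
sub euclidean {
    my ($u, $v) = @_;
    return sqrt sum0 map { ($u->[$_] - $v->[$_]) ** 2 } 0 .. $#$u;
}

# Hypothetical data: name => dataset (all datasets the same length).
my %data = (
    ref  => [5, 7, 2, 5, 1],
    near => [3, 9, 4, 2, 1],
    far  => [100, 100, 100, 100, 100],
);

# Hypothetical cut-off: pairs closer than this count as "similar".
my $threshold = 10;

# Compare every pair of keys exactly once: n*(n-1)/2 comparisons.
my @keys = sort keys %data;
for my $i (0 .. $#keys) {
    for my $j ($i + 1 .. $#keys) {
        my $d = euclidean($data{$keys[$i]}, $data{$keys[$j]});
        printf "%s ~ %s (distance %.2f)\n", $keys[$i], $keys[$j], $d
            if $d < $threshold;
    }
}
```

With these values only `near ~ ref (distance 4.58)` is printed.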

Re: Hashes: Find similar entries!
by u65 (Chaplain) on Dec 17, 2015 at 12:05 UTC

    You could consider the sets as unordered and calculate well-known statistics for them for comparison. But this seems like a very open-ended question without more information from the OP.
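A sketch of the summary-statistics idea, assuming mean, population standard deviation, minimum and maximum are a reasonable choice (the reply does not name any particular statistics). `summary` is a hypothetical helper; `sum0`, `min` and `max` come from the core module List::Util.

```perl
use strict;
use warnings;
use List::Util qw(sum0 min max);

# Order-independent summary of a dataset: mean, population standard
# deviation, minimum and maximum. Assumes a non-empty list.
sub summary {
    my @x    = @_;
    my $mean = sum0(@x) / @x;
    my $sd   = sqrt( sum0(map { ($_ - $mean) ** 2 } @x) / @x );
    return { mean => $mean, sd => $sd, min => min(@x), max => max(@x) };
}

for my $set ([5, 7, 2, 5, 1], [3, 9, 4, 2, 1], [100, 100, 100, 100, 100]) {
    my $s = summary(@$set);
    printf "(%s): mean %.2f, sd %.2f, min %d, max %d\n",
        join(',', @$set), @{$s}{qw(mean sd min max)};
}
```

Here (100,100,100,100,100) has a standard deviation of 0, while the other two sets have means of 4 and 3.8; the summary numbers can themselves be compared with a distance measure such as the Euclidean distance.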

Re: Hashes: Find similar entries!
by Lennotoecom (Pilgrim) on Dec 17, 2015 at 08:18 UTC
    I don't know exactly what the author wants, but maybe this?
    It detects whether a number belongs to a pool of numbers defined as similar:

        use strict; use warnings;
        my @a = qw/5 7 2 5 1 3/;
        my %h;    # pool name => { member => undef }
        @{$h{a}}{5, 4, 3} = (); @{$h{b}}{7, 9, 6} = (); @{$h{c}}{0, 2, 1} = ();
        for my $x (@a) { exists $h{$_}{$x} and print "$x pool $_\n" for sort keys %h }
Re: Hashes: Find similar entries!
by Anonymous Monk on Dec 17, 2015 at 07:19 UTC