using Algorithm::Diff for floats

jim_neophyte has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: using Algorithm::Diff for floats by demerphq (Chancellor) on Jul 27, 2004 at 18:55 UTC
It would seem that what you want is to consider elements the same when they are within a certain delta of each other. The trouble is that A::D expects the keygen function to be one "that should return a string that uniquely identifies a given element." It thus only considers a single element at a time, which IMO makes meeting your implied requirement difficult. int() won't cut it, as it'll make 4.1 and 4.0 the same but not 3.9 and 4.0, which would apparently not be useful to you.

IIRC tye is working on an updated A::D, so you may be able to convince him that adding a key comparison function is a good idea. However, until that happens I suspect I'd be looking at customizing A::D to support such. If that's beyond you then who knows, a prolific author like BrowserUk may just pop up with a viable patch until tye gets his evil schemes into CPAN play...

--- demerphq
First they ignore you, then they laugh at you, then they fight you, then you win. -- Gandhi
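The int() problem is easy to see in a few lines. This is only a sketch: `int_key` is a made-up stand-in for a keygen callback, and the keys are compared as strings, which is how a keygen-based diff would treat them.

```perl
use strict;
use warnings;

# Hypothetical keygen: bucket each float by truncating it to an integer.
sub int_key { return int( $_[0] ) }

# Two pairs with the same delta (0.1) that land in different buckets.
for my $pair ( [ 4.1, 4.0 ], [ 3.9, 4.0 ] ) {
    my ( $x, $y ) = @$pair;
    my $verdict = int_key($x) eq int_key($y) ? 'same' : 'different';
    printf "%.1f vs %.1f: keys are %s\n", $x, $y, $verdict;
}
# prints:
#   4.1 vs 4.0: keys are same
#   3.9 vs 4.0: keys are different
```

Rounding instead of truncating only moves the bucket boundary; two values that straddle it still get different keys however close they are.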
Re^2: using Algorithm::Diff for floats by jim_neophyte (Sexton) on Jul 27, 2004 at 19:42 UTC
i do not employ the int() function for this. i take the absolute value of the difference and compare it to my threshold. if it is larger than the threshold, i simply return the original string. otherwise, i take the average of the two values and pump it through sprintf, because comparing floats for equality is dangerous; my understanding is that sprintf turns the number into a string.

in the case of &$keyGen(3.9, "4.0 somehow") eq &$keyGen(4.0, "3.9 somehow"), both calls should return "3.95". i just need to figure out how to get my sub to access "$b" when the first parameter is "$a" and vice versa.

i think i was originally unclear. is this better?
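The scheme described above might look roughly like this. It is a sketch under assumptions: `pair_key` and the 0.15 threshold are invented for illustration, and the function takes both values explicitly, which is exactly what a keygen callback (called with one element at a time) cannot do.

```perl
use strict;
use warnings;

my $THRESHOLD = 0.15;    # assumed value; the post does not give one

# Hypothetical pairwise key: needs BOTH values, unlike a real keygen callback.
sub pair_key {
    my ( $mine, $other ) = @_;

    # Too far apart: return the original string unchanged.
    return $mine if abs( $mine - $other ) > $THRESHOLD;

    # Close enough: both sides map to the formatted average.
    return sprintf '%.2f', ( $mine + $other ) / 2;
}

print pair_key( '3.9', '4.0' ), "\n";    # 3.95
print pair_key( '4.0', '3.9' ), "\n";    # 3.95
print pair_key( '3.0', '4.0' ), "\n";    # 3.0
```

Note that the key for 3.9 depends on which partner it is paired with, so no single string can identify it across the whole array.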
Re: using Algorithm::Diff for floats by BrowserUk (Patriarch) on Jul 27, 2004 at 20:45 UTC
It would be nice if Algorithm::Diff took a custom comparator function rather than a keygen function, I think.

The easiest way I could see to solve this without having to dig around in the guts of A::D is to bless each element of the arrays being compared and overload the 'eq' operator. You also have to overload the stringify operator to make this work.

    #! perl -slw
    use warnings;
    use strict;
    use Data::Dumper;
    use Algorithm::Diff qw( diff );

    {
        package myFloats;
        use overload 'eq' => \&cmp, '""' => \&stringy;

        sub new     { return bless \$_[ 1 ], $_[ 0 ] }
        # As posted: this is true when the values differ by MORE than 0.1
        sub cmp     { abs( ${ $_[ 0 ] } - ${ $_[ 1 ] } ) > 0.1 ; }
        sub stringy { ${ $_[ 0 ] } }
    }

    my @a = map{ myFloats->new( $_ ) } qw( 1.0 2.1 3.2 4.0 4.9 5.8 );
    my @b = map{ myFloats->new( $_ ) } qw( 1.0 2.0 3.0 4.0 5.0 6.0 );

    my @d = diff( \@a, \@b );
    print "@$_" for map{ @$_ } @d;

    __END__
    P:\test>377814
    - 1 2.1
    - 2 3.2
    + 1 2.0
    + 2 3.0
    - 4 4.9
    + 4 5.0

Now whether that output makes any sense, I can't say. I've never really understood the output format of A::D. I added a little trace information into the cmp() sub

    sub cmp{ print "cmp: @_"; abs( ${ $_[ 0 ] } - ${ $_[ 1 ] } ) > 0.1 ; }

and got this output:

    P:\test>377814
    cmp: 1.0 1.0
    cmp: 5.8 6.0
    cmp: 4.9 5.0
    - 1 2.1
    - 2 3.2
    + 1 2.0
    + 2 3.0
    - 4 4.9
    + 4 5.0

Which shows the overload is working, but it doesn't look like it is doing enough comparisons to me. Like I said, though, I never really understood the module. Anyway, maybe the technique will help you make sense of it.

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^2: using Algorithm::Diff for floats (DiffOld.pm) by tye (Sage) on Jul 27, 2004 at 22:26 UTC
"It would be nice if Algorithm::Diff took a custom comparator function rather than a keygen function I think."

Then `use Algorithm::DiffOld` instead (included with Algorithm::Diff). It is slower, of course.

- tye
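For reference, a minimal sketch of that suggestion. It assumes, going by the Algorithm::DiffOld documentation, that `LCS` and `diff` take a comparison callback as their third argument and treat a true return value as "these two elements match". The 0.15 tolerance and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Algorithm::DiffOld qw( LCS diff );

my $TOLERANCE = 0.15;    # assumed threshold, not from the thread

# Comparison callback: true when two values are close enough to count as equal.
my $close_enough = sub {
    my ( $x, $y ) = @_;
    return abs( $x - $y ) < $TOLERANCE;
};

my @old = qw( 1.0 2.1 3.2 4.0 4.9 5.8 );
my @new = qw( 1.0 2.0 3.0 4.0 5.0 6.0 );

# Elements considered common to both lists under the fuzzy comparison.
my @common = LCS( \@old, \@new, $close_enough );
print "common: @common\n";

# Each hunk is a list of [ sign, index, element ] triples.
for my $hunk ( diff( \@old, \@new, $close_enough ) ) {
    print join( ' ', @$_ ), "\n" for @$hunk;
}
```

With this tolerance only 3.2/3.0 and 5.8/6.0 are far enough apart to count as changes.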
Re^3: using Algorithm::Diff for floats (DiffOld.pm) by BrowserUk (Patriarch) on Jul 27, 2004 at 22:47 UTC
Thanks. I'd never even noticed that I had the old version on my machine. I'll have a play. Maybe it will highlight whether it's my code at fault or whether there is a bug in the new version.
Re^2: using Algorithm::Diff for floats by jim_neophyte (Sexton) on Jul 27, 2004 at 21:00 UTC
well, thank you very much. this will be a nice (i hope) intro to the overload module/pragma and Perl OO. so far, i have made some modifications to the module, in line with your earlier suggestion to change it. i'll post them for comment if they seem to work. thanks again!
Re: using Algorithm::Diff for floats (Updated) by BrowserUk (Patriarch) on Jul 27, 2004 at 22:13 UTC
My first attempt at the overloading was crap. I think that this is an improvement. By overloading only the 'cmp' operator, overload should (and appears to) use it to autogenerate appropriate overloads for 'eq' & 'ne'.

(Code hidden behind a "Read more..." link in the original post, 1049 bytes.)

However, even having re-read the A::D docs to understand the format of the output, I still do not understand the results I am getting.

    P:\test>377814
    1.0 cmp 1.0 : (0.00000000000000000000000000000000) : 0
    2.1 cmp 2.0 : (0.10000000000000009000000000000000) : 0
    3.2 cmp 3.0 : (0.20000000000000018000000000000000) : 1
    5.8 cmp 6.0 : (0.20000000000000018000000000000000) : 1
    1.0 2.1 3.2 4.0 4.9 5.8
    1.0 2.0 3.0 4.0 5.0 6.0
    - 2 3.2
    + 2 3.0
    - 4 4.9
    + 4 5.0
    - 5 5.8
    + 5 6.0

When called, the overloaded 'cmp' operator seems to be returning the appropriate results, and the output duly reflects this by not replacing 2.1 with 2.0. However, it then goes on to replace 4.9 with 5.0, which is strange, as it never seems to actually compare these two values.

I'm not sure if this represents a bug in A::D, in overload, or in my use of one, the other, or both. It's a mystery, but I enjoy those. I'll update if I track it down.

Update: Indeed, there does appear to be a bug in Algorithm::Diff. Moving to Algorithm::DiffOld gives the following results, which are much more what I would have expected.

    1:24:24.44 C:\Perl\test>377814
    1.0 cmp 1.0 : (0.00000000000000000000000000000000) : 0
    2.1 cmp 2.0 : (0.10000000000000009000000000000000) : 0
    3.2 cmp 3.0 : (0.20000000000000018000000000000000) : 1
    5.8 cmp 6.0 : (0.20000000000000018000000000000000) : 1
    3.2 cmp 6.0 : (2.79999999999999980000000000000000) : 1
    3.2 cmp 5.0 : (1.79999999999999980000000000000000) : 1
    3.2 cmp 4.0 : (0.79999999999999982000000000000000) : 1
    3.2 cmp 3.0 : (0.20000000000000018000000000000000) : 1
    4.0 cmp 6.0 : (2.00000000000000000000000000000000) : 1
    4.0 cmp 5.0 : (1.00000000000000000000000000000000) : 1
    4.0 cmp 4.0 : (0.00000000000000000000000000000000) : 0
    4.0 cmp 3.0 : (1.00000000000000000000000000000000) : 1
    4.9 cmp 6.0 : (1.09999999999999960000000000000000) : 1
    4.9 cmp 5.0 : (0.09999999999999964500000000000000) : 0
    4.9 cmp 4.0 : (0.90000000000000036000000000000000) : 1
    4.9 cmp 3.0 : (1.90000000000000040000000000000000) : 1
    5.8 cmp 6.0 : (0.20000000000000018000000000000000) : 1
    5.8 cmp 5.0 : (0.79999999999999982000000000000000) : 1
    5.8 cmp 4.0 : (1.79999999999999980000000000000000) : 1
    5.8 cmp 3.0 : (2.79999999999999980000000000000000) : 1
    1.0 2.1 3.2 4.0 4.9 5.8
    1.0 2.0 3.0 4.0 5.0 6.0
    - 2 3.2
    + 2 3.0
    - 5 5.8
    + 5 6.0
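The code for this post is not shown in the thread, so the following is not BrowserUk's version, only a sketch of the technique it describes: overload 'cmp' (plus stringification) and let the overload pragma derive 'eq' and 'ne' from it. The class name `FuzzyFloat` and the 0.15 tolerance are assumptions.

```perl
use strict;
use warnings;

{
    package FuzzyFloat;    # hypothetical class name, not from the thread

    our $TOLERANCE = 0.15; # assumed threshold for "close enough"

    # Only 'cmp' and stringification are overloaded; with the default
    # fallback setting, 'eq' and 'ne' are autogenerated from 'cmp'.
    use overload
        'cmp' => \&compare,
        '""'  => \&to_string;

    sub new { my ( $class, $value ) = @_; return bless \$value, $class }

    sub compare {
        my ( $left, $right, $swapped ) = @_;

        # Either operand may be a plain scalar rather than an object.
        my ( $x, $y ) = map { ref $_ ? ${$_} : $_ } $left, $right;

        return 0 if abs( $x - $y ) < $TOLERANCE;    # treated as equal
        my $order = $x <=> $y;
        return $swapped ? -$order : $order;
    }

    sub to_string { return ${ $_[0] } }
}

my $near = FuzzyFloat->new('2.1');
my $far  = FuzzyFloat->new('3.2');

print "2.1 eq 2.0\n" if $near eq '2.0';    # printed: within tolerance
print "3.2 ne 3.0\n" if $far ne '3.0';     # printed: outside tolerance
```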
Re^2: using Algorithm::Diff for floats ('bug') by tye (Sage) on Jul 28, 2004 at 05:55 UTC
The default key generation process stringifies (the keys end up as hash keys), which removes the object nature and prevents your overloaded cmp from being called. The only comparisons you see are for the "trim leading/trailing identical items" step that precedes the 'longest common subsequence' algorithm (to make it faster).

- tye
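The effect is easy to reproduce without Algorithm::Diff at all. In this sketch the `Noisy` class and its call counter are invented for illustration; it simply shows that using an overloaded object as a hash key goes through stringification and never reaches the overloaded 'eq'.

```perl
use strict;
use warnings;

{
    package Noisy;    # hypothetical class that counts calls to its 'eq'

    our $EQ_CALLS = 0;

    use overload
        'eq' => sub { $EQ_CALLS++; abs( ${ $_[0] } - ${ $_[1] } ) < 0.15 },
        '""' => sub { ${ $_[0] } };

    sub new { my ( $class, $value ) = @_; return bless \$value, $class }
}

my ( $p, $q ) = ( Noisy->new('4.9'), Noisy->new('5.0') );

# Hash keys are plain strings: "4.9" is stored, "5.0" is looked up.
my %seen;
$seen{$p} = 1;
my $found = exists $seen{$q} ? 'yes' : 'no';
print "hash lookup matched: $found (eq calls: $Noisy::EQ_CALLS)\n";
# prints: hash lookup matched: no (eq calls: 0)

# A direct comparison does go through the overload.
my $direct = ( $p eq $q ) ? 'yes' : 'no';
print "direct eq matched: $direct (eq calls: $Noisy::EQ_CALLS)\n";
# prints: direct eq matched: yes (eq calls: 1)
```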
Re^3: using Algorithm::Diff for floats ('bug') by BrowserUk (Patriarch) on Jul 28, 2004 at 06:11 UTC
Not a bug then, just my attempt at cleverness falling flat on its face. I did look, but I didn't understand. Thanks.