jim_neophyte has asked for the wisdom of the Perl Monks concerning the following question:

i would appreciate help understanding what to pass to "diff" or "traverse_sequences" so that i can compare 4.1 and 4.0 and call them equal.

i have tried passing in both a named and an anonymous subroutine that attempts to access, via symbolic reference, the variables $a and $b in the scope that calls "&$keyGen" in "_longestCommonSubsequence", which is what actually calls my subroutine.

if i understand the camel book correctly, symbolic references do not work across packages. (just read that). hmphh!

in my sub i check if they are close; if so, take the avg and pump through sprintf. return either the original first parameter or the new avg.

then both calls to "&$keyGen" would be returning the same value.

much thanks, jim

sample code trying out my idea, with my various attempts left in:

    use warnings;
    use strict;
    use lib qw( /rshome/jaw2/lib/site_perl/5.8.0 );
    use Algorithm::Diff qw( diff );

    my @a = qw( 1.0 2.1 3.2 4.0 4.9 5.8 );
    my @b = qw( 1.0 2.0 3.0 4.0 5.0 6.0 );

    my @d = diff( \@a, \@b, sub {
        my( $o ) = @_;
        #my( $o, $aa, $bb ) = @_;
        #no strict 'refs';
        my $d = abs( $a - $b );
        #my $d = abs( $$aa - $$bb );
        if( $d <= 0.1 ) {
            my $avg = sprintf "%14.6e", ( $a + $b ) / 2;
            #my $avg = sprintf "%14.6e", ( $$aa + $$bb ) / 2;
            return $avg;
        }
        return $o;
    } );
    #####'a', 'b'
    #my @d = diff( \@a, \@b, \&mykeygen, 'a', 'b' );
    #my @d = diff( \@a, \@b, \&mykeygen, { 'a' => '$_[0]', 'b' => '$_[1]' } );
    #my @d = diff( \@a, \@b, \&mykeygen, '$_[0]', '$_[1]' );

    print "d array:\n", join( "\n\t", @d ), "\n";

Re: using Algorithm::Diff for floats
by demerphq (Chancellor) on Jul 27, 2004 at 18:55 UTC

    It would seem that what you want is to treat elements as the same when they are within a certain delta of each other. The trouble is that A::D expects the keygen function to be one "that should return a string that uniquely identifies a given element." It therefore only ever sees a single element at a time, which IMO makes meeting your implied requirement difficult. int() won't cut it, as it'll make 4.1 and 4.0 the same but not 3.9 and 4.0, which would apparently not be useful to you.
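
    A minimal sketch of that point, assuming a hypothetical one-argument key function named bucket_key (it is not part of A::D). Because the key is computed from one element alone, two values that are equally close can still land in different buckets:

    ```perl
    use strict;
    use warnings;

    # Hypothetical per-element key: one value in, one bucket label out.
    sub bucket_key { my ($x) = @_; return int($x) }

    for my $pair ( [ 4.1, 4.0 ], [ 3.9, 4.0 ] ) {
        my ( $x, $y ) = @$pair;
        my $same = bucket_key($x) == bucket_key($y) ? 'same key' : 'different keys';
        printf "%.1f vs %.1f: %s\n", $x, $y, $same;
    }
    ```

    Both pairs are about 0.1 apart, yet only the first one shares a key. Any other single-element bucketing just moves the boundary somewhere else.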

    IIRC tye is working on an updated A::D, so you may be able to convince him that adding a key comparison function is a good idea. Until that happens, however, I suspect I'd be looking at customizing A::D to support it. If that's beyond you then who knows, a prolific author like BrowserUK may just pop up with a viable patch until tye gets his evil schemes into CPAN play...


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


      i do not employ the int() function for this.

      i take the absolute value of the diff and compare to my threshold. if larger than threshold, simply return original string.

      otherwise, get an average of the two values and pump it through sprintf, because comparing floats for equality is dangerous; my understanding is that prevents it from being a number and makes a string.

      the case of &$keyGen(3.9, "4.0 somehow") eq &$keyGen(4.0, "3.9 somehow")

      both calls should return "3.95".

      i just need to figure out how to get my sub to access "$b" when the first parameter is "$a" and vice versa.

      i think i was originally unclear. is this better?
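
      A rough sketch of the pairwise idea described above, assuming a hypothetical helper pair_key that is handed both values (which the one-argument keygen callback is not). The 1e-9 slack is an assumption added for this sketch, because 4.0 - 3.9 comes out slightly larger than 0.1 in binary floating point:

      ```perl
      use strict;
      use warnings;

      my $tol   = 0.1;     # threshold from the post
      my $slack = 1e-9;    # assumed allowance for binary rounding

      # Hypothetical helper: needs BOTH values, unlike a one-argument keygen.
      sub pair_key {
          my ( $x, $y ) = @_;
          return abs( $x - $y ) <= $tol + $slack
              ? sprintf( '%.2f', ( $x + $y ) / 2 )    # shared, averaged key
              : "$x";                                 # too far apart: keep own value
      }

      # Without the slack, the plain threshold rejects 3.9 vs 4.0.
      my $naive = abs( 3.9 - 4.0 ) <= $tol ? 'close' : 'not close';
      print "naive threshold: $naive\n";

      my ( $k1, $k2 ) = ( pair_key( 3.9, 4.0 ), pair_key( 4.0, 3.9 ) );
      print "$k1 $k2\n";
      ```

      The two keys agree only because pair_key sees both operands, which is exactly the information a per-element key function does not get.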

Re: using Algorithm::Diff for floats
by BrowserUk (Patriarch) on Jul 27, 2004 at 20:45 UTC

    It would be nice if Algorithm::Diff took a custom comparator function rather than a keygen function I think.

    The easiest way I could see to solve this without having to dig around in the guts of A::D is to bless each element of the arrays being compared and overload the 'eq' operator. You also have to overload the stringify operator to make this work.

        #! perl -slw
        use warnings;
        use strict;
        use Data::Dumper;
        use Algorithm::Diff qw( diff );

        {
            package myFloats;
            use overload 'eq' => \&cmp, '""' => \&stringy;

            sub new{ return bless \$_[ 1 ], $_[ 0 ] }
            sub cmp{ abs( ${ $_[ 0 ] } - ${ $_[ 1 ] } ) > 0.1 ; }
            sub stringy{ ${ $_[ 0 ] } }
        }

        my @a = map{ myFloats->new( $_ ) } qw( 1.0 2.1 3.2 4.0 4.9 5.8 );
        my @b = map{ myFloats->new( $_ ) } qw( 1.0 2.0 3.0 4.0 5.0 6.0 );

        my @d = diff( \@a, \@b );
        print "@$_" for map{ @$_ } @d;

        __END__
        P:\test>377814
        - 1 2.1
        - 2 3.2
        + 1 2.0
        + 2 3.0
        - 4 4.9
        + 4 5.0

    Now whether that output makes any sense, I can't say. I've never really understood the output format of A::D. I added a little trace information into the cmp() sub

    sub cmp{ print "cmp: @_"; abs( ${ $_[ 0 ] } - ${ $_[ 1 ] } ) > 0.1 ; }

    and got this output:

        P:\test>377814
        cmp: 1.0 1.0
        cmp: 5.8 6.0
        cmp: 4.9 5.0
        - 1 2.1
        - 2 3.2
        + 1 2.0
        + 2 3.0
        - 4 4.9
        + 4 5.0

    Which shows the overload is working, but it doesn't look like it is doing enough comparisons to me, but like I said, I never really understood the module. Anyway, maybe the technique will help you make sense of it.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
      It would be nice if Algorithm::Diff took a custom comparator function rather than a keygen function I think.

      Then use Algorithm::DiffOld instead (included with Algorithm::Diff). It is slower, of course.
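
      A minimal sketch of what that could look like, assuming Algorithm::DiffOld is installed and that its diff takes a comparison callback as the third argument (true means "treat as equal"), as its documentation describes. The 0.1 tolerance comes from the question; the 1e-9 slack is an assumption to absorb binary rounding:

      ```perl
      use strict;
      use warnings;
      use Algorithm::DiffOld qw( diff );

      my @a = qw( 1.0 2.1 3.2 4.0 4.9 5.8 );
      my @b = qw( 1.0 2.0 3.0 4.0 5.0 6.0 );

      # Comparison callback: true when two items should count as equal.
      my $close = sub { abs( $_[0] - $_[1] ) <= 0.1 + 1e-9 };

      my @hunks = diff( \@a, \@b, $close );

      # Each hunk is a list of [ sign, index, value ] changes.
      print "@$_\n" for map { @$_ } @hunks;
      ```

      Since the callback is handed both items, no key generation or symbolic-reference tricks are needed.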

      - tye        

        Thanks. I'd never even noticed that I had the old version on my machine. I'll have a play. Maybe it will highlight whether it's my code at fault or whether there is a bug in the new version.



      well thank you very much. this will be a nice (i hope) intro to the overload module/pragma and Perl-OO.

      so far, i have some mods iaw your suggestion earlier to chg the module. i'll post it for comment if it seems to work.

      thanks again!

Re: using Algorithm::Diff for floats (Updated)
by BrowserUk (Patriarch) on Jul 27, 2004 at 22:13 UTC

    My first attempt at the overloading was crap.

    I think that this is an improvement. By overloading the 'cmp' operator only, overload should (and appears to) use it to autogenerate appropriate overloads for 'eq' & 'ne'.
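
    A small sketch of that autogeneration, using a hypothetical class CmpOnly and a hypothetical helper fuzzy_cmp (neither is the code from this post). Only 'cmp' and stringification are overloaded; 'eq' and 'ne' are left for overload to derive. The 1e-9 slack is an assumption to absorb binary rounding:

    ```perl
    use strict;
    use warnings;

    {
        package CmpOnly;    # hypothetical class, for illustration only
        use overload
            'cmp' => \&fuzzy_cmp,
            '""'  => sub { ${ $_[0] } };

        sub new { my ( $class, $v ) = @_; return bless \$v, $class }

        # Returns 0 (equal) within the tolerance, else the numeric ordering.
        sub fuzzy_cmp {
            my ( $l, $r, $swapped ) = @_;
            ( $l, $r ) = ( $r, $l ) if $swapped;
            my ( $lv, $rv ) = map { ref($_) ? $$_ : $_ } $l, $r;
            return abs( $lv - $rv ) <= 0.1 + 1e-9 ? 0 : $lv <=> $rv;
        }
    }

    my ( $x, $y, $z ) = map { CmpOnly->new($_) } qw( 2.1 2.0 3.0 );

    my $eq_works = ( $x eq $y ) ? 1 : 0;    # derived from 'cmp'
    my $ne_works = ( $x ne $z ) ? 1 : 0;    # derived from 'cmp'
    print "eq: $eq_works, ne: $ne_works\n";
    ```

    One caveat with this kind of comparison: "within 0.1" is not transitive, so it is fine for pairwise tests but not a consistent ordering for sorting.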

    However, even having re-read the A::D docs to understand the format of the output, I still do not understand the results I am getting.

        P:\test>377814
        1.0 cmp 1.0 : (0.00000000000000000000000000000000) : 0
        2.1 cmp 2.0 : (0.10000000000000009000000000000000) : 0
        3.2 cmp 3.0 : (0.20000000000000018000000000000000) : 1
        5.8 cmp 6.0 : (0.20000000000000018000000000000000) : 1
        1.0 2.1 3.2 4.0 4.9 5.8
        1.0 2.0 3.0 4.0 5.0 6.0
        - 2 3.2
        + 2 3.0
        - 4 4.9
        + 4 5.0
        - 5 5.8
        + 5 6.0

    When called, the overloaded 'cmp' operator seems to be returning the appropriate results and duly, the output reflects this by not replacing 2.1 with 2.0. However, it then goes on to replace 4.9 with 5.0 which is strange as it never seems to actually compare these two values.

    I'm not sure if this represents a bug in A::D, in overload, or in my use of one, the other or both. It's a mystery, but I enjoy those. I'll update if I track it down.

    Update: Indeed, there does appear to be a bug in Algorithm::Diff. Switching to Algorithm::DiffOld gives the following results, which are much more what I would have expected.

        1:24:24.44 C:\Perl\test>377814
        1.0 cmp 1.0 : (0.00000000000000000000000000000000) : 0
        2.1 cmp 2.0 : (0.10000000000000009000000000000000) : 0
        3.2 cmp 3.0 : (0.20000000000000018000000000000000) : 1
        5.8 cmp 6.0 : (0.20000000000000018000000000000000) : 1
        3.2 cmp 6.0 : (2.79999999999999980000000000000000) : 1
        3.2 cmp 5.0 : (1.79999999999999980000000000000000) : 1
        3.2 cmp 4.0 : (0.79999999999999982000000000000000) : 1
        3.2 cmp 3.0 : (0.20000000000000018000000000000000) : 1
        4.0 cmp 6.0 : (2.00000000000000000000000000000000) : 1
        4.0 cmp 5.0 : (1.00000000000000000000000000000000) : 1
        4.0 cmp 4.0 : (0.00000000000000000000000000000000) : 0
        4.0 cmp 3.0 : (1.00000000000000000000000000000000) : 1
        4.9 cmp 6.0 : (1.09999999999999960000000000000000) : 1
        4.9 cmp 5.0 : (0.09999999999999964500000000000000) : 0
        4.9 cmp 4.0 : (0.90000000000000036000000000000000) : 1
        4.9 cmp 3.0 : (1.90000000000000040000000000000000) : 1
        5.8 cmp 6.0 : (0.20000000000000018000000000000000) : 1
        5.8 cmp 5.0 : (0.79999999999999982000000000000000) : 1
        5.8 cmp 4.0 : (1.79999999999999980000000000000000) : 1
        5.8 cmp 3.0 : (2.79999999999999980000000000000000) : 1
        1.0 2.1 3.2 4.0 4.9 5.8
        1.0 2.0 3.0 4.0 5.0 6.0
        - 2 3.2
        + 2 3.0
        - 5 5.8
        + 5 6.0


      The default key generation process stringifies (the keys end up as hash keys), which removes the object nature and prevents your overloaded cmp from being called.

      The only comparisons you see are for the "trim leading/trailing identical items" step that precedes the 'longest common subsequence' algorithm (to make it faster).
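
      A small illustration of the stringification effect in plain Perl, using a hypothetical class FuzzyFloat (this is not A::D's internal code, only the general mechanism). Once an object is used as a hash key it is flattened to its string form, so the lookup is an exact string match and the overloaded operator is never consulted:

      ```perl
      use strict;
      use warnings;

      {
          package FuzzyFloat;    # hypothetical class, for illustration only
          use overload
              'eq' => sub { abs( ${ $_[0] } - ${ $_[1] } ) <= 0.1 + 1e-9 },
              '""' => sub { ${ $_[0] } };

          sub new { my ( $class, $v ) = @_; return bless \$v, $class }
      }

      my $x = FuzzyFloat->new('4.1');
      my $y = FuzzyFloat->new('4.0');

      # Direct comparison goes through the overloaded 'eq'.
      my $direct = ( $x eq $y ) ? 1 : 0;

      # As hash keys the objects become the plain strings "4.1" and "4.0".
      my %seen     = ( $x => 1 );
      my $via_hash = exists $seen{$y} ? 1 : 0;

      print "direct: $direct, via hash: $via_hash\n";
      ```

      The direct test reports a match while the hash lookup does not, which is consistent with seeing overloaded comparisons only in the step that compares items directly.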

      - tye        

        Not a bug then, just my attempt at cleverness falling flat on its face. I did look, but I didn't understand.

        Thanks.

