in reply to Re^2: Seeking a fast sum_of_ranks_between function (1e6 x 1e6 in 1/5th sec.)
in thread Seeking a fast sum_of_ranks_between function
Damn! The "simplifications" I applied subtly changed the results. Please try again with the original version I posted and see if that matches your existing method?
sub rankSums { my( $aRef, $bRef ) = @_; my( $aSum, $bSum ) = (0) x 2; my( $a, $b ) = (0) x 2; my $rank = 1; while( $a < @$aRef && $b < @$bRef ) { if( $aRef->[ $a ] < $bRef->[ $b ] ) { $aSum += $rank++; ++$a; } elsif( $aRef->[ $a ] > $bRef->[ $b ] ) { $bSum += $rank++; ++$b } else { my $d = 2; my( $aSaved, $bSaved ) = ( $a, $b ); ++$d, ++$a while $a < $#{ $aRef } && $aRef->[ $a ] == $aRe +f->[ $a + 1 ]; ++$d, ++$b while $b < $#{ $bRef } && $bRef->[ $b ] == $bRe +f->[ $b + 1 ]; my $s = sum( $rank .. $rank + $d - 1 ) / $d; $aSum += $s * ( $a - $aSaved + 1 ); $bSum += $s * ( $b - $bSaved + 1 ); $rank += $d; ++$a, ++$b; } } $aSum += $rank++ while $a++ < @{ $aRef }; $bSum += $rank++ while $b++ < @{ $bRef }; return $aSum, $bSum; }
The bonus is (assuming it's correct this time), is that with datasets containing many ties, it actually runs faster too.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^4: Seeking a fast sum_of_ranks_between function (1e6 x 1e6 in 1/5th sec.)
by msh210 (Monk) on Oct 14, 2015 at 15:33 UTC | |
by toolic (Bishop) on Oct 14, 2015 at 15:45 UTC | |
by msh210 (Monk) on Oct 14, 2015 at 15:56 UTC | |
by BrowserUk (Patriarch) on Oct 14, 2015 at 16:19 UTC |