nothing in String::Approx's docs leads me to believe that values from two separate runs of adistr() can be reliably compared.
According to the docs:
The measure of approximateness is the Levenshtein edit distance. It is the total number of "edits": insertions, deletions and substitutions required to transform a string to another string. For example, to transform "lead" into "gold", you need three edits: The edit distance of "lead" and "gold" is therefore three, or 75%.
This means that by padding the shorter value to the length of the longer one, you get the Levenshtein distance converted to a ratio. Provided that one string of each pair is the same, the values should be comparable.
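To make the "lead"/"gold" figures from the docs concrete, here is a minimal pure-Perl sketch of the textbook dynamic-programming Levenshtein distance. The levenshtein() helper is hypothetical and written only for illustration; it is not String::Approx's own implementation, so it should not be expected to reproduce adist() results in general.

```perl
use strict;
use warnings;

# Hypothetical helper: classic Levenshtein distance, two-row variant.
# Counts insertions, deletions and substitutions, each at cost 1.
sub levenshtein {
    my ( $from, $to ) = @_;
    my @s = split //, $from;
    my @t = split //, $to;

    # Row 0: turning the empty string into the first $j chars of $to.
    my @prev = ( 0 .. @t );

    for my $i ( 1 .. @s ) {
        my @curr = ($i);    # deleting the first $i chars of $from
        for my $j ( 1 .. @t ) {
            my $cost = $s[ $i - 1 ] eq $t[ $j - 1 ] ? 0 : 1;
            my $best = $prev[ $j - 1 ] + $cost;    # match or substitution
            $best = $prev[$j] + 1                  # deletion
                if $prev[$j] + 1 < $best;
            $best = $curr[ $j - 1 ] + 1            # insertion
                if $curr[ $j - 1 ] + 1 < $best;
            push @curr, $best;
        }
        @prev = @curr;
    }
    return $prev[-1];
}

my $distance = levenshtein( 'lead', 'gold' );
printf "%d edits, %.0f%%\n", $distance, 100 * $distance / length('lead');
# prints "3 edits, 75%"
```

Dividing by the string length is what turns the absolute distance into the ratio the docs quote.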
This is the output from the tests I ran, with an additional check of the version:
P:\test>p1
perl> use String::Approx qw[ adist ];;
perl> @inputs = qw(donor_dedupe_source indicted_video_source source1);;
perl> print "$_ : ", adist( pack( 'A'.length, 'source' ), $_ ) for @inputs;;
donor_dedupe_source : 13
indicted_video_source : 15
source1 : 1
perl> Terminating on signal SIGINT(2)
P:\test>p1
perl> use String::Approx qw[ adistr ];;
perl> @inputs = qw(donor_dedupe_source indicted_video_source source1);;
perl> print "$_ : ", adistr( pack( 'A'.length, 'source' ), $_ ) for @inputs;;
donor_dedupe_source : 13
indicted_video_source : 15
source1 : 1
perl> print String::Approx::VERSION;;
print() on unopened filehandle VERSION at (eval 6) line 1, <STDIN> line 4.
perl> print $String::Approx::VERSION;;
3.19
On the basis that I have seen an occasional quirk of running stuff via my perl shell, I've just run the following script:
#! perl -slw
use strict;
use String::Approx qw[ adist adistr ];
print "String::Approx::VERSION: ", $String::Approx::VERSION;
my @inputs = qw(donor_dedupe_source indicted_video_source source1);
print "\nUsing adist";
print "$_ : ", adist( pack( 'A'.length, 'source' ), $_ ) for @inputs;
print "\nUsing adistr";
print "$_ : ", adistr( pack( 'A'.length, 'source' ), $_ ) for @inputs;
__END__
P:\test>junk2
String::Approx::VERSION: 3.19
Using adist
donor_dedupe_source : 13
indicted_video_source : 15
source1 : 1
Using adistr
donor_dedupe_source : 13
indicted_video_source : 15
source1 : 1
This shows no difference between adist() and adistr().
Now I will re-install String::Approx from PPM and try again, and then get the CPAN copy and build that.
Update: It would appear that adistr() changed between version 3.19, which I've had forever, and 3.25:
P:\test>ppm
PPM - Programmer's Package Manager version 3.2.
Copyright (c) 2001 ActiveState Corp. All Rights Reserved.
ActiveState is a division of Sophos.
Entering interactive shell. Using Term::ReadLine::Perl as readline library.
Type 'help' to get started.
ppm> search String::APprox
Searching in Active Repositories
1. String-Approx [3.25] Jarkko Hietaniemi (jhi@iki.fi) ActiveState PPM2 Repository [http://ppm
2. String-Approx [3.25] ActiveState Package Repository [http://
ppm> install 1
Package 1:
Note: Package 'String-Approx' is already installed.
====================
Install 'String-Approx' version 3.25 in ActivePerl 5.8.3.809.
====================
Downloaded 22757 bytes.
Extracting 9/9: blib/arch/auto/String/Approx/Approx.lib
Installing C:\Perl\site\lib\auto\String\Approx\Approx.dll
Installing C:\Perl\site\lib\auto\String\Approx\Approx.exp
Installing C:\Perl\site\lib\auto\String\Approx\Approx.lib
Installing C:\Perl\html\site\lib\String\Approx.html
Files found in blib\arch: installing files in blib\lib into architecture dependent library tree
Installing C:\Perl\site\lib\String\Approx.pm
Successfully installed String-Approx version 3.25 in ActivePerl 5.8.3.809.
ppm> quit
P:\test>junk2
String::Approx::VERSION: 3.25
Using adist
donor_dedupe_source : 13
indicted_video_source : 15
source1 : 1
Using adistr
donor_dedupe_source : 0.684210526315789
indicted_video_source : 0.714285714285714
source1 : 0.142857142857143
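The 3.25 adistr() figures look consistent with the adist() value divided by the length of the padded pattern (which, because of the pack, equals the length of the input). That relationship is my inference from the numbers above, not something taken from the module's source; a quick arithmetic check, with a hypothetical ratio() helper:

```perl
use strict;
use warnings;

# Absolute distances reported by adist() in the run above.
my %adist = (
    donor_dedupe_source   => 13,
    indicted_video_source => 15,
    source1               => 1,
);

# Assumption: ratio = absolute distance / length of the padded pattern.
# ratio() is a hypothetical helper, not part of String::Approx.
sub ratio {
    my ( $distance, $length ) = @_;
    return $distance / $length;
}

for my $input ( sort keys %adist ) {
    print "$input : ", ratio( $adist{$input}, length($input) ), "\n";
}
```

If the inference holds, the same division could be applied to adist() output on an older install to get comparable ratios.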
One anomaly that shows up is that PPM reports installing into "ActivePerl 5.8.3.809", which I haven't used (or even had installed) for many months. I think this must come from a registry setting that hasn't been properly cleaned or overwritten, but I need to investigate that.
The version of perl used is 5.8.7.813:
P:\test>perl -v
This is perl, v5.8.7 built for MSWin32-x86-multi-thread
(with 7 registered patches, see perl -V for more detail)
Copyright 1987-2005, Larry Wall
Binary build 813 [148120] provided by ActiveState http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Jun 6 2005 13:36:37
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.