nothing in String::Approx's docs leads me to believe that values from two separate runs of adistr() can be reliably compared.
According to the docs:
The measure of approximateness is the Levenshtein edit distance. It is the total number of "edits": insertions, deletions and substitutions required to transform a string to another string. For example, to transform "lead" into "gold", you need three edits: lead gead goad gold. The edit distance of "lead" and "gold" is therefore three, or 75%.
This means that by padding the shorter value to the length of the longer one, you get the Levenshtein distance converted to a ratio. Provided that one string of each pair is the same, the values should be comparable.
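To make the "three edits, or 75%" figure from the docs concrete, here is a minimal sketch of a textbook dynamic-programming Levenshtein distance in plain Perl. It is an assumption-laden illustration only: `levenshtein` and `relative_distance` are hypothetical helper names of mine, and this is not how String::Approx computes its results internally.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Textbook two-row Levenshtein distance. Illustration only;
# NOT String::Approx's implementation.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. @t);    # cost of building each prefix of $t from ''
    for my $i (1 .. @s) {
        my @cur = ($i);      # cost of deleting the first $i chars of $s
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $cur[$j] = min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: distance as a fraction of the longer string's length.
sub relative_distance {
    my ($s, $t) = @_;
    my $len = length($s) > length($t) ? length($s) : length($t);
    return $len ? levenshtein($s, $t) / $len : 0;
}

printf "%d edits, %.0f%%\n",
    levenshtein('lead', 'gold'),
    100 * relative_distance('lead', 'gold');
```

For "lead" and "gold" this gives three substitutions over four characters, which is where the 75% comes from.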
This is the output from the tests I ran, with an additional check of the version:
    P:\test>p1
    perl> use String::Approx qw[ adist ];;
    perl> @inputs = qw(donor_dedupe_source indicted_video_source source1);;
    perl> print "$_ : ", adist( pack( 'A'.length, 'source' ), $_ ) for @inputs;;
    donor_dedupe_source : 13
    indicted_video_source : 15
    source1 : 1
    perl>
    Terminating on signal SIGINT(2)

    P:\test>p1
    perl> use String::Approx qw[ adistr ];;
    perl> @inputs = qw(donor_dedupe_source indicted_video_source source1);;
    perl> print "$_ : ", adistr( pack( 'A'.length, 'source' ), $_ ) for @inputs;;
    donor_dedupe_source : 13
    indicted_video_source : 15
    source1 : 1
    perl> print String::Approx::VERSION;;
    print() on unopened filehandle VERSION at (eval 6) line 1, <STDIN> line 4.
    perl> print $String::Approx::VERSION;;
    3.19
Because I have occasionally seen quirks when running things via my perl shell, I've also run the following as a standalone script:
    #! perl -slw
    use strict;
    use String::Approx qw[ adist adistr ];

    print "String::Approx::VERSION: ", $String::Approx::VERSION;

    my @inputs = qw(donor_dedupe_source indicted_video_source source1);

    print "\nUsing adist";
    print "$_ : ", adist( pack( 'A'.length, 'source' ), $_ ) for @inputs;

    print "\nUsing adistr";
    print "$_ : ", adistr( pack( 'A'.length, 'source' ), $_ ) for @inputs;

    __END__
    P:\test>junk2
    String::Approx::VERSION: 3.19

    Using adist
    donor_dedupe_source : 13
    indicted_video_source : 15
    source1 : 1

    Using adistr
    donor_dedupe_source : 13
    indicted_video_source : 15
    source1 : 1
This shows no difference between adist() and adistr().
Now I will re-install String::Approx from PPM and try again, and then get the CPAN copy and build that.
Update: It would appear that adistr() changed between version 3.19, which I've had forever, and version 3.25:
    P:\test>ppm
    PPM - Programmer's Package Manager version 3.2.
    Copyright (c) 2001 ActiveState Corp. All Rights Reserved.
    ActiveState is a division of Sophos.

    Entering interactive shell. Using Term::ReadLine::Perl as readline library.

    Type 'help' to get started.

    ppm> search String::APprox
    Searching in Active Repositories
      1. String-Approx [3.25] Jarkko Hietaniemi (jhi@iki.fi) ActiveState PPM2 Repository [http://ppm
      2. String-Approx [3.25] ActiveState Package Repository [http://
    ppm> install 1
    Package 1:
    Note: Package 'String-Approx' is already installed.
    ====================
    Install 'String-Approx' version 3.25 in ActivePerl 5.8.3.809.
    ====================
    Downloaded 22757 bytes.
    Extracting 9/9: blib/arch/auto/String/Approx/Approx.lib
    Installing C:\Perl\site\lib\auto\String\Approx\Approx.dll
    Installing C:\Perl\site\lib\auto\String\Approx\Approx.exp
    Installing C:\Perl\site\lib\auto\String\Approx\Approx.lib
    Installing C:\Perl\html\site\lib\String\Approx.html
    Files found in blib\arch: installing files in blib\lib into architecture dependent library tree
    Installing C:\Perl\site\lib\String\Approx.pm
    Successfully installed String-Approx version 3.25 in ActivePerl 5.8.3.809.
    ppm> quit

    P:\test>junk2
    String::Approx::VERSION: 3.25

    Using adist
    donor_dedupe_source : 13
    indicted_video_source : 15
    source1 : 1

    Using adistr
    donor_dedupe_source : 0.684210526315789
    indicted_video_source : 0.714285714285714
    source1 : 0.142857142857143
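As a quick sanity check, the 3.25 adistr() figures look consistent with simply dividing the adist() values by the string length (after padding, the pattern and the input have the same length). The sketch below only redoes that arithmetic on the numbers reported above; it does not call String::Approx, and `ratio` is a hypothetical helper name.

```perl
use strict;
use warnings;

# adist() values copied from the runs above (identical under 3.19 and 3.25).
my %adist = (
    donor_dedupe_source   => 13,
    indicted_video_source => 15,
    source1               => 1,
);

# Hypothetical helper: edit distance as a fraction of the string's length.
sub ratio {
    my ($dist, $str) = @_;
    return $dist / length($str);
}

for my $input (sort keys %adist) {
    printf "%s : %.15f\n", $input, ratio($adist{$input}, $input);
}
```

That is 13/19, 15/21 and 1/7 respectively, which lines up with what adistr() printed under 3.25 to the digits shown.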
One anomaly that shows up is that PPM reports that it is installing into "ActivePerl 5.8.3.809.", which I haven't used (or even had installed) for many months. I think this must come from a registry setting that hasn't been properly cleaned or overwritten, but I need to investigate that.
The version of perl used is 5.8.7.813:
    P:\test>perl -v

    This is perl, v5.8.7 built for MSWin32-x86-multi-thread
    (with 7 registered patches, see perl -V for more detail)

    Copyright 1987-2005, Larry Wall

    Binary build 813 [148120] provided by ActiveState http://www.ActiveState.com
    ActiveState is a division of Sophos.
    Built Jun 6 2005 13:36:37

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using `man perl' or `perldoc perl'. If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.
In reply to Re^3: Tuning an approximate match to prefer closer lengths
by BrowserUk
in thread Tuning an approximate match to prefer closer lengths
by samtregar