
The only problem with that is that the threshold is not exact. For example, if the threshold is .5 and the query is three characters long, I still get every match with at least 1/3 identity. I have tried to fix this, to no avail. Here is my attempt:
foreach my $key (keys %o) {
    my $keydisplay = 1;
    my $target = $o{$key};
    #my $threshold = 0.79;
    my $searchlength = length $search;
    my $truethreshold = int($searchlength * $threshold*100) /100; #(page46)
    if ($searchlength - $truethreshold >= .005) {
        $truethreshold += .01;
    }
    $searchlength = $truethreshold;
    for my $position (0 .. (length ($target) - $searchlength)) {
        my $test = substr $target, $position, $searchlength;
        my $matched = ($test ^ $search) =~ tr/\x00//;
        next if $matched < $truethreshold;
        print "\t\tFound <$test> at position $position which matches in $matched of $searchlength places\n";
    }
}

Re^3: Pattern searching allowing for mis-matches...
by GrandFather (Saint) on Dec 15, 2009 at 19:07 UTC

    Characters are discrete entities. What outcome do you expect from matching 1/2 a character?


    True laziness is hard work
      Right, maybe that was a poor example. A threshold of .6 on a three-character query still matches at 1/3, 2/3 or 3/3 identity. Similarly, a threshold of .7 on a five-character search yields 3/5, 4/5 and 5/5 matches. I would prefer to round UP rather than down, since by definition something shouldn't be below a threshold. I am just looking for a way to tighten it up.
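
One possible way to tighten it up, offered as a minimal sketch rather than a drop-in fix for the code above: since characters are discrete, the fractional threshold can be converted once into a whole number of required matching characters, rounded up with `ceil` from the core POSIX module, while the search length stays equal to the length of the query. The names `required_matches` and `fuzzy_find`, the sample strings, and the `1e-9` tolerance are assumptions made for illustration; none of them come from the thread.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: smallest whole number of matching characters
# that is not below the fractional threshold.
sub required_matches {
    my ($length, $threshold) = @_;
    # The small tolerance (an assumption) keeps a product that lands a hair
    # above a whole number, through floating-point noise, from rounding up.
    return ceil($length * $threshold - 1e-9);
}

# Hypothetical helper: return [position, substring, matched] for every
# window of $target that matches $search in enough places.
sub fuzzy_find {
    my ($target, $search, $threshold) = @_;
    my $length = length $search;                 # window size stays a whole number
    my $needed = required_matches($length, $threshold);
    my @hits;
    for my $position (0 .. length($target) - $length) {
        my $test = substr $target, $position, $length;
        # String XOR yields a NUL byte wherever the two characters agree.
        my $matched = ($test ^ $search) =~ tr/\x00//;
        push @hits, [$position, $test, $matched] if $matched >= $needed;
    }
    return @hits;
}

# Usage with made-up data: a threshold of .5 on a three-character query
# now requires 2 of 3 places to match.
for my $hit (fuzzy_find('ABCDEF', 'BXD', 0.5)) {
    my ($position, $test, $matched) = @$hit;
    print "Found <$test> at position $position which matches in $matched of 3 places\n";
}
```

With this rounding, a threshold of .6 on three characters needs 2 matches and a threshold of .7 on five characters needs 4, so nothing below the threshold is reported.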