in reply to Fuzzy Searching: Optimizing Algorithm Selection
My solution builds a regexp that looks like the following, minus the lines breaks:
^ (?:a|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?:p|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?:p|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?:l|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?:e|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?{ $fuzzies_left = $fuzzies }) $
Here it is:
our $USE_A_FUZZY = '(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z))'; our $COUNT_FUZZIES_LEFT = '(?{ $fuzzies_left = $fuzzies })'; { my $string = 'apple'; my $max_fuzzies = 2; my $re = '^'; $re .= sprintf("(?:%s|${USE_A_FUZZY}.)", quotemeta($1)) while ($string =~ /(.)/g); $re .= "${COUNT_FUZZIES_LEFT}\$"; $re = do { use re 'eval'; qr/$re/ }; our $fuzzies; local $fuzzies; our $fuzzies_left; local $fuzzies_left; while (<DATA>) { chomp; $fuzzies = $max_fuzzies; if (/$re/) { $fuzzies = $max_fuzzies - $fuzzies_left; if ($fuzzies) { print("Fuzzy match with $_ ($fuzzies fuzzies).$/"); } else { print("Exact match with $_.$/"); } } else { print("No match with $_.$/"); } } } __DATA__ apple arple brple brqle orange ------ OUTPUT ------ Exact match with apple. Fuzzy match with arple (1 fuzzies). Fuzzy match with brple (2 fuzzies). No match with brqle. No match with orange.
Don't know if it's faster.
Known Bugs: 'apples' doesn't count as a fuzzy match for 'apple', but it would count as a fuzzy match for 'apple '. I can fix that if you want.
Update: Since we never have any backtracking, we can simplify the above to:
^ (?:a|(?(?{!$fuzzies--})(?=A)(?=Z)).) (?:p|(?(?{!$fuzzies--})(?=A)(?=Z)).) (?:p|(?(?{!$fuzzies--})(?=A)(?=Z)).) (?:l|(?(?{!$fuzzies--})(?=A)(?=Z)).) (?:e|(?(?{!$fuzzies--})(?=A)(?=Z)).) $
As implemented by the following code:
our $USE_A_FUZZY = '(?(?{!$fuzzies--})(?=A)(?=Z))'; { my $string = 'apple'; my $max_fuzzies = 2; my $re = '^'; $re .= sprintf("(?:%s|${USE_A_FUZZY}.)", quotemeta($1)) while ($string =~ /(.)/g); $re .= "\$"; $re = do { use re 'eval'; qr/$re/ }; our $fuzzies; local $fuzzies; while (<DATA>) { chomp; $fuzzies = $max_fuzzies; if (/$re/) { $fuzzies = $max_fuzzies - $fuzzies; if ($fuzzies) { print("Fuzzy match with $_ ($fuzzies fuzzies).$/"); } else { print("Exact match with $_.$/"); } } else { print("No match with $_.$/"); } } }
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Fuzzy Searching: Optimizing Algorithm Selection
by tall_man (Parson) on Nov 11, 2004 at 00:32 UTC | |
by ikegami (Patriarch) on Nov 11, 2004 at 01:00 UTC |