My solution builds a regexp that looks like the following, minus the lines breaks:
^ (?:a|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?:p|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?:p|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?:l|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?:e|(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z)).) (?{ $fuzzies_left = $fuzzies }) $
Here it is:
our $USE_A_FUZZY = '(?(?{ local $fuzzies = $fuzzies; !$fuzzies-- })(?=A)(?=Z))'; our $COUNT_FUZZIES_LEFT = '(?{ $fuzzies_left = $fuzzies })'; { my $string = 'apple'; my $max_fuzzies = 2; my $re = '^'; $re .= sprintf("(?:%s|${USE_A_FUZZY}.)", quotemeta($1)) while ($string =~ /(.)/g); $re .= "${COUNT_FUZZIES_LEFT}\$"; $re = do { use re 'eval'; qr/$re/ }; our $fuzzies; local $fuzzies; our $fuzzies_left; local $fuzzies_left; while (<DATA>) { chomp; $fuzzies = $max_fuzzies; if (/$re/) { $fuzzies = $max_fuzzies - $fuzzies_left; if ($fuzzies) { print("Fuzzy match with $_ ($fuzzies fuzzies).$/"); } else { print("Exact match with $_.$/"); } } else { print("No match with $_.$/"); } } } __DATA__ apple arple brple brqle orange ------ OUTPUT ------ Exact match with apple. Fuzzy match with arple (1 fuzzies). Fuzzy match with brple (2 fuzzies). No match with brqle. No match with orange.
Don't know if it's faster.
Known Bugs: 'apples' doesn't count as a fuzzy match for 'apple', but it would count as a fuzzy match for 'apple '. I can fix that if you want.
Update: Since we never have any backtracking, we can simplify the above to:
^ (?:a|(?(?{!$fuzzies--})(?=A)(?=Z)).) (?:p|(?(?{!$fuzzies--})(?=A)(?=Z)).) (?:p|(?(?{!$fuzzies--})(?=A)(?=Z)).) (?:l|(?(?{!$fuzzies--})(?=A)(?=Z)).) (?:e|(?(?{!$fuzzies--})(?=A)(?=Z)).) $
As implemented by the following code:
our $USE_A_FUZZY = '(?(?{!$fuzzies--})(?=A)(?=Z))'; { my $string = 'apple'; my $max_fuzzies = 2; my $re = '^'; $re .= sprintf("(?:%s|${USE_A_FUZZY}.)", quotemeta($1)) while ($string =~ /(.)/g); $re .= "\$"; $re = do { use re 'eval'; qr/$re/ }; our $fuzzies; local $fuzzies; while (<DATA>) { chomp; $fuzzies = $max_fuzzies; if (/$re/) { $fuzzies = $max_fuzzies - $fuzzies; if ($fuzzies) { print("Fuzzy match with $_ ($fuzzies fuzzies).$/"); } else { print("Exact match with $_.$/"); } } else { print("No match with $_.$/"); } } }
In reply to Re: Fuzzy Searching: Optimizing Algorithm Selection
by ikegami
in thread Fuzzy Searching: Optimizing Algorithm Selection
by Itatsumaki
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |