in reply to Improving speed match arrays with fuzzy logic

"... and so on"

Since the specification is fuzzy, let's make a fuzzy matching regex :)
Then match it against the whole corpus as a single string, instead of doing 500,000 individual matches.

#!/usr/bin/perl

# https://perlmonks.org/?node_id=1228728

use strict;
use warnings;

# corpus is now a string instead of an array FIXME for real filename
my $corpus = do { local (@ARGV, $/) = '/usr/share/dict/words'; <> };

# fake random input strings FIXME for real strings in @tomatch
my @tomatch = map { join '', map { ('a'..'z')[rand 26] } 1 .. 4 } 1 .. 1e2;

for my $string (@tomatch)
  {
  my @patterns;                                     # match <2 changes
  push @patterns, "$`.?$'" while $string =~ /\S/g;  # changed or dropped char
  push @patterns, "$`.$'" while $string =~ /|/g;    # added char
  $string =~ /^(.+)es$/ && push @patterns, $1;      # singular
  my $fuzzyregex = do { local $" = '|'; qr/^(@patterns)$/m };
  $corpus =~ $fuzzyregex && printf "%35s : %s\n", $string, $1; # FIXME output
  }
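To see what the generated alternation looks like, here is a minimal sketch of the same idea on one fixed word and a tiny in-memory corpus. The helper name `fuzzy_patterns` and the three-word corpus are made up for illustration, and it uses `substr` plus `quotemeta` instead of `` $` `` and `$'`, on the assumption that real input strings may contain regex metacharacters (the random a-z test strings never do).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): build the list of
# alternatives for one input string.
sub fuzzy_patterns {
    my ($string) = @_;
    my @patterns;

    # changed or dropped char: replace each character in turn by '.?'
    for my $i (0 .. length($string) - 1) {
        my ($pre, $post) = map { quotemeta($_) }
            substr($string, 0, $i), substr($string, $i + 1);
        push @patterns, "$pre.?$post";
    }

    # added char: insert '.' at each position, including both ends
    for my $i (0 .. length($string)) {
        my ($pre, $post) = map { quotemeta($_) }
            substr($string, 0, $i), substr($string, $i);
        push @patterns, "$pre.$post";
    }

    # singular: strip a trailing 'es'
    push @patterns, quotemeta($1) if $string =~ /^(.+)es$/;

    return @patterns;
}

my @patterns = fuzzy_patterns('cat');
print "$_\n" for @patterns;   # .?at  c.?t  ca.?  .cat  c.at  ca.t  cat.

# One multi-line match against the whole corpus string.
my $alt    = join '|', @patterns;
my $regex  = qr/^($alt)$/m;
my $corpus = "cart\ncot\ndog\n";   # stand-in for the real word list
print "first hit: $1\n" if $corpus =~ $regex;   # first hit: cart
```

A four-letter string yields only nine alternatives (ten if it ends in "es"), so each regex stays small even though it is run against the entire corpus at once.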

Besides, I couldn't pass up an opportunity to write perl to write a regex :)

Re^2: Improving speed match arrays with fuzzy logic
by bliako (Abbot) on Jan 19, 2019 at 16:55 UTC

    Excellent! Since the worms are well out of the can (and pretty fast too!), can I add s|z and (f|ph) alternations, plus a plural case (rather than only deriving the singular)?

    ...
    # $string =~ /^(.+)es$/ && push @patterns, $1;      # singular
    # Bliako modified:
    ($string =~ /^(.+?)(e?s)$/ && push @patterns, $1)   # singular
      || push @patterns, $string.'(e?s)?'               # plural
    ;
    s/(?<![sz])(?:s|z)(?=[^\)\.])/(?:s|z)/ for(@patterns);
    s/f|ph/(?:f|ph)/ for(@patterns);
    #print "patterns:"; print " /$_/" for(@patterns); print "\n";
    # end mods
    ...
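A small sketch of what the two substitutions do to a pattern list, run on two made-up words. The helper name `add_spelling_variants` and the sample corpus are assumptions for illustration, and the f/ph group is written here in its non-capturing form `(?:f|ph)`. Without `/g`, only the first s/z and the first f/ph in each pattern is widened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: widen each pattern so that s/z and f/ph
# become interchangeable at their first occurrence.
sub add_spelling_variants {
    my @patterns = @_;          # work on a copy of the caller's list
    for (@patterns) {
        # first s or z that is not preceded by s/z and is followed
        # by something other than ')' or '.'
        s/(?<![sz])(?:s|z)(?=[^\)\.])/(?:s|z)/;
        # first f or ph
        s/f|ph/(?:f|ph)/;
    }
    return @patterns;
}

my @widened = add_spelling_variants('sofa', 'phase');
print "$_\n" for @widened;
# (?:s|z)o(?:f|ph)a
# (?:f|ph)a(?:s|z)e

my $alt    = join '|', @widened;
my $regex  = qr/^($alt)$/m;
my $corpus = "zopha\nfaze\n";   # stand-in corpus with variant spellings
print "hit: $1\n" while $corpus =~ /$regex/g;
# hit: zopha
# hit: faze
```

Because the inner groups are non-capturing, `$1` still refers to the whole matched word from the outer group.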

      With a specification of "and so on" you can add anything you want :)