"... and so on"
Since the specification is fuzzy, let's make a fuzzy matching regex :)
Then match it against the whole corpus as a single string, instead of doing 500,000 individual matches.
#!/usr/bin/perl
# https://perlmonks.org/?node_id=1228728
use strict;
use warnings;
# corpus is now a string instead of an array FIXME for real filename
my $corpus = do { local (@ARGV, $/) = '/usr/share/dict/words'; <> };
# fake random input strings FIXME for real strings in @tomatch
my @tomatch = map { join '', map { ('a'..'z')[rand 26] } 1 .. 4 } 1 .. 1e2;
for my $string (@tomatch)
{
my @patterns; # match <2 changes
push @patterns, "$`.?$'" while $string =~ /\S/g; # changed or dropped char
push @patterns, "$`.$'" while $string =~ /|/g; # added char
$string =~ /^(.+)es$/ && push @patterns, $1; # singular
my $fuzzyregex = do { local $" = '|'; qr/^(@patterns)$/m };
$corpus =~ $fuzzyregex && printf "%35s : %s\n", $string, $1; # FIXME output
}
Besides, I couldn't pass up an opportunity to write perl to write a regex :)
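To make the generated regex easier to see, here is a small sketch that prints the alternatives for one fixed word and runs them against a three-word stand-in corpus. The helper name fuzzy_patterns, the word 'cat', and the tiny corpus are all made up for illustration. It uses substr rather than $` and $', and it replaces every character rather than only non-whitespace ones, so it assumes the input has no whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fuzzy_patterns() is a hypothetical helper, not part of the script above.
# It builds the same kind of alternatives as the loop body, using substr
# instead of $` and $'. Assumes the word contains no whitespace or
# regex metacharacters.
sub fuzzy_patterns {
    my ($string) = @_;
    my $len = length $string;
    my @patterns;
    # changed or dropped char: each char in turn becomes ".?"
    push @patterns, substr($string, 0, $_) . '.?' . substr($string, $_ + 1)
        for 0 .. $len - 1;
    # added char: a "." inserted at each position, including both ends
    push @patterns, substr($string, 0, $_) . '.' . substr($string, $_)
        for 0 .. $len;
    # singular: strip a trailing "es"
    push @patterns, $1 if $string =~ /^(.+)es$/;
    return @patterns;
}

my @patterns = fuzzy_patterns('cat');
print "$_\n" for @patterns;

# tiny stand-in corpus, one word per line like /usr/share/dict/words
my $corpus = "cart\ncot\ndog\n";
my $fuzzyregex = do { local $" = '|'; qr/^(@patterns)$/m };

# list context with /g collects every matching line, not just the first
my @hits = $corpus =~ /$fuzzyregex/g;
print "hits: @hits\n";
```

For 'cat' this should print seven alternatives (.?at, c.?t, ca.?, .cat, c.at, ca.t, cat.) followed by "hits: cart cot". Matching in list context with /g is a variation on the script above, which only reports the first match for each string.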