I'm experimenting with a fuzzy pattern-search algorithm that Tachyon originally posted in reply to a bioinformatics question.
It currently outputs the line number and the number of misses, and I've got it to print the sentence that the occurrence appears in, but I'm trying to get the matched word as well and I'd be grateful for some help. I'm just experimenting with applying some bioinformatics techniques to text analysis (following a conversation with an acquaintance) and will be looking at using stop words, inflections and corpora in due course.
use strict;
use warnings;

my $word     = "scrooge";
my @find     = map ([split //], $word);   # each pattern as an array of characters
my $find_len = length($word);
my $fuzzy    = 2;                         # maximum number of mismatches allowed

while (my $search = <DATA>) {
    chomp $search;
    $search = [split //, $search];
    # slide a window of $find_len characters along the line
    for my $i ( 0..@$search-$find_len ) {
        FIND:
        for my $find ( @find ) {
            my $misses = 0;
            for my $j ( 0..$find_len-1 ) {   # "my" is required here under strict
                $misses++ if $search->[$i+$j] ne $find->[$j];
                next FIND if $misses > $fuzzy;
            }
            print "Line $. Match ($misses) at $i, @$search\n";
        }
    }
}
__DATA__
STAVE I: MARLEY'S GHOST
MARLEY was dead: to begin with. There is no doubt
whatever about that. The register of his burial was
signed by the clergyman, the clerk, the undertaker,
and the chief mourner. Scrouge signed it: and
Scrooge's name was good upon 'Change, for anything he
chose to put his hand to. Old Marley was as dead as a
door-nail.
Mind! I don't mean to say that I know, of my
own knowledge, what there is particularly dead about
a door-nail. I might have been inclined, myself, to
regard a coffin-nail as the deadest piece of ironmongery
in the trade. But the wisdom of our ancestors
is in the simile; and my unhallowed hands
shall not disturb it, or the Country's done for. You
will therefore permit me to repeat, emphatically, that
Marley was as dead as a door-nail
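For what it's worth, here is one possible way to get at the word. It is only a minimal sketch, not a definitive answer. It keeps the same case-sensitive, count-the-mismatches comparison as the script above, but works on the line as a string with `substr`, then widens the matched window outwards until it hits a non-word character. The sub names `misses_at`, `word_at` and `fuzzy_find` are made up for illustration, and using `\w` to decide what counts as part of a word is an assumption (apostrophes and hyphens end the word here).

```perl
use strict;
use warnings;

# Count mismatches between $pattern and the window of $line starting at $pos.
# Returns nothing (undef in scalar context) once $max_misses is exceeded.
sub misses_at {
    my ($line, $pattern, $pos, $max_misses) = @_;
    my $misses = 0;
    for my $j (0 .. length($pattern) - 1) {
        $misses++ if substr($line, $pos + $j, 1) ne substr($pattern, $j, 1);
        return if $misses > $max_misses;
    }
    return $misses;
}

# Widen the window [$pos, $pos + $len) in both directions while the
# neighbouring character is a word character (assumption: \w defines a word).
sub word_at {
    my ($line, $pos, $len) = @_;
    my $start = $pos;
    $start-- while $start > 0 && substr($line, $start - 1, 1) =~ /\w/;
    my $end = $pos + $len;
    $end++ while $end < length($line) && substr($line, $end, 1) =~ /\w/;
    return substr($line, $start, $end - $start);
}

# Return one hash per window of $line that is within $max_misses of $pattern.
sub fuzzy_find {
    my ($line, $pattern, $max_misses) = @_;
    my $len = length $pattern;
    my @hits;
    for my $i (0 .. length($line) - $len) {
        my $misses = misses_at($line, $pattern, $i, $max_misses);
        next unless defined $misses;
        push @hits, {
            pos    => $i,
            misses => $misses,
            word   => word_at($line, $i, $len),
        };
    }
    return @hits;
}

# Usage on one line of the sample data
my $line = "and the chief mourner. Scrouge signed it: and";
for my $hit (fuzzy_find($line, "scrooge", 2)) {
    print "Match ($hit->{misses}) at $hit->{pos}: $hit->{word}\n";
}
```

If a match window happens to straddle two words, `word_at` returns both of them with the space in between, which may or may not be what is wanted. Lower-casing both the line and the pattern with `lc` before comparing would stop the capital "S" from counting as a miss.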