Re: generating hash patterns for searching with one mismatch

1) Maybe Text::Levenshtein will help you here; it calculates the Levenshtein distance for you, which would be 1 in your case. The Hamming distance may also help you here...

2) Efficiency is hard to define. Even if you use a Perl idiom like map, there is still an underlying loop, and using fewer keystrokes in your Perl program often reduces readability, especially if you use tricks or constructs you are not used to reading. So in my eyes efficiency depends on your level of knowledge (and as that level is likely to increase, your judgment of what is efficient and what is not is likely to change).

HTH, Rata

Update:

educated_foo provided a comment to this node (thanks!), so I think a bit of clarification is helpful:

I thought about using the Levenshtein distance in the following way:

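use Text::Levenshtein qw(distance);

my $len = length($pattern);
# Check every window of length $len, including the last one (hence <=).
for (my $i = 0; $i <= length($str) - $len; $i++) {
    if (distance(substr($str, $i, $len), $pattern) < 2) { print "similar"; }
}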

That way you would get the similarity, and since the substring and the pattern have the same length, inserts/deletes are no issue: a distance below 2 between two equal-length strings can only come from a single substitution.

However, since the further discussion showed that cedance wants a high-performance solution, I would no longer recommend that approach... it is nice, elegant and slow ;-)
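For comparison, here is a minimal sketch of the same sliding-window idea using the Hamming distance instead. It is not from the thread: the names hamming and near_matches are made up for illustration, and it assumes plain byte strings (e.g. ASCII DNA letters), since it relies on Perl's string XOR producing "\0" wherever two bytes agree.

```perl
use strict;
use warnings;

# Hypothetical helper: count the positions where two equal-length
# byte strings differ. String XOR gives "\0" wherever the bytes agree,
# so counting the non-"\0" bytes counts the mismatches.
sub hamming {
    my ($s, $t) = @_;
    die "length mismatch" unless length($s) == length($t);
    my $diff = $s ^ $t;
    return $diff =~ tr/\0//c;
}

# Hypothetical helper: return the start offsets in $str where $pattern
# matches with at most $max mismatches (substitutions only).
sub near_matches {
    my ($str, $pattern, $max) = @_;
    my $len = length $pattern;
    my @hits;
    for my $i (0 .. length($str) - $len) {
        push @hits, $i
            if hamming(substr($str, $i, $len), $pattern) <= $max;
    }
    return @hits;
}

print join(",", near_matches("ACGTACCTAAGT", "ACGT", 1)), "\n";
```

This still visits every window, so it is probably not the high-performance solution asked for, but each comparison is much cheaper than a full edit-distance computation.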

Regarding my comment on efficiency: it seems I was misled by the wording "perl syntaxes enables us to write shorter than say C/C++". This was clarified later in the thread... it seems I was the victim of an XY Problem here.


Re^2: generating hash patterns for searching with one mismatch by educated_foo (Vicar) on Mar 17, 2011 at 15:54 UTC
"1) maybe Text::Levenshtein will help you here; it calculates the Levenshtein distance for you,"

Which is nothing like what he wants, since it allows indels. He wants Hamming distance, so T::L would do the wrong thing.

"efficiency is hard to define;"

No it isn't: he probably wants to match many short strings against a long one (like a chromosome, with millions of characters) as quickly as possible.
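
One way that use case could be sketched, going by the thread title, is to precompute a hash of every pattern plus all of its one-substitution variants, then do a single hash lookup per window of the long string. This is only an illustration under stated assumptions: the names build_variant_index and scan are hypothetical, all patterns are assumed to have the same length, and the alphabet (A, C, G, T) is assumed rather than taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: map each pattern and each of its one-substitution
# variants back to the pattern(s) it was derived from.
sub build_variant_index {
    my ($patterns, $alphabet) = @_;
    my %index;
    for my $p (@$patterns) {
        $index{$p}{$p} = 1;                      # exact match
        for my $i (0 .. length($p) - 1) {
            my $orig = substr($p, $i, 1);
            for my $c (@$alphabet) {
                next if $c eq $orig;
                my $v = $p;
                substr($v, $i, 1) = $c;          # one mismatch at position $i
                $index{$v}{$p} = 1;
            }
        }
    }
    return \%index;
}

# Hypothetical helper: slide a window of $len over $text and look each
# window up in the index. Returns [offset, pattern] pairs.
sub scan {
    my ($text, $index, $len) = @_;
    my @hits;
    for my $i (0 .. length($text) - $len) {
        my $found = $index->{ substr($text, $i, $len) } or next;
        push @hits, [$i, $_] for sort keys %$found;
    }
    return @hits;
}

# Assumed example data: two patterns of length 4 over A, C, G, T.
my $index = build_variant_index([qw(ACGT TTAA)], [qw(A C G T)]);
printf "%d %s\n", @$_ for scan("ACGTACCTAAGT", $index, 4);
```

The trade-off is memory for speed: a pattern of length L over an alphabet of size A contributes 1 + L*(A-1) keys, which stays small for short patterns but grows quickly if more than one mismatch is allowed.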