
Yeah, you're right. I like String::Diff, but it finds mismatches differently from the OP's spec. Look at the output from a simple snippet like this:

say String::Diff::diff_merge( @strings );

Using your strings, you'll see that String::Diff does find the mismatches, but it locates them in slightly different places than the OP requested. For example, it doesn't report that in "This" vs. "Thsi" the "is" and "si" are transposed. Instead, it finds an 'i' in front of the 's' in the first string that doesn't exist in the second string, and then an 'i' following the 's' in the second string that doesn't appear in the first. So diff_merge output would be something like "Th[i]s{i}" (by default, removed text is wrapped in [] and added text in {}). I think that's actually more useful information; it's more actionable. It tells me where to insert and where to delete to fix the differences. But to coerce that style into the OP's specification, I took a shortcut.
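
To make the insert/delete view concrete, here's a minimal pure-Perl sketch of a character-level LCS (longest common subsequence) diff using the same [removed]/{added} convention. It's an illustration only, not String::Diff's implementation; lcs_diff_merge is a hypothetical name, and the module's own tie-breaking may differ from the "prefer the removal" rule used here.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT String::Diff's own code: a character-level LCS
# diff that wraps characters found only in $old in [...] and characters
# found only in $new in {...}.
sub lcs_diff_merge {
    my ($old, $new) = @_;
    my @x = split //, $old;
    my @y = split //, $new;
    my ($n, $m) = (scalar @x, scalar @y);

    # $len[$i][$j] = LCS length of the suffixes starting at $x[$i] and $y[$j]
    my @len;
    for my $i (reverse 0 .. $n) {
        for my $j (reverse 0 .. $m) {
            if    ($i == $n || $j == $m) { $len[$i][$j] = 0 }
            elsif ($x[$i] eq $y[$j])     { $len[$i][$j] = $len[$i + 1][$j + 1] + 1 }
            else {
                my ($skip_x, $skip_y) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $skip_x >= $skip_y ? $skip_x : $skip_y;
            }
        }
    }

    # Walk the table; on ties, report a removal before an addition.
    my ($i, $j, $out, $del, $ins) = (0, 0, '', '', '');
    my $flush = sub {
        $out .= "[$del]" if length $del;
        $out .= "{$ins}" if length $ins;
        ($del, $ins) = ('', '');
    };
    while ($i < $n || $j < $m) {
        if ($i < $n && $j < $m && $x[$i] eq $y[$j]) {
            $flush->();
            $out .= $x[$i];
            $i++;
            $j++;
        }
        elsif ($j == $m || ($i < $n && $len[$i + 1][$j] >= $len[$i][$j + 1])) {
            $del .= $x[$i++];    # character only in $old
        }
        else {
            $ins .= $y[$j++];    # character only in $new
        }
    }
    $flush->();
    return $out;
}

print lcs_diff_merge('This', 'Thsi'), "\n";    # Th[i]s{i}
```

The transposition never shows up as such: the LCS keeps "Ths" in common, so the leftover 'i' is reported once as a removal and once as an addition.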

The shortcut started out like this: override the default markers so that there's only a start token at the beginning of the trouble spot and an end token after its end, with no tokens in the middle. That gives something like this: <mm>isi</mm>.
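
As a rough illustration, assuming String::Diff's remove_open, remove_close, append_open and append_close options and its default [ ] / { } markers: setting remove_open to '<mm>', append_close to '</mm>' and the other two to empty strings should amount to the character swap below. The to_single_span helper is hypothetical and works on an already-merged default string, so the sketch runs without the module installed.

```perl
use strict;
use warnings;

# Assumed default markers -> single-span replacements:
# open a span where a removal starts, close it where an addition ends,
# and drop the two markers in the middle.
my %swap = ('[' => '<mm>', ']' => '', '{' => '', '}' => '</mm>');

# Hypothetical helper: rewrite default-marked diff_merge output.
sub to_single_span {
    my ($merged) = @_;
    (my $out = $merged) =~ s/([\[\]{}])/$swap{$1}/g;
    return $out;
}

print to_single_span('Th[i]s{i}'), "\n";    # Th<mm>isi</mm>
```

Note that a mismatch consisting only of a removal, such as "[x]", comes out as an unclosed "<mm>x", and any literal brackets or braces in the text get mangled too.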

Obviously that misses the mark by keeping both instances of what String::Diff considers differences: the 'i' that is followed by an 's', and the 'i' that comes after an 's'. So my first dirty solution was to add a backspace character right in front of the closing tag. On some terminals that character would gobble up the trailing 'i'.
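
A minimal sketch of that first dirty trick, using a hypothetical add_backspace helper that slips a backspace in front of each closing tag. Inside a regex pattern, \b means word boundary rather than backspace, which is why the code spells the character \x08.

```perl
use strict;
use warnings;

# Hypothetical helper: put a backspace (\x08) right before every </mm>.
sub add_backspace {
    my ($s) = @_;
    $s =~ s{</mm>}{\x08</mm>}g;
    return $s;
}

# On many terminals the cursor steps back one column, so the "<" of the
# closing tag overwrites the trailing 'i' on screen. The string itself
# still contains the 'i' and the backspace.
print add_backspace('Th<mm>isi</mm>'), "\n";
```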

I was unsatisfied with only tweaking terminal output, so my next step was to use the backspace character as a trigger in a substitution: s/// removes the backspace from the string along with the character right before it. That made the deletion real in the string itself, where before it had only been a visual effect on the terminal. I stuck with the backspace character because it seemed unlikely to appear in the target text to begin with.
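
The substitution step might look like this sketch (apply_backspace is a hypothetical name): each backspace is deleted together with the non-backspace character immediately in front of it, so the string now matches what the terminal displayed.

```perl
use strict;
use warnings;

# Hypothetical helper: delete each backspace (\x08) along with the single
# character in front of it, making the terminal's visual erase permanent.
sub apply_backspace {
    my ($s) = @_;
    $s =~ s/[^\x08]\x08//g;
    return $s;
}

print apply_backspace("Th<mm>isi\x08</mm>"), "\n";    # Th<mm>is</mm>
```

A single /g pass handles one backspace per erased character; two backspaces in a row would leave one of them behind.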

I wasn't done yet. My next realization was that although the OP's spec only showed single-character transpositions, if I detected the length of the mismatch, I could remove more than just the single trailing character: I could substitute away the entire trailing part of the mismatch.
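
Extending that to a tail of any length could be sketched like this, assuming the tail length $n is already known (for instance, the length of the added run String::Diff reported; that's an assumption, not the original code). The hypothetical mark_tail puts $n backspaces before the closing tag, and apply_backspaces re-runs the substitution until none match, so each backspace in a run erases one more character, the way a terminal would.

```perl
use strict;
use warnings;

# Hypothetical helper: put $n backspaces in front of the closing tag.
sub mark_tail {
    my ($s, $n) = @_;
    my $bs = "\x08" x $n;
    $s =~ s{</mm>}{$bs</mm>};
    return $s;
}

# Repeat until nothing matches: each pass erases one character, so a run
# of N backspaces removes the N characters in front of it.
sub apply_backspaces {
    my ($s) = @_;
    1 while $s =~ s/[^\x08]\x08//;
    return $s;
}

print apply_backspaces(mark_tail('x<mm>abcab</mm>', 2)), "\n";    # x<mm>abc</mm>
```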

But now we've really got our hands dirty. This isn't what the String::Diff module was designed to do, and it was bound to give us fits as soon as the differences became a little more complex, as you pointed out. This is where a good suite of tests would have immediately found the flaw, and it's why test-driven development has so much merit. I left an undetected bug in sample code overnight. When composing answers here, there isn't really time to build a test suite. But had that code been pushed into production without testing, it might have caused real grief.
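
Even a handful of Test::More checks flags this kind of trouble early. The sketch below tests a hypothetical apply_backspaces helper modeled on the backspace trick described above; the TODO block records a known weakness (a backspace already present in the input silently eats a real character) without failing the run. It illustrates the practice and doesn't reproduce the specific bug from this thread.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper under test: each backspace (\x08) erases the
# character in front of it, repeated until nothing matches.
sub apply_backspaces {
    my ($s) = @_;
    1 while $s =~ s/[^\x08]\x08//;
    return $s;
}

is( apply_backspaces("Th<mm>isi\x08</mm>"),      'Th<mm>is</mm>',   'single-character tail' );
is( apply_backspaces("x<mm>abcab\x08\x08</mm>"), 'x<mm>abc</mm>',   'two-character tail' );
is( apply_backspaces('no markers here'),         'no markers here', 'plain text untouched' );

TODO: {
    local $TODO = 'a backspace in the source text looks just like our marker';
    is( apply_backspaces("a\x08b"), "a\x08b", 'pre-existing backspace survives' );
}

done_testing();
```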

Three books got me started with understanding how to test Perl code: "The Definitive Guide to Catalyst" (not a great source on testing, but it got me interested), "Intermediate Perl" (I had to upgrade from the older edition, "Learning Perl Objects, References & Modules", to get the material on testing), and "Perl Testing: A Developer's Notebook". Of those, I think anyone who cares about testing their code ought to at least pick up Intermediate Perl. The "Developer's Notebook" goes into even more depth and is also a good resource.


Dave