in reply to Re: Search for identical substrings
in thread Search for identical substrings
If I'm reading the documentation for Algorithm::Diff correctly, it will not return what I want. LCS appears to use a "distance" measure. That is, it determines the span over which two strings have the most information in common. This is done by counting hits and misses. The maximum hit count gives the longest span over which the two strings share commonality. Usually these types of algorithms carry a penalty for misses. Nonetheless, as I understand LCS, if we are using the strings "banana is split" and "bananas split", we can line up the strings a couple of ways.
    banana is split
    bananas split
    banana--s          # 7 characters in common and two misses "-"

or

    banana is split
      bananas split
       ana--s split    # 10 characters in common and two misses
Allowing the strings to flex by putting holes in the strings we get...
    banana is split
    banana..s split    # 13 characters in common and two "holes"
...by putting two holes between "a" and "s" in "bananas"
So "bananas split" (removing the holes from "banana..s split") would be the result of LCS, but I want "s split" with a hit count of 7, no misses and no holes.
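To make the difference concrete, here is a minimal sketch in plain Perl of the two notions being contrasted. It does not use Algorithm::Diff; both subs (`longest_common_subsequence` and `longest_common_substring`) are names made up for this illustration, and each is a textbook dynamic-programming table, so treat it as an assumption about what "LCS with holes" versus "no misses and no holes" means rather than as how any module does it.

```perl
use strict;
use warnings;

# Longest common SUBSEQUENCE: characters must appear in the same order
# in both strings, but "holes" are allowed between them.
sub longest_common_subsequence {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($n, $m) = (scalar @s, scalar @t);
    # $len[$i][$j] = subsequence length for the first $i chars of $s
    # and the first $j chars of $t.
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            if ($s[$i - 1] eq $t[$j - 1]) {
                $len[$i][$j] = $len[$i - 1][$j - 1] + 1;
            }
            else {
                my ($up, $left) = ($len[$i - 1][$j], $len[$i][$j - 1]);
                $len[$i][$j] = $up > $left ? $up : $left;
            }
        }
    }
    # Walk back through the table to recover one such subsequence.
    my ($i, $j, @out) = ($n, $m);
    while ($i > 0 && $j > 0) {
        if ($s[$i - 1] eq $t[$j - 1]) { unshift @out, $s[$i - 1]; $i--; $j--; }
        elsif ($len[$i - 1][$j] >= $len[$i][$j - 1]) { $i--; }
        else { $j--; }
    }
    return join '', @out;
}

# Longest common SUBSTRING: the longest run that is contiguous in both
# strings, i.e. no misses and no holes.
sub longest_common_substring {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($n, $m) = (scalar @s, scalar @t);
    my ($best_len, $best_end) = (0, 0);
    my @prev = (0) x ($m + 1);    # run lengths from the previous row
    for my $i (1 .. $n) {
        my @cur = (0) x ($m + 1);
        for my $j (1 .. $m) {
            next unless $s[$i - 1] eq $t[$j - 1];
            $cur[$j] = $prev[$j - 1] + 1;    # extend the run ending one char earlier
            ($best_len, $best_end) = ($cur[$j], $i) if $cur[$j] > $best_len;
        }
        @prev = @cur;
    }
    return substr($s, $best_end - $best_len, $best_len);
}

my ($x, $y) = ('banana is split', 'bananas split');
print longest_common_subsequence($x, $y), "\n";    # bananas split
print longest_common_substring($x, $y),   "\n";    # s split
```

Both subs are O(n*m) in time, which is fine for short strings like these but would likely need something smarter (a suffix-based approach, say) for long inputs.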