in reply to A better implementation of LCSS?

BrowserUk, did you ever upload your code to CPAN?

In a related question the other day, ikegami posted a solution using String::LCSS_XS, which he later deleted. This led me to investigate what other modules which find longest common substrings are available on CPAN. Here is what I found:

  1. String::LCSS_XS
  2. Algorithm::LCSS
  3. String::LCSS
  4. Tree::Suffix

Note that there are other modules with similar names, but they relate to longest common subsequences.

String::LCSS_XS seems to be the best of the bunch. It has one reported bug, but the bug is simple to avoid, and there is even a potential patch.

The other 3 modules have reported functional bugs for which there are no specified workarounds or patches.

Algorithm::LCSS was last updated in 2003 (which was a magical year for LCSS modules, apparently). The author's last activity on CPAN (for other modules) was in 2008. Soon after that, 2 bugs were filed, but the author never responded to either one.

The POD for Tree::Suffix indicates that the author has ceased to maintain the module due to numerous bugs in an external dependency.

Replies are listed 'Best First'.
Re^2: A better implementation of LCSS?
by BrowserUk (Patriarch) on Nov 11, 2015 at 21:07 UTC

    No. I never have.

    As has often been the case, my immediate needs were satisfied and some other project came along that demanded my time (whether for financial or intellectual reasons) and I've never been back to look at it.

    If the code in this thread still stands up, and anyone is interested in packaging it, they have both my blessing and support.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I adapted your code to match the user interface of String::LCSS, and I uploaded it as a patch: https://rt.cpan.org/Ticket/Display.html?id=32036

      I also patched the test to prove that your code fixes the reported functional bugs. Perhaps this will lower the barrier for someone to upload a new version of this module to CPAN.

      UPDATE: I sent the module author an email, offering my services as co-maintainer. Waiting for a response...

Re^2: A better implementation of LCSS?
by ikegami (Patriarch) on Nov 16, 2015 at 21:04 UTC

    Algorithm::LCSS's documentation says it finds the longest subsequence (axaxaxa + ayayaya = aaaaa), not the longest substring, but then it compares itself to String:LCSS?!? but it does indeed find the longest substring like String::LCSS.

      Good point. When I first read through the POD, I mistakenly thought the terms "substring" and "subsequence" were synonymous. After playing around with the *::LCS modules, I realized there is a huge distinction between the terms. Perhaps the module author was under the same misconception. I agree with you that the Algorithm::LCSS POD should be clarified. UPDATE: I see you've logged a bug report.
Re^2: A better implementation of LCSS? (Memoize)
by toolic (Bishop) on Nov 18, 2015 at 21:02 UTC
    UPDATE: ahh, nevermind. BrowserUk just debunked this...


    Somewhat related...

    For what it's worth, I used Memoize on the String::LCSS::lcss sub, and the increase in performance is huge. In fact, String::LCSS is faster than String::LCSS_XS.

    The String::LCSS_XS POD shows these Benchmark results (which I was able to reproduce):

    Rate String::LCSS String::LCSS_XS String::LCSS 60.9/s -- -100% String::LCSS_XS 84746/s 138966% --

    Here are the results with Memoize:

    String::LCSS version = 0.12 String::LCSS_XS version = 1.2 >>>the quick brown fox <<< >>>the quick brown fox <<< Rate LCSS_XS LCSS LCSS_XS 195695/s -- -27% LCSS 268817/s 37% --

    Here is the code to run it:

    Keep in mind that String::LCSS has critical bugs.

      For what it's worth, I used Memoize on the String::LCSS::lcss sub, and the increase in performance is huge. In fact, String::LCSS is faster than String::LCSS_XS.

      Sorry, but that is a useless test. You are always testing the same two strings, which means that you are simply getting back the same result each time after the first, without having to re-perform the algorithm.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.
        OK, I guess I'm clueless in the ways of benchmarking. Do you think the results in the String::LCSS_XS POD are bogus, too? Unfortunately, I don't have the code for that benchmark. Can someone point me to a way to generate a meaningful test?