BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Does it work for you? Am I doing something wrong?

perl -MString::LCSS=lcss -E" say lcss( 'abcdefghixypqrstxyzuvw', 'axyza' );"
00
perl -MString::LCSS=lcss -E"say scalar lcss( 'abcdefghixypqrstxyzuvw', 'axyza' );"

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"

Replies are listed 'Best First'.
Re: Does String::LCSS work?
by stefbv (Priest) on Jan 25, 2010 at 06:50 UTC

    No, it doesn't work. ikegami filed a bug report for it in May 2008.

    Maybe it's meant to work only with the strings from the SYNOPSIS :)

      No, it does not work. Avoid it like the plague.

      It does not even work for the examples provided (i.e. fails to find the sub-STRING, zyz). It cannot find substrings let alone sub-sequences.
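For reference, here is a minimal pure-Perl sketch of what a longest-common-substring function should return for the strings in this thread. It is not the code from String::LCSS or String::LCSS_XS; the name `lcss_pp` and the `(substring, pos1, pos2)` return shape are assumptions made for illustration. It uses the textbook dynamic-programming table of run lengths, so it takes O(m*n) time.

```perl
use strict;
use warnings;

# Longest common substring via dynamic programming.
# Returns (substring, start in $s, start in $t), or an empty list if the
# strings share no characters. lcss_pp is a hypothetical name.
sub lcss_pp {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($best, $end_s, $end_t) = (0, 0, 0);
    my @prev = (0) x (@t + 1);          # run lengths for the previous row
    for my $i (1 .. @s) {
        my @cur = (0) x (@t + 1);
        for my $j (1 .. @t) {
            next unless $s[$i - 1] eq $t[$j - 1];
            $cur[$j] = $prev[$j - 1] + 1;   # extend the run ending at (i-1, j-1)
            ($best, $end_s, $end_t) = ($cur[$j], $i, $j) if $cur[$j] > $best;
        }
        @prev = @cur;
    }
    return unless $best;
    return (substr($s, $end_s - $best, $best), $end_s - $best, $end_t - $best);
}

my @r1 = lcss_pp('abcdefghixypqrstxyzuvw', 'axyza');
my @r2 = lcss_pp('zyzxx', 'abczyzefg');
print "@r1\n";    # xyz 16 1
print "@r2\n";    # zyz 0 3
```

When several substrings tie for the longest length, this sketch reports the first one it finds while scanning `$s` from the left.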

      Both Lima One (author of String::LCSS_XS) and I have working versions of LCSS code that either of us would like to donate to the namespace String::LCSS. But Daniel Yacob has the claim to the namespace, and he disappeared more than 5 years ago.

      Is there a way to invoke the Dutch legal concept of the "law of the shovel" here? If you do not put in the work to maintain the levee which keeps your land dry, you may lose your claim to the protected land to the person who is actually maintaining the levee.

      Can we consider String::LCSS to be "abandoned", so that the namespace should be assigned to the person (or people) willing to work in and maintain it?
Re: Does String::LCSS work?
by Khen1950fx (Canon) on Jan 25, 2010 at 07:02 UTC
    I used String::LCSS_XS instead. This works:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use String::LCSS_XS qw(lcss);

    my @result = lcss( 'abcdefghixypqrstxyzuvw', 'axyza' );
    print "$result[0]\n";

      String::LCSS_XS has issues too.

      It has an undocumented limitation: It only works on strings of bytes.

      It has a bug: It only works when the input strings are stored in the UTF8=0 format.

      (Going from memory, but a quick check seems to confirm the above.)

      If you're ok with the limitation, the workaround for the bug is to call utf8::downgrade on the inputs before calling the function.
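A minimal sketch of that workaround, using only the core `utf8::` functions. The helper name `downgraded_copies` is hypothetical; it works on copies so the caller's strings keep their original internal format, and it dies if a string holds a character above 255, since such a string cannot be stored as bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: return byte-format (UTF8=0) copies of the inputs,
# or die if any string contains a character above 255.
sub downgraded_copies {
    my @copies = @_;                  # copy, so the caller's strings are untouched
    for my $str (@copies) {
        utf8::downgrade($str, 1)      # true second arg: return false instead of dying
            or die "String contains characters above 255; cannot downgrade\n";
    }
    return @copies;
}

my $s = "caf\xe9 au lait";
utf8::upgrade($s);                    # same characters, now stored as UTF8=1
my ($bytes) = downgraded_copies($s);
printf "before: %d, after: %d, equal: %d\n",
    utf8::is_utf8($s)     ? 1 : 0,
    utf8::is_utf8($bytes) ? 1 : 0,
    $s eq $bytes          ? 1 : 0;
# prints "before: 1, after: 0, equal: 1"
```

With `lcss` imported from String::LCSS_XS as in the script above, the call would then look like `lcss( downgraded_copies($s, $t) )`.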

      An alternative is Algorithm::Diff. Its LCS functions find the longest common subsequence. I don't know much about the module. [That's something different.]
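To illustrate why a subsequence is "something different" from a substring: a subsequence keeps the characters in order but allows gaps between them. The sketch below is a textbook dynamic-programming LCS written for illustration, not Algorithm::Diff's code; `lcs_subsequence` is a hypothetical name. For the strings from the original question it returns a 4-character answer, whereas the longest common substring, `xyz`, has only 3.

```perl
use strict;
use warnings;

# Longest common SUBSEQUENCE (characters in order, gaps allowed).
# lcs_subsequence is a hypothetical name used for illustration.
sub lcs_subsequence {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    # $len[$i][$j] = LCS length of the first $i chars of $s and first $j of $t
    my @len = map { [ (0) x (@t + 1) ] } 0 .. @s;
    for my $i (1 .. @s) {
        for my $j (1 .. @t) {
            if ($s[$i - 1] eq $t[$j - 1]) {
                $len[$i][$j] = $len[$i - 1][$j - 1] + 1;
            }
            else {
                my ($up, $left) = ($len[$i - 1][$j], $len[$i][$j - 1]);
                $len[$i][$j] = $up > $left ? $up : $left;
            }
        }
    }
    # Walk back from the corner of the table to recover one longest subsequence.
    my ($i, $j, @out) = (scalar @s, scalar @t);
    while ($i && $j) {
        if    ($s[$i - 1] eq $t[$j - 1])             { unshift @out, $s[--$i]; $j--; }
        elsif ($len[$i - 1][$j] >= $len[$i][$j - 1]) { $i--; }
        else                                         { $j--; }
    }
    return join '', @out;
}

print lcs_subsequence('abcdefghixypqrstxyzuvw', 'axyza'), "\n";   # axyz
```

The leading `a` is matched at position 0 of the first string and `xyz` at positions 9, 10 and 18, which no contiguous substring can cover.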

        The bug has been fixed and the limitations have been removed. The new String::LCSS_XS 1.1 can operate on strings in either internal format (UTF8=0 and UTF8=1), it works with any string (not just those with chars <256), and strings containing byte 00 are now acceptable.

        If both strings only contain bytes, you'll get optimal performance by making sure they are downgraded (UTF8=0 format).

        Thanks. I'll try to find some time to fix this next week. I'll just have to change the XS code which iterates over the strings, right?

        Update: 1.1 now supports UTF8.

      I used String::LCSS_XS instead.

      Thanks.


Re: Does String::LCSS work?
by repellent (Priest) on Jan 27, 2010 at 04:00 UTC

      Trust me, that's not fast. Using that script on 100 strings takes 76 seconds and 90MB:

      c:\test>junk90 junk90.dat
      Loaded. Generating combos...
      000001 and 000002: 127 chars ... starting at 37 and 872, respectively.
      000008 and 000089: 10 chars ... starting at 550 and 355, respectively.
      000040 and 000081: 11 chars ... starting at 219 and 623, respectively.
      000046 and 000056: 12 chars ... starting at 808 and 845, respectively.
      000058 and 000069: 11 chars ... starting at 837 and 276, respectively.
      Best overall match: 127 chars 000002:872 and 000001:37
      Completed in 76.985

      Using String::LCSS_XS it takes 15 seconds and 5MB:

      c:\test>LCSS10 junk90.dat
      000001(37) and 000002(872): 127 '5808821137152553645216516684787076304368738347768274782252043367265484547586755564151615422250715355234473558428710868782135070'
      000008(550) and 000089(355): 10 '3252367176'
      000040(219) and 000081(623): 11 '61341721171'
      000046(808) and 000056(845): 12 '876526361506'
      000058(837) and 000069(276): 11 '00666788082'
      Took: 14.594 seconds
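Neither script is shown in the thread, so the following is only a sketch of the kind of all-pairs harness such a comparison implies. The names `best_pair` and `naive_lcss` are hypothetical, and the finder is a deliberately slow pure-Perl stand-in that could be swapped for any function returning `(substring, pos1, pos2)`. Comparing every pair of n strings means n*(n-1)/2 calls, which is 4950 calls for 100 strings, so the cost of a single call dominates the total.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Naive stand-in finder (hypothetical): try every substring of $s, longest first.
sub naive_lcss {
    my ($s, $t) = @_;
    for my $len (reverse 1 .. length($s)) {
        for my $start (0 .. length($s) - $len) {
            my $pos = index $t, substr($s, $start, $len);
            return (substr($s, $start, $len), $start, $pos) if $pos >= 0;
        }
    }
    return;
}

# Run $finder on every pair of strings and keep the longest match.
# Returns (length, index_i, index_j, pos_in_i, pos_in_j), or (0) if nothing matched.
sub best_pair {
    my ($finder, @strings) = @_;
    my @best = (0);
    for my $i (0 .. $#strings - 1) {
        for my $j ($i + 1 .. $#strings) {
            my ($sub, $pi, $pj) = $finder->($strings[$i], $strings[$j]) or next;
            @best = (length($sub), $i, $j, $pi, $pj) if length($sub) > $best[0];
        }
    }
    return @best;
}

my $t0 = time;
my ($len, $i, $j, $pi, $pj) = best_pair(\&naive_lcss, 'abcxyz', 'qqxyzq', 'abcq');
print "Best: $len chars, strings $i and $j, starting at $pi and $pj\n";
printf "Took: %.3f seconds\n", time - $t0;
```

Passing the finder as a code reference keeps the pairing loop and the timing identical while the implementation under test changes.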

        OK, I'll take your word that the performance isn't that great for practical use. It looks like the speed was as advertised though (100 strings of 1000 chars takes about 4 minutes, i.e. on the order of minutes).

        Just glancing at the code, it seems that it doesn't suffer from the limitations of String::LCSS_XS pointed out by ikegami?