in reply to Does String::LCSS work?

I used String::LCSS_XS instead. This works:
#!/usr/bin/perl
use strict;
use warnings;
use String::LCSS_XS qw(lcss);

my @result = lcss( 'abcdefghixypqrstxyzuvw', 'axyza' );
print "$result[0]\n";

Re^2: Does String::LCSS work?
by ikegami (Patriarch) on Jan 25, 2010 at 08:55 UTC

    String::LCSS_XS has issues too.

    It has an undocumented limitation: It only works on strings of bytes.

    It has a bug: It only works when the input strings are stored in the UTF8=0 format.

    (Going from memory, but a quick check seems to confirm the above.)

    If you're ok with the limitation, the workaround for the bug is to call utf8::downgrade on the inputs before calling the function.
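A minimal sketch of that workaround, for illustration only. The helper name downgraded_copy is hypothetical and not part of String::LCSS_XS; it works on a copy so the caller's string is left alone, and it uses the FAIL_OK argument of utf8::downgrade so that a string with characters above 255 yields undef instead of dying. The lcss call is guarded in case the module is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of String::LCSS_XS): return a copy of the
# string in the UTF8=0 internal format, or undef if the string contains a
# character above 255 and so cannot be stored as bytes.
sub downgraded_copy {
    my ($str) = @_;
    my $copy = $str;
    # The second argument (FAIL_OK) makes utf8::downgrade return false
    # instead of dying when the string cannot be downgraded.
    return utf8::downgrade( $copy, 1 ) ? $copy : undef;
}

my $s = 'abcdefghixypqrstxyzuvw';
my $t = 'axyza';
utf8::upgrade($t);    # simulate an input that arrives in the UTF8=1 format

my $s_bytes = downgraded_copy($s);
my $t_bytes = downgraded_copy($t);

print 't is stored as UTF8=',       ( utf8::is_utf8($t)       ? 1 : 0 ), "\n";
print 't_bytes is stored as UTF8=', ( utf8::is_utf8($t_bytes) ? 1 : 0 ), "\n";

# Only call lcss when the module is available and both inputs downgraded.
if ( defined $s_bytes && defined $t_bytes
    && eval { require String::LCSS_XS; 1 } )
{
    my @result = String::LCSS_XS::lcss( $s_bytes, $t_bytes );
    print "$result[0]\n" if defined $result[0];
}
```

Downgrading changes only the internal storage, not the string's value, so the copies still compare equal to the originals.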

    An alternative is Algorithm::Diff. Its LCS functions also find the longest common subsequence. I don't know much about the module. [That's something different.]

      The bug has been fixed and the limitations have been removed. The new String::LCSS_XS 1.1 can operate on strings in either internal format (UTF8=0 and UTF8=1), it works with any string (not just those with chars <256), and strings containing byte 00 are now acceptable.

      If both strings only contain bytes, you'll get optimal performance by making sure they are downgraded (UTF8=0 format).

      Thanks. I'll try to find some time to fix this next week. I'll just have to change the XS code which iterates over the strings, right?

      Update: 1.1 now supports UTF8.

        Fixing the bug or the limitation? Fixing the bug is simple:

        --- LCSS_XS.xs.orig 2010-01-28 14:00:25.000000000 -0500
        +++ LCSS_XS.xs.new 2010-01-28 14:03:45.000000000 -0500
        @@ -10,8 +10,8 @@
         void
         _compute_all_lcss(s, t)
        -    char * s
        -    char * t
        +    SV * s
        +    SV * t
         PROTOTYPE: $$
         ALIAS:
             lcss = 1
        @@ -21,7 +21,7 @@
             int i;
             AV * ra;
         PPCODE:
        -    res = _lcss(s,t);
        +    res = _lcss( SvPVbyte_nolen(s), SvPVbyte_nolen(t) );
             if (res.n <= 0) {
                 _free_res(res);
                 XSRETURN_UNDEF;

        It turns out there are two limitations:

        • It can only compare strings of bytes
        • It can only compare strings that don't contain byte 00. (That byte and anything after it are ignored.)
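
The two limitations above can be checked before calling a pre-1.1 lcss. This is only a sketch: the function name fits_pre_1_1_limits is made up for illustration and is not part of the module. It returns 1 when every character is in the range 1..255, that is, when the string holds only bytes and none of them is byte 00.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pre-flight check (not part of String::LCSS_XS): true when the
# string holds only characters 1..255, i.e. only bytes and no byte 00.
sub fits_pre_1_1_limits {
    my ($str) = @_;
    # The negated class matches a NUL or any character above 255.
    return $str =~ /[^\x01-\xFF]/ ? 0 : 1;
}

my @samples = (
    'axyza',             # plain ASCII
    "ab\0cd",            # contains byte 00
    "caf\x{E9}",         # character 233, still a byte
    "smile \x{263A}",    # character above 255
);

for my $i ( 0 .. $#samples ) {
    printf "sample %d: %s\n", $i,
        fits_pre_1_1_limits( $samples[$i] ) ? 'ok' : 'outside the limits';
}
```

A string that passes this check may still need utf8::downgrade if it happens to be stored in the UTF8=1 format, since that is the separate bug rather than one of the limitations.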
Re^2: Does String::LCSS work?
by BrowserUk (Patriarch) on Jan 25, 2010 at 07:12 UTC
    I used String::LCSS_XS instead.

    Thanks.

