in reply to Re: Does String::LCSS work?
in thread Does String::LCSS work?

String::LCSS_XS has issues too.

It has an undocumented limitation: It only works on strings of bytes.

It has a bug: It only works when the input strings are stored in the UTF8=0 format.

(Going from memory, but a quick check seems to confirm the above.)

If you're ok with the limitation, the workaround for the bug is to call utf8::downgrade on the inputs before calling the function.
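Here is a rough model of what that workaround does to the string buffer. A UTF8=1 string stores each character as UTF-8. Downgrading rewrites it so that each character occupies a single byte, which is only possible when every character is below 256. The C sketch below is an illustration, not perl's implementation. `downgrade_sketch` is a hypothetical name. The sketch ignores the SvUTF8 flag bookkeeping and assumes well-formed UTF-8 input. It returns `(size_t)-1` in the case where `utf8::downgrade` fails.

```c
#include <stddef.h>

/* Hypothetical helper: a rough model of what utf8::downgrade does to a
 * UTF8=1 buffer. Each UTF-8 encoded character is decoded and stored as
 * one byte. Returns the new length, or (size_t)-1 if a character is
 * above 0xFF (no single-byte form exists). Assumes well-formed UTF-8;
 * continuation bytes are not validated. */
static size_t downgrade_sketch(const unsigned char *in, size_t n,
                               unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII: already one byte, copy as-is */
            out[o++] = c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < n) {
            /* two-byte sequence: code points 0x80..0x7FF */
            unsigned cp = ((c & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            if (cp > 0xFF)
                return (size_t)-1;  /* wide character: cannot be a byte */
            out[o++] = (unsigned char)cp;
            i += 2;
        } else {
            /* 3- and 4-byte sequences (always > 0xFF), or malformed input */
            return (size_t)-1;
        }
    }
    return o;
}
```

For example, `"caf\xC3\xA9"` (5 bytes, UTF-8 for "café") becomes the 4 bytes `"caf\xE9"`. The euro sign (U+20AC, `"\xE2\x82\xAC"`) is rejected. This is why the workaround only helps within the byte-strings limitation.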

An alternative is Algorithm::Diff. Its LCS functions find the longest common subsequence. I don't know much about the module. [That's something different from the longest common substring.]
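To make the bracketed note concrete: a longest common subsequence may skip characters, while a longest common substring must be contiguous. The C sketch below computes both lengths with the textbook dynamic-programming tables. The names `lcsubstr_len` and `lcsubseq_len` are hypothetical, and inputs are limited to 255 bytes for simplicity. This is not Algorithm::Diff's code. That module works on arrays of arbitrary elements, so comparing two strings character by character would mean splitting them into arrays first.

```c
#include <stddef.h>
#include <string.h>

#define MAXLEN 255  /* sketch limit on input length */

static size_t dp[MAXLEN + 1][MAXLEN + 1];

/* Longest common SUBSTRING: the match must be contiguous in both strings.
 * dp[i][j] = length of the common suffix of a[0..i) and b[0..j).
 * Returns (size_t)-1 if an input is longer than MAXLEN. */
static size_t lcsubstr_len(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b), best = 0;
    if (n > MAXLEN || m > MAXLEN)
        return (size_t)-1;
    for (size_t i = 0; i <= n; i++)
        for (size_t j = 0; j <= m; j++) {
            if (i == 0 || j == 0 || a[i - 1] != b[j - 1])
                dp[i][j] = 0;
            else
                dp[i][j] = dp[i - 1][j - 1] + 1;
            if (dp[i][j] > best)
                best = dp[i][j];
        }
    return best;
}

/* Longest common SUBSEQUENCE: characters must appear in the same order
 * but need not be adjacent. dp[i][j] = LCS length of a[0..i), b[0..j).
 * Returns (size_t)-1 if an input is longer than MAXLEN. */
static size_t lcsubseq_len(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b);
    if (n > MAXLEN || m > MAXLEN)
        return (size_t)-1;
    for (size_t i = 0; i <= n; i++)
        for (size_t j = 0; j <= m; j++) {
            if (i == 0 || j == 0)
                dp[i][j] = 0;
            else if (a[i - 1] == b[j - 1])
                dp[i][j] = dp[i - 1][j - 1] + 1;
            else
                dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j]
                                                       : dp[i][j - 1];
        }
    return dp[n][m];
}
```

For "abcdef" and "acf", the subsequence length is 3 ("acf"), but the substring length is only 1. This is why Algorithm::Diff isn't a drop-in replacement for String::LCSS_XS.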

Re^3: Does String::LCSS work?
by ikegami (Patriarch) on Feb 08, 2010 at 19:25 UTC

    The bug has been fixed and the limitations have been removed. String::LCSS_XS 1.1 can operate on strings in either internal format (UTF8=0 and UTF8=1), it works with any string (not just those whose characters are all < 256), and strings containing byte 00 are now accepted.

    If both strings only contain bytes, you'll get optimal performance by making sure they are downgraded (UTF8=0 format).

Re^3: Does String::LCSS work?
by lima1 (Curate) on Jan 28, 2010 at 16:16 UTC
    Thanks. I'll try to find some time to fix this next week. I'll just have to change the XS code which iterates over the strings, right?

    Update: 1.1 now supports UTF8.

      Fixing the bug or the limitation? Fixing the bug is simple:

      --- LCSS_XS.xs.orig	2010-01-28 14:00:25.000000000 -0500
      +++ LCSS_XS.xs.new	2010-01-28 14:03:45.000000000 -0500
      @@ -10,8 +10,8 @@
       
       void
       _compute_all_lcss(s, t)
      -    char * s
      -    char * t
      +    SV * s
      +    SV * t
           PROTOTYPE: $$
           ALIAS:
               lcss = 1
      @@ -21,7 +21,7 @@
           int i;
           AV * ra;
           PPCODE:
      -    res = _lcss(s,t);
      +    res = _lcss( SvPVbyte_nolen(s), SvPVbyte_nolen(t) );
           if (res.n <= 0) {
               _free_res(res);
               XSRETURN_UNDEF;

      It turns out there are two limitations:

      • It can only compare strings of bytes
      • It can only compare strings that don't contain byte 00. (That byte and anything after it are ignored.)
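The byte-00 limitation presumably comes from handing the strings to C as NUL-terminated `char *` values, so anything after the first 0x00 is invisible. The usual fix is to compare buffers by explicit length rather than by terminator. A minimal sketch of that fix follows, assuming the caller supplies the lengths. In XS those could come from `SvPVbyte(sv, len)` rather than `SvPVbyte_nolen`. `lcss_bytes` is a hypothetical name, and this is not String::LCSS_XS's actual code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Longest common substring over explicit-length byte buffers.
 * Because lengths are passed in, a 0x00 byte is compared like any other
 * byte instead of ending the string. Returns the match length and stores
 * its start offset in s into *pos (0 when there is no match).
 * Keeps two rolling DP rows, so extra memory is O(m).
 * Returns 0 with *pos = 0 if allocation fails. */
static size_t lcss_bytes(const unsigned char *s, size_t n,
                         const unsigned char *t, size_t m, size_t *pos)
{
    size_t best = 0, best_end = 0;
    size_t *prev = calloc(m + 1, sizeof *prev);
    size_t *cur = calloc(m + 1, sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        *pos = 0;
        return 0;
    }
    for (size_t i = 1; i <= n; i++) {
        /* cur[0] stays 0 because j starts at 1 */
        for (size_t j = 1; j <= m; j++) {
            cur[j] = (s[i - 1] == t[j - 1]) ? prev[j - 1] + 1 : 0;
            if (cur[j] > best) {
                best = cur[j];
                best_end = i;  /* match ends just before s[i] */
            }
        }
        size_t *tmp = prev;
        prev = cur;
        cur = tmp;
    }
    free(prev);
    free(cur);
    *pos = best_end - best;
    return best;
}
```

With s = `"ab\0cd"` and t = `"x\0cdy"` (5 bytes each), it finds the 3-byte match `"\0cd"` at offset 2 of s. Measuring the same buffers with `strlen` would give lengths 2 and 1, and no match at all.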