Dear monks,

I try to develop a faster Diff/LCS than Algorithm::Diff::XS or LCS::BV.

My first approach with two strings to C looked promising with rates of 400 KHz for length ~10 (result 7 x 2 array). This is 8 times the speed of A::D::XS. But adding the composition of the resulting alignment as Array of Arrays slowed down the rate to 190 KHz. A more extreme case s1 = a..zA..Y, s2 = b..zA..Z, with only the first and last element different, drops down from 333 KHz to 60 KHz. That's because the result is a 50 x 2 array.

Here is the suspiciously slow part (see also on CPAN module LCS::XS or github.com/wollmers/LCS-XS):

void lcs_LCS(obj, s1, s2) SV *obj AV * s1 AV * s2 PREINIT: struct CTX *ctx = (struct CTX *)SvIVX(SvRV(obj)); PPCODE: int d, sn, i; struct varray *ses = varray_new(sizeof(struct diff_edit), NULL +); IV n; IV m; n = av_top_index(s1); m = av_top_index(s2); d = diff(s1, 0, n+1, s2, 0, m+1, &_cmp_idx, NULL, 0, ses, &sn +, NULL); int x,y,j; x=y=0; XSprePUSH; for (i = 0; i < sn; i++) { struct diff_edit *e = varray_get(ses, i); switch (e->op) { case DIFF_MATCH: //printf("MAT: "); //printf("off %d len %d\n", e->off, e->len); for (j = 0; j < e->len; j++) { //printf("x %d y %d\n", x, y); AV *arr; arr = newAV(); av_push(arr, newSViv(x)); av_push(arr, newSViv(y)); XPUSHs(sv_2mortal(newRV_noinc((SV *)arr))); x++; y++; } break; case DIFF_DELETE: //printf("DEL: "); //printf("off %d len %d\n", e->off, e->len); for (j = 0; j < e->len; j++) { x++; } break; case DIFF_INSERT: //printf("INS: "); //printf("off %d len %d\n", e->off, e->len); for (j = 0; j < e->len; j++) { y++; } break; } } varray_del(ses);

In reply to Faster creation of Arrays in XS? by wollmers

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.