in reply to Re: A better implementation of LCSS? (bug?)
in thread A better implementation of LCSS?
Yes. You found a bug.
A simpler example is 'abcdefg' & 'abcdefga'.
What happens is this. To speed up the processing, the code xors the longer input with a string that contain the shorter string replicated until is is longer than the longer string.
Ie. if you have 'the quick brown fox' & 'brown', the shorter is replicated and xored with the longer like so:
the quick brown fox brownbrownbrownbrown ..........00000.....
Then the xored result is scanned looking for contiguous runs of zeros the length of the shorter string. In this case '00000'.
In your case and my example above, the process of replicating the shorter string creates false matches:
xxxyyxxy yyyxyxxyyyxyxx ....0000...... False match abcdefga abcdefgabcdefg 00000000...... False match
Which makes it amazing to me that the guys I originally wrote the code for have never come back to me. I'm not sure I even know how to contact them again.
The obvious solution is to throw away this 'optimisation' and use another nested loop; at which point the performance gain that was the code's raison detre probably disappears :(
A first pass at not throwing away the performance gain is this:
sub lcssN (\$\$;$) { my( $ref1, $ref2, $min ) = @_; my( $swapped, $l1, $l2 ) = ( 0, map length( $$_ ), $ref1, $ref2 ); ( $l2, $ref2, $l1, $ref1, $swapped ) = ( $l1, $ref1, $l2, $ref2, 1 + ) if $l1 > $l2; $min = 1 unless defined $min; my $mask = $$ref1 x ( int( $l2 / $l1 ) + 1 ); my @match = ''; for my $start ( 0 .. $l1-1 ) { my $masked = substr( $mask, $start, $l2 ) ^ $$ref2; while( $masked =~ m[\0{$min,}]go ) { my $l = $+[ 0 ] - $-[ 0 ]; my $match = substr( $$ref2, $-[ 0 ], $l ); next unless 1+index $$ref1, $match; @match = ( $match, ( $-[ 0 ]+$start ) % $l1, $-[ 0 ] ) if $l > length $match[ 0 ]; } } @match[ 2, 1 ] = @match[ 1, 2 ] if $swapped; return unless $match[ 0 ]; return wantarray ? @match : $match[ 0 ]; }
I haven't tested what affect that has on performance.
Update:This also works for the example you posted but I haven't convinced myself that it won't fail for other inputs yet:
sub lcssN (\$\$;$) { my( $ref1, $ref2, $min ) = @_; my( $swapped, $l1, $l2 ) = ( 0, map length( $$_ ), $ref1, $ref2 ); ( $l2, $ref2, $l1, $ref1, $swapped ) = ( $l1, $ref1, $l2, $ref2, 1 + ) if $l1 > $l2; $min = 1 unless defined $min; my $mask = $$ref1 x ( int( $l2 / $l1 ) ); my @match = ''; for my $start ( 0 .. $l1-1 ) { my $masked = substr( $mask, $start, $l2 ) ^ $$ref2; while( $masked =~ m[\0{$min,}]go ) { my $l = $+[ 0 ] - $-[ 0 ]; my $match = substr( $$ref2, $-[ 0 ], $l ); @match = ( $match, ( $-[ 0 ]+$start ) % $l1, $-[ 0 ] ) if $l > length $match[ 0 ]; } } @match[ 2, 1 ] = @match[ 1, 2 ] if $swapped; return unless $match[ 0 ]; return wantarray ? @match : $match[ 0 ]; }
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^3: A better implementation of LCSS? (testing ... 1,2,3)
by toolic (Bishop) on Nov 14, 2015 at 03:32 UTC | |
by BrowserUk (Patriarch) on Nov 14, 2015 at 09:44 UTC | |
by toolic (Bishop) on Nov 15, 2015 at 14:08 UTC | |
by BrowserUk (Patriarch) on Nov 16, 2015 at 16:20 UTC | |
by toolic (Bishop) on Nov 16, 2015 at 16:43 UTC | |
|