in reply to Re: Substring Distance Problem
in thread Measuring Substrings Problem

Hi Roy,

Your 'vec' approach is 5 times faster than our original approach!

Rate bare vec bare 24142/s -- -78% vec 110542/s 358% --
How can I accomodate the duplicate case (please see update above) to your snippet?

i.e. Given: my @ar4 = ('GG','GG'); Returns: 4 not 2

Update: Thanks to everybody for their great insights and helps.
Regards,
Edward

Replies are listed 'Best First'.
Re^3: Substring Distance Problem
by tlm (Prior) on Apr 09, 2005 at 07:42 UTC

    If I may tinker with Roy's code, here is the fix you need. I give the modification for both the original and the xor versions of the algorithm:

    use strict; sub score { my ($str, $array) = @_; my $vec = ''; for (@$array) { my $ofs = 0; while ( ( my $idx = index $str, $_, $ofs ) > -1 ) { # Set bits at each matched location vec($vec, $_, 1)= 1 for $idx..$idx+length($_)-1; $ofs = $idx + 1; } } # Count set bits unpack '%32b*', $vec; } sub score_xor { my ($str, $array) = @_; my $vec = "\0" x length($str); for (@$array) { my $ofs = 0; while ( ( my $idx = index $str, $_, $ofs ) > -1 ) { # Matching substrings are padded into position with nulls $vec |= ("\0" x $idx) . $_; $ofs = $idx + 1; } } # Matching characters become nulls; others non-nulls $vec ^= $str; # Count nulls $vec =~ tr/\0//; }

    I was curious to see how these two versions compared, and was surprised to learn that the original one is faster by well over a factor of 2:

    Rate xor v1 xor 112734/s -- -59% v1 275692/s 145% --

    the lowliest monk