in reply to Measuring Substrings Problem

Using vec leads to a rather elegant solution:
sub score { my ($str, $array) = @_; my $vec = ''; for (@$array) { my $idx = index $str, $_; # Set bits at each matched location vec($vec, $_, 1)= 1 for $idx..$idx+length($_)-1; } # Count set bits unpack '%32b*', $vec; }

Caution: Contents may have been coded under pressure.

Replies are listed 'Best First'.
Re^2: Substring Distance Problem
by monkfan (Curate) on Apr 09, 2005 at 04:58 UTC
    Hi Roy,

    Your 'vec' approach is 5 times faster than our original approach!

    Rate bare vec bare 24142/s -- -78% vec 110542/s 358% --
    How can I accomodate the duplicate case (please see update above) to your snippet?

    i.e. Given: my @ar4 = ('GG','GG'); Returns: 4 not 2

    Update: Thanks to everybody for their great insights and helps.
    Regards,
    Edward

      If I may tinker with Roy's code, here is the fix you need. I give the modification for both the original and the xor versions of the algorithm:

      use strict; sub score { my ($str, $array) = @_; my $vec = ''; for (@$array) { my $ofs = 0; while ( ( my $idx = index $str, $_, $ofs ) > -1 ) { # Set bits at each matched location vec($vec, $_, 1)= 1 for $idx..$idx+length($_)-1; $ofs = $idx + 1; } } # Count set bits unpack '%32b*', $vec; } sub score_xor { my ($str, $array) = @_; my $vec = "\0" x length($str); for (@$array) { my $ofs = 0; while ( ( my $idx = index $str, $_, $ofs ) > -1 ) { # Matching substrings are padded into position with nulls $vec |= ("\0" x $idx) . $_; $ofs = $idx + 1; } } # Matching characters become nulls; others non-nulls $vec ^= $str; # Count nulls $vec =~ tr/\0//; }

      I was curious to see how these two versions compared, and was surprised to learn that the original one is faster by well over a factor of 2:

      Rate xor v1 xor 112734/s -- -59% v1 275692/s 145% --

      the lowliest monk

Re^2: Substring Distance Problem
by tlm (Prior) on Apr 08, 2005 at 16:53 UTC

    Good stuff. I think it (or the xor variant, or both) would be a good addition to the Snippets Section.

    the lowliest monk