Re^5: Fast common substring matching

in reply to Re^4: Fast common substring matching
in thread Fast common substring matching

Update: The update to MCE::Shared::Sequence in trunk allows MCE::Hobo workers to run as fast as MCE workers. Thank you for this. The MCE::Hobo demonstration made me realize the need to beef up MCE::Shared::Sequence with chunk_size and bounds_only options, similar to the MCE options of the same names.

Using Roy Johnson's demonstration, I made the following changes to enable parallelism via MCE::Loop.

...

print "Sorted. Finding matches...\n";

use MCE::Loop;

MCE::Loop::init(
   max_workers => 8,
   chunk_size  => 500,
   bounds_only => 1,
);

my @ret = mce_loop_s {
  my ( $mce, $seq, $chunk_id ) = @_;
  my @matchdata = (0); # (length, index1-into-strings, index2-into-strings)

  for my $i1 ( $seq->[0] .. $seq->[1] ) {
    my $i2 = $i1 + 1;
    ++$i2 while $i2 <= $#strings and $strings[$i2][1] eq $strings[$i1][1];
    next if $i2 > $#strings;
    my ($common) = map length, ($strings[$i1][0] ^ $strings[$i2][0]) =~ /^(\0*)/;
    if ($common > $matchdata[0]) {
      @matchdata = ($common, [$i1, $i2]);
    }
    elsif ($common == $matchdata[0]) {
      push @matchdata, [$i1, $i2];
    }
  }

  MCE->gather( \@matchdata );

} 0, $#strings - 1;

my @matchdata = (0); # (length, index1-into-strings, index2-into-strings)

for my $i ( 0 .. $#ret ) {
  if ( $ret[$i]->[0] > $matchdata[0] ) {
    @matchdata = @{ $ret[$i] };
  }
  elsif ( $ret[$i]->[0] == $matchdata[0] ) {
    shift @{ $ret[$i] };
    push  @matchdata, @{ $ret[$i] };
  }
}

print "Best match: $matchdata[0] chars\n";

...
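
For anyone wanting to check the chunk-and-merge logic without installing MCE, here is a minimal serial sketch using core Perl only. It is not the parallel code itself: scan_range stands in for one worker handling a [begin, end] range, and merge_results stands in for the loop over @ret. The sample @strings data and the names common_prefix_length, scan_range and merge_results are made up for illustration. It assumes @strings holds sorted [string, source-id] pairs, as the loop above appears to expect.

```perl
use strict;
use warnings;

# Hypothetical sample data: [string, source-id] pairs, sorted by string.
my @strings = sort { $a->[0] cmp $b->[0] } (
    [ 'banana',  'A' ], [ 'anana',  'A' ], [ 'nana',  'A' ],
    [ 'bandana', 'B' ], [ 'andana', 'B' ], [ 'ndana', 'B' ],
);

# Equal leading bytes XOR to "\0", so the run of leading NULs in the
# result is the common prefix length. Byte strings only; note that
# "use feature 'bitwise'" would make ^ numeric instead.
sub common_prefix_length {
    my ( $s1, $s2 ) = @_;
    my ($nuls) = ( $s1 ^ $s2 ) =~ /^(\0*)/;
    return length $nuls;
}

# Scan one [begin, end] range the way a single worker would.
# Returns [ best_length, [i1, i2], ... ].
sub scan_range {
    my ( $beg, $end ) = @_;
    my @best = (0);
    for my $i1 ( $beg .. $end ) {
        # Skip ahead to the next entry that comes from a different source.
        my $i2 = $i1 + 1;
        ++$i2 while $i2 <= $#strings and $strings[$i2][1] eq $strings[$i1][1];
        next if $i2 > $#strings;
        my $common = common_prefix_length( $strings[$i1][0], $strings[$i2][0] );
        if    ( $common >  $best[0] ) { @best = ( $common, [ $i1, $i2 ] ) }
        elsif ( $common == $best[0] ) { push @best, [ $i1, $i2 ] }
    }
    return \@best;
}

# Combine per-range results without modifying them.
sub merge_results {
    my @best = (0);
    for my $r (@_) {
        my ( $len, @pairs ) = @$r;
        if    ( $len >  $best[0] ) { @best = ( $len, @pairs ) }
        elsif ( $len == $best[0] ) { push @best, @pairs }
    }
    return @best;
}

my @serial = @{ scan_range( 0, $#strings - 1 ) };
my @merged = merge_results( scan_range( 0, 2 ), scan_range( 3, $#strings - 1 ) );

print "Best match: $merged[0] chars\n";    # prints "Best match: 3 chars"
```

Splitting the index range in two and merging gives the same best length as a single pass over the whole range, which is the property the MCE::Loop version relies on. The order of tied pairs may differ in the parallel version, since gathered results are not guaranteed to arrive in chunk order.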
