in reply to Search for identical substrings
Well, a lot really hinges on what you mean, exactly, when you say:
...search for the longest matching substrings between the members of the array...
If what you mean is something like this:
then Algorithm::Diff can be used to arrive at that answer for each pair-wise string comarison as follows:$string[0] = "abc fghijklmn "; $string[1] = " b d fgh jklmno"; # correct answer is: "jklmn", 5 chars long; # (next longest common substring is " fgh", # but you're not interested in that)
If you take out the comment marks on the "warn" statements in the two callbacks, you'll be able to watch what's going on. With those commented out, a file of 382 lines (between 2 and 73 characters per line, about 73K pair-wise comparisons) was finished in about 6 minutes.#!/usr/bin/perl use strict; use Algorithm::Diff qw/traverse_sequences/; my @strings = <>; chomp @strings; my ( $longestMatch, $currentMatch ); my ( @seq1, @seq2 ); for my $i ( 0 .. $#strings-1 ) { @seq1 = split //, $strings[$i]; for my $j ( $i+1 .. $#strings ) { @seq2 = split //, $strings[$j]; $longestMatch = $currentMatch = ""; traverse_sequences( \@seq1, \@seq2, { MATCH => \&add2match, DISCARD_A => \&end_match, DISCARD_B => \&end_match } ); ### update: add a direct call to "end_match()" here, ### in case traverse was still matching at end-of-string: end_match( 'EOS' ); print "LCS for $i :: $j = |$longestMatch|\n"; } } sub add2match { my ( $ptrA, $ptrB ) = @_; $currentMatch .= $seq1[$ptrA]; # warn "match at i=$ptrA, j=$ptrB : =$seq1[$ptrA]= ; cm=$currentMat +ch=\n"; } sub end_match { $longestMatch = $currentMatch if ( length( $longestMatch ) < length( $currentMatch )); $currentMatch = ""; # warn "match ended at @_ : lm=$longestMatch=\n"; }
I expect there are cleaner ways to do it (e.g. that don't involve global-scope variables), but this was fairly quick and easy (fun, even) for a first-time A::D user.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Search for identical substrings
by bioMan (Beadle) on Aug 18, 2005 at 22:03 UTC | |
by BrowserUk (Patriarch) on Aug 19, 2005 at 00:54 UTC | |
by bioMan (Beadle) on Aug 19, 2005 at 16:50 UTC | |
by graff (Chancellor) on Aug 19, 2005 at 00:09 UTC |