OK, so it appears that by “maximum partial match” you mean longest common substring. A search on that phrase found the thread finding longest common substring, from which I derived the following:

#! perl use strict; use warnings; use Algorithm::Diff qw( LCS ); use constant { CASE_SENSITIVE => 0, DICTIONARY_FILE => 'words.txt', MINIMUM_MATCH => 2, }; print "Enter the target word: "; chomp(my $target = <STDIN>); my @target = split //, CASE_SENSITIVE ? $target : lc $target; open(my $in, '<', DICTIONARY_FILE) or die "Cannot open file '" . DICTIONARY_FILE . "' for reading: $! +"; my %substrings; while (my $word = <$in>) { chomp $word; my @word = split //, CASE_SENSITIVE ? $word : lc $word; my @lcs = LCS(\@target, \@word); $substrings{ join('', @lcs) } = $word if @lcs >= MINIMUM_MATCH; } close $in or die "Cannot close file '" . DICTIONARY_FILE . "': $!"; if (%substrings) { my @matches = sort { length $a <=> length $b } keys %substrings; print "The closest match is: ", $substrings{ $matches[-1] }, "\n" +; } else { print "No matches found\n"; }

With “words.txt” containing:

fal falle fall awiSayaM awiSayanZ

and suitable input from the keyboard, the output is:

19:42 >perl 454_SoPW.pl Enter the target word: awiSayanA The closest match is: awiSayanZ 19:42 >perl 454_SoPW.pl Enter the target word: fallen The closest match is: falle 19:42 >

Note: You will need to adjust the values of CASE_SENSITIVE and MINIMUM_MATCH to suit your requirements.

Update 1 (1st January, 2013):

Algorithm::Diff is actually the wrong module for this, I should have used String::LCSS. The former finds non-contiguous sub-sequences; the latter finds substrings. Revised code:

#! perl use strict; use warnings; use String::LCSS; use constant { CASE_SENSITIVE => 0, DICTIONARY_FILE => 'words.txt', }; print 'Enter the target word: '; chomp(my $orig_target = <STDIN>); my $target = CASE_SENSITIVE ? $orig_target : lc $orig_target; open(my $in, '<', DICTIONARY_FILE) or die "Cannot open file '" . DICTIONARY_FILE . "' for reading: $! +"; my %substrings; while (my $orig_word = <$in>) { chomp $orig_word; my $word = CASE_SENSITIVE ? $orig_word : lc $orig_word; my @lcss = lcss($word, $target); $substrings{ $lcss[0] } = [ $orig_word, $lcss[1], $lcss[2] ] if $l +css[0]; } close $in or die "Cannot close file '" . DICTIONARY_FILE . "': $!"; print 'Target: ', $orig_target, "\n"; if (%substrings) { my $key = (sort { length $a <=> length $b } keys %substrings)[- +1]; my $match = $substrings{ $key }->[0]; my $index2 = $substrings{ $key }->[2]; my $substr = substr($orig_target, $index2, length $key); print 'Closest match: ', $match, "\n"; print 'Longest common substring: ', $substr, "\n"; } else { print "No matches found\n"; } sub lcss { my ($first, $second) = @_; $first .= '$'; # force strings to be different: $second .= '@'; # kludge required by String::LCSS::lcss my @results = String::LCSS::lcss($first, $second); return wantarray ? @results : $results[0]; }

Update 2 (1st January, 2013):

It appears that String::LCSS is more badly broken than I realised. Even simple matches can fail to find the longest common substring:

18:27 >perl -MString::LCSS=lcss -wE "say scalar lcss('abxabcy', 'abc') +;" ab 18:28 >

(And see http://cpanratings.perl.org/dist/String-LCSS.)

Better to replace sub lcss in the above script with the following by BrowserUk in Re: finding longest common substring:

sub lcss { my $strings = join "\0", @_; my $lcs; for my $n (1 .. length $strings) { my $re = "(.{$n})" . '.*\0.*\1' x (@_ - 1); last unless $strings =~ $re; $lcs = $1; } return $lcs; }

Update 3 (2nd January, 2013):

Discovered the thread Does String::LCSS work?. String::LCSS is indeed broken, but String::LCSS_XS seems to work correctly.

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,


In reply to Re^13: partial match between 2 files by Athanasius
in thread partial match between 2 files by lakssreedhar

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.