comment on

OK, so it appears that by “maximum partial match” you mean longest common substring. A search on that phrase found the thread finding longest common substring, from which I derived the following:

#! perl
use strict;
use warnings;
use Algorithm::Diff qw( LCS );
use constant {
                 CASE_SENSITIVE  => 0,
                 DICTIONARY_FILE => 'words.txt',
                 MINIMUM_MATCH   => 2,
             };

print "Enter the target word: ";
chomp(my $target = <STDIN>);
my @target = split //, CASE_SENSITIVE ? $target : lc $target;

open(my $in, '<', DICTIONARY_FILE)
    or die "Cannot open file '" . DICTIONARY_FILE . "' for reading: $!
+";
my %substrings;

while (my $word = <$in>)
{
    chomp $word;
    my @word = split //, CASE_SENSITIVE ? $word : lc $word;
    my @lcs  = LCS(\@target, \@word);
    $substrings{ join('', @lcs) } = $word if @lcs >= MINIMUM_MATCH;
}

close $in
    or die "Cannot close file '" . DICTIONARY_FILE . "': $!";

if (%substrings)
{
    my @matches = sort { length $a <=> length $b } keys %substrings;
    print "The closest match is:  ", $substrings{ $matches[-1] }, "\n"
+;
}
else
{
    print "No matches found\n";
}
[download]

With “words.txt” containing:

fal
falle
fall
awiSayaM
awiSayanZ
[download]

and suitable input from the keyboard, the output is:

19:42 >perl 454_SoPW.pl
Enter the target word: awiSayanA
The closest match is:  awiSayanZ

19:42 >perl 454_SoPW.pl
Enter the target word: fallen
The closest match is:  falle

19:42 >
[download]

Note: You will need to adjust the values of CASE_SENSITIVE and MINIMUM_MATCH to suit your requirements.

Update 1 (1^st January, 2013):

Algorithm::Diff is actually the wrong module for this, I should have used String::LCSS. The former finds non-contiguous sub-sequences; the latter finds substrings. Revised code:

#! perl
use strict;
use warnings;
use String::LCSS;

use constant {
                 CASE_SENSITIVE  =>  0,
                 DICTIONARY_FILE => 'words.txt',
             };

print 'Enter the target word:    ';
chomp(my $orig_target = <STDIN>);
my $target = CASE_SENSITIVE ? $orig_target : lc $orig_target;

open(my $in, '<', DICTIONARY_FILE)
    or die "Cannot open file '" . DICTIONARY_FILE . "' for reading: $!
+";
my %substrings;

while (my $orig_word = <$in>)
{
    chomp $orig_word;
    my $word = CASE_SENSITIVE ? $orig_word : lc $orig_word;
    my @lcss = lcss($word, $target);
    $substrings{ $lcss[0] } = [ $orig_word, $lcss[1], $lcss[2] ] if $l
+css[0];
}

close $in
    or die "Cannot close file '" . DICTIONARY_FILE . "': $!";

print 'Target:                   ', $orig_target, "\n";

if (%substrings)
{
    my $key    = (sort { length $a <=> length $b } keys %substrings)[-
+1];
    my $match  =  $substrings{ $key }->[0];
    my $index2 =  $substrings{ $key }->[2];
    my $substr =  substr($orig_target, $index2, length $key);

    print 'Closest match:            ', $match,  "\n";
    print 'Longest common substring: ', $substr, "\n";
}
else
{
    print "No matches found\n";
}

sub lcss
{
    my ($first, $second) = @_;

    $first  .= '$';     # force strings to be different:
    $second .= '@';     #   kludge required by String::LCSS::lcss

    my @results = String::LCSS::lcss($first, $second);

    return wantarray ? @results : $results[0];
}
[download]

Update 2 (1^st January, 2013):

It appears that String::LCSS is more badly broken than I realised. Even simple matches can fail to find the longest common substring:

18:27 >perl -MString::LCSS=lcss -wE "say scalar lcss('abxabcy', 'abc')
+;"
ab

18:28 >
[download]

(And see http://cpanratings.perl.org/dist/String-LCSS.)

Better to replace sub lcss in the above script with the following by BrowserUk in Re: finding longest common substring:

sub lcss
{
    my $strings = join "\0", @_;
    my $lcs;

    for my $n (1 .. length $strings)
    {
        my $re = "(.{$n})" . '.*\0.*\1' x (@_ - 1);
        last unless $strings =~ $re;
        $lcs = $1;
    }

    return $lcs;
}
[download]

Update 3 (2^nd January, 2013):

Discovered the thread Does String::LCSS work?. String::LCSS is indeed broken, but String::LCSS_XS seems to work correctly.

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

In reply to Re^13: partial match between 2 files by Athanasius
in thread partial match between 2 files by lakssreedhar

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.