comment on

Hello R56,

Three points:

(1) Unless you’re sure that the first line of the data file begins with a single digit, followed by a single whitespace character, followed by a single digit, your regex will not work. You need to add quantifiers: /^(\d+)\s+(\d+)/.

I would write the opening code like so:

#! perl
use strict;
use warnings;
use autodie;

use constant
{
    IN_FILE   => 'text.txt',
    SIZE      => ???,
    MIN_MATCH => ???,
};

my $size      = SIZE;
my $min_match = MIN_MATCH;
my @array;

open(my $in, '<', IN_FILE);

if (defined(my $line = <$in>))
{
    if ($line =~ /^(\d+)\s+(\d+)/)
    {
        $size      = $1;
        $min_match = $2;
    }
    else
    {
        push @array, $line;
    }

    push @array, $_ while <$in>;
}

close $in;
[download]

Obviously, you will need to supply sensible default values for SIZE and MIN_MATCH.

(2) I do not know what this means:

The first line of the input contains two numbers. First one tells us how many strings we should consider, second one tells us the minimum number of original strings where the substring should appear.

However, I suggest you postpone consideration of these constraints until you’ve tackled the more important problem of finding the longest common substring for 3 or more strings.

(3) Your post implies that the general problem can be solved by “tweaking” the algorithm for finding the longest common substring between 2 strings. However, this may be harder than you think. Consider the following 3 strings:

adam***betty
^^charlie^^^^^adam^^betty^^^^^
#####adam######charlie##
[download]

By inspection, the longest substring common to this data set is adam. But application of your 2-string algorithm to the first 2 strings gives betty only. You need an algorithm which finds:
    the set S1 of all substrings common to strings 1 and 2,
    the set S2 of all substrings common to strings 1 and 3, and
    the set S3 of all substrings common to strings 2 and 3,
and then finds the solution as the longest string in the intersection of S1, S2, and S3.

A glance at Longest_common_substring_problem suggests that the algorithm you need is far from trivial. But the dynamic programming pseudocode given there should get you started in the right direction.

Hope that helps (a little),

Update: Improved the paragraph beginning “By inspection”.

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

In reply to Re: longest common substring (with needed tweaks) by Athanasius
in thread longest common substring (with needed tweaks) by R56

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.