in reply to longest common substring (with needed tweaks)

Hello R56,

Three points:

(1) Unless you’re sure that the first line of the data file begins with a single digit, followed by a single whitespace character, followed by a single digit, your regex will not work. You need to add quantifiers: /^(\d+)\s+(\d+)/.

I would write the opening code like so:

#! perl use strict; use warnings; use autodie; use constant { IN_FILE => 'text.txt', SIZE => ???, MIN_MATCH => ???, }; my $size = SIZE; my $min_match = MIN_MATCH; my @array; open(my $in, '<', IN_FILE); if (defined(my $line = <$in>)) { if ($line =~ /^(\d+)\s+(\d+)/) { $size = $1; $min_match = $2; } else { push @array, $line; } push @array, $_ while <$in>; } close $in;

Obviously, you will need to supply sensible default values for SIZE and MIN_MATCH.

(2) I do not know what this means:

The first line of the input contains two numbers. First one tells us how many strings we should consider, second one tells us the minimum number of original strings where the substring should appear.

However, I suggest you postpone consideration of these constraints until you’ve tackled the more important problem of finding the longest common substring for 3 or more strings.

(3) Your post implies that the general problem can be solved by “tweaking” the algorithm for finding the longest common substring between 2 strings. However, this may be harder than you think. Consider the following 3 strings:

adam***betty ^^charlie^^^^^adam^^betty^^^^^ #####adam######charlie##

By inspection, the longest substring common to this data set is adam. But application of your 2-string algorithm to the first 2 strings gives betty only. You need an algorithm which finds:
    the set S1 of all substrings common to strings 1 and 2,
    the set S2 of all substrings common to strings 1 and 3, and
    the set S3 of all substrings common to strings 2 and 3,
and then finds the solution as the longest string in the intersection of S1, S2, and S3.

A glance at Longest_common_substring_problem suggests that the algorithm you need is far from trivial. But the dynamic programming pseudocode given there should get you started in the right direction.

Hope that helps (a little),

Update: Improved the paragraph beginning “By inspection”.

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Replies are listed 'Best First'.
Re^2: longest common substring (with needed tweaks)
by R56 (Sexton) on Oct 27, 2013 at 14:56 UTC

    Hey Athanasius!

    1) Yes, it's exactly digit-space-digit, so my regex will work.

    2) I can see that's a bit hard to understand. Let me try again. First number tells us how many strings will be considered for comparison. Second number tells us the minimum number of strings where the substring has to appear. For instance:

    3 3 strrringggg ssttrrringggg stttrrringgg

    output will be 'rrr'

    3 2 strrringggg ssttrrringggg stttrrringgg

    output will be 'gggg'

    It is a weird filter, I know, but it needs to be that way.

    3) Yeah, I already read on the subject but can't seem to figure out how to make the changes. My initial idea was that some clever well-places loops would get me what I wanted, but know I'm starting to see it's not that simple.

    Thanks for the help though!