in reply to longest common substring (with needed tweaks)
Hello R56,
Three points:
(1) Unless you’re sure that the first line of the data file begins with a single digit, followed by a single whitespace character, followed by a single digit, your regex will not work. You need to add quantifiers: /^(\d+)\s+(\d+)/.
I would write the opening code like so:
#! perl use strict; use warnings; use autodie; use constant { IN_FILE => 'text.txt', SIZE => ???, MIN_MATCH => ???, }; my $size = SIZE; my $min_match = MIN_MATCH; my @array; open(my $in, '<', IN_FILE); if (defined(my $line = <$in>)) { if ($line =~ /^(\d+)\s+(\d+)/) { $size = $1; $min_match = $2; } else { push @array, $line; } push @array, $_ while <$in>; } close $in;
Obviously, you will need to supply sensible default values for SIZE and MIN_MATCH.
(2) I do not know what this means:
The first line of the input contains two numbers. First one tells us how many strings we should consider, second one tells us the minimum number of original strings where the substring should appear.
However, I suggest you postpone consideration of these constraints until you’ve tackled the more important problem of finding the longest common substring for 3 or more strings.
(3) Your post implies that the general problem can be solved by “tweaking” the algorithm for finding the longest common substring between 2 strings. However, this may be harder than you think. Consider the following 3 strings:
adam***betty ^^charlie^^^^^adam^^betty^^^^^ #####adam######charlie##
By inspection, the longest substring common to this data set is adam. But application of your 2-string algorithm to the first 2 strings gives betty only. You need an algorithm which finds:
the set S1 of all substrings common to strings 1 and 2,
the set S2 of all substrings common to strings 1 and 3, and
the set S3 of all substrings common to strings 2 and 3,
and then finds the solution as the longest string in the intersection of S1, S2, and S3.
A glance at Longest_common_substring_problem suggests that the algorithm you need is far from trivial. But the dynamic programming pseudocode given there should get you started in the right direction.
Hope that helps (a little),
Update: Improved the paragraph beginning “By inspection”.
| Athanasius <°(((>< contra mundum | Iustus alius egestas vitae, eros Piratica, |
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: longest common substring (with needed tweaks)
by R56 (Sexton) on Oct 27, 2013 at 14:56 UTC |