Re: Most common substring

Actually, your third algorithm is not likely to produce the results you want. Consider the following code snippet:

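#!/usr/bin/perl
use strict;
use warnings;

my $string = "123123124";
my $len = 5;

# Count every substring of length $len, including overlapping ones.
my %substrings;
for (my $i = 0; $i + $len <= length $string; $i++)
{
        my $sub = substr($string, $i, $len);
        $substrings{$sub}++;
}

# Most frequent first; ties are broken by string order.
print "$_ => $substrings{$_}\n" for sort {
        $substrings{$b} <=> $substrings{$a}
                ||
        $a cmp $b
} keys %substrings;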

If you were to mask out 12312 right after finding it, you would remove the possibility of finding 23123, because the two substrings overlap in the original string. That's not so good.
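To make the overlap problem concrete, here is a minimal sketch. It assumes that "masking" means overwriting each occurrence of the winning substring with a placeholder character; the helper count_windows and the 'x' placeholder are my own inventions for illustration, not anything from your post.

```perl
use strict;
use warnings;

# Count every window of $len characters. If $mask is given, skip any
# window that contains the mask character.
sub count_windows {
    my ($string, $len, $mask) = @_;
    my %count;
    for my $i (0 .. length($string) - $len) {
        my $sub = substr($string, $i, $len);
        next if defined $mask && index($sub, $mask) >= 0;
        $count{$sub}++;
    }
    return \%count;
}

my $string = "123123124";
my $before = count_windows($string, 5);

# Hypothetical masking step: overwrite the winner with 'x'.
(my $masked = $string) =~ s/12312/xxxxx/g;
my $after = count_windows($masked, 5, 'x');

print "before: ", join(", ", sort keys %$before), "\n";
print "masked: $masked\n";
print "after:  ", scalar(keys %$after), " substrings left\n";
```

The masked string comes out as xxxxx3124, so no five-character window survives: 23123, 31231 and 23124 are all gone along with 12312.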

By the way, the above algorithm naively implements a method suggested by the first responder. If memory becomes a problem, one possible saving would be to keep the hash in a file instead of in RAM, at the cost of a lot of run time. But I would recommend trying the simple version first before attempting to optimize it.
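If you do end up needing the file-backed hash, one way to sketch it is with a tied hash. This version assumes SDBM_File (which ships with core Perl) is good enough; it limits the size of each key/value pair, so for long substrings you would probably want a different DBM module. The temporary directory and the file name "substrings" are placeholders of mine.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core module
use File::Temp qw(tempdir);

# Placeholder location; use a real path if the counts should persist.
my $dir = tempdir(CLEANUP => 1);

# From here on, %substrings lives on disk rather than in RAM.
tie(my %substrings, 'SDBM_File', "$dir/substrings", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file: $!";

my $string = "123123124";
my $len = 5;
for my $i (0 .. length($string) - $len) {
    $substrings{ substr($string, $i, $len) }++;
}

# Same ordering as before: most frequent first, then by string order.
my @ranked = sort {
    $substrings{$b} <=> $substrings{$a} || $a cmp $b
} keys %substrings;
my $top = "$ranked[0] => $substrings{$ranked[0]}";
print "$_ => $substrings{$_}\n" for @ranked;

untie %substrings;
```

The counting loop is unchanged; only the storage behind the hash differs. Every increment now costs a fetch and a store against the file, which is where the time goes.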

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.
