in reply to Most common substring

A Huffman coding algorithm should find the most common substrings. That is not the ultimate goal of Huffman coding, but counting how often each symbol occurs is an intermediate step, so that part of the algorithm could be reused.
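
As a rough illustration of that counting step only (not a Huffman coder, and not how Algorithm::Huffman works internally), here is a minimal Python sketch. It assumes the "symbols" are overlapping substrings of one fixed length; the function name `substring_counts` and the sample string are made up for the example.

```python
from collections import Counter

def substring_counts(text, length):
    """Count every overlapping substring of the given length in text.

    Assumption for this sketch: the substring length is fixed in advance.
    """
    return Counter(text[i:i + length] for i in range(len(text) - length + 1))

# Tally all 3-character substrings of a sample string.
counts = substring_counts("abracadabra", 3)

# Show the two most frequent substrings with their counts.
print(counts.most_common(2))
```

A frequency table like this is what a Huffman coder would go on to turn into a code tree; for the substring problem, the table itself is the part of interest.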

Check out Algorithm::Huffman on CPAN.
Check out a book on data compression.
Also, there is a good general explanation of Huffman coding in Computer Science & Perl Programming, the digest of Perl Journal articles (the article is probably also available by itself from The Perl Journal).

Update: My suggestion won't really work. Optimal Huffman coding will certainly find the most common substrings, but not necessarily the longest ones, or ones of a particular size.
Sorry, I'm not sure whether some twist on a Huffman algorithm would work, but some of the other posts end up doing effectively what I envision a hacked Huffman algorithm would do.
:(