Here is a basic test for "is there an $n-digit repeated substring in $string?":

$string =~ /(.{$n}).*\1/;

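That is: find an $n-digit substring, capture it as $1, then look forward in the string for another copy of $1. This is a reasonably efficient way to search for a repeated substring of a given length. However, on a 9.2MB string it is still going to take a lot of work: when no repeat of that length exists, the rest of the string may be scanned for a copy from every one of the millions of starting positions, so in the worst case the cost grows roughly with the square of the string length.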
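The basic test can be wrapped in a small sub that also reports where the two copies sit. This is only a sketch: `find_repeat` is a hypothetical name, and the offsets come from Perl's `@-` and `@+` match-offset arrays. Because `.*` is greedy, the second copy reported is the last one after the first.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($substring, $first_offset, $second_offset) for
# the first $n-character substring that occurs again later without overlapping,
# or an empty list if there is none.
sub find_repeat {
    my ($string, $n) = @_;
    return unless $string =~ /(.{$n}).*\1/;
    # $-[1] is where the capture starts; the whole match ends right after
    # the second copy, so that copy starts $n characters before $+[0].
    return ($1, $-[1], $+[0] - $n);
}
```

For example, `find_repeat("1234512345", 3)` gives `("123", 0, 5)`.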

Note that this won't find an overlapping repeat, such as the repeated "121" in the string "3121213"; a more complex regexp could find such repeats, but would be much slower on large strings.
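One possible form of such a regexp is sketched below (the sub name is hypothetical, and its speed on large strings is not measured here). A lookahead captures the candidate into $1 without consuming it, so the second copy is allowed to begin inside the first.

```perl
use strict;
use warnings;

# Sketch: find an $n-character substring that occurs again later, allowing
# the two copies to overlap. (?=(.{$n})) captures $1 without consuming it,
# .+? then advances at least one character, and (?=\1) checks for the copy there.
sub find_overlapping_repeat {
    my ($string, $n) = @_;
    return $string =~ /(?=(.{$n})).+?(?=\1)/ ? $1 : undef;
}
```

On "3121213" with $n = 3 this returns "121", which the non-overlapping pattern misses.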

One approach to finding the longest repeat would be to iterate $n upwards from 1:

sub longestrepeat {
    my ($string) = @_;
    for (my $n = 1; 1; ++$n) {
        # the first length with no repeat means the previous length was the longest
        return $n - 1 unless $string =~ /(.{$n}).*\1/;
    }
}

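If that isn't fast enough, other approaches to consider are: a) a binary chop on $n, which works because a repeat of length $n always contains a repeat of every shorter length (probably not useful in this case, since I think $n is likely to be smaller than log_2 of the number of digits); or b) starting each subsequent search at the offset where the previous one succeeded, which works because the first repeat of length $n + 1 cannot start earlier than the first repeat of length $n (not too hard to code, but not likely to gain much).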
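Both alternatives can be sketched as follows. The sub names are hypothetical, neither version has been benchmarked against the simple loop, and both use the same non-overlapping pattern, so they should agree with `longestrepeat`.

```perl
use strict;
use warnings;

# (a) Binary chop on $n. Valid because a non-overlapping repeat of length $n
# always contains one of length $n - 1 (drop the last character of each copy).
sub longestrepeat_bsearch {
    my ($string) = @_;
    # Two non-overlapping copies need 2 * $n <= length, so the answer is in [lo, hi].
    my ($lo, $hi) = (0, int(length($string) / 2));
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);    # round up so the loop always progresses
        if ($string =~ /(.{$mid}).*\1/) {
            $lo = $mid;                        # a repeat of length $mid exists
        } else {
            $hi = $mid - 1;                    # none of length $mid, so none longer
        }
    }
    return $lo;
}

# (b) Upward scan that resumes each search where the previous one matched.
# The first repeat of length $n + 1 cannot start before the first repeat of
# length $n, so earlier offsets never need re-examining.
sub longestrepeat_resume {
    my ($string) = @_;
    my $start = 0;
    for (my $n = 1; ; ++$n) {
        pos($string) = $start;                 # //g in scalar context starts at pos()
        return $n - 1 unless $string =~ /(.{$n}).*\1/g;
        $start = $-[1];                        # offset where this repeat begins
    }
}
```

The binary chop needs about log_2 of half the string length regexp runs in total, while the upward scan needs one run per length up to the answer plus one; that is why the chop only pays off when the longest repeat is fairly long.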

You can extend the regexp to search for 3 copies of the same substring:

$string =~ /(.{$n}).*\1.*\1/;
... or more, by adding further copies of the .*\1 construct.
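Rather than writing each pattern by hand, the pattern can be built with Perl's `x` repetition operator. This is a sketch with a hypothetical sub name; like the patterns above, it only counts non-overlapping copies.

```perl
use strict;
use warnings;

# Hypothetical helper: true if some $n-character substring occurs at least
# $k times without overlapping, by repeating the .*\1 construct $k - 1 times.
sub has_k_copies {
    my ($string, $n, $k) = @_;
    # Single quotes keep the backslash, so '.*\1' reaches the regex engine intact.
    my $pattern = '(.{' . $n . '})' . ('.*\1' x ($k - 1));
    return $string =~ /$pattern/ ? 1 : 0;
}
```

With $n = 3 and $k = 3 the generated pattern is `(.{3}).*\1.*\1`, the same as the three-copy regexp above.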

In general, however, I consider it extremely unlikely that you'll discover any patterns special to the digits of Mersenne primes expressed in decimal, beyond the features that any Mersenne number would have (i.e. the terminating digit patterns implicit in a number of the form 2^n - 1). And of course doing the same thing in binary would be very boring, since 2^n - 1 in binary is just a run of n 1s. :)
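The terminating digits of 2^n - 1 can be computed without building the whole number, using modular exponentiation (square-and-multiply) modulo 10^k. This is an illustrative sketch with a hypothetical sub name; keep $k at 7 or below so the intermediate products stay exact.

```perl
use strict;
use warnings;

# Hypothetical helper: last $k decimal digits of 2**$n - 1, computed by
# square-and-multiply modulo 10**$k.
sub mersenne_tail {
    my ($n, $k) = @_;
    my $mod = 10 ** $k;
    my ($result, $base, $e) = (1, 2 % $mod, $n);
    while ($e > 0) {
        $result = ($result * $base) % $mod if $e & 1;   # multiply in this bit
        $base   = ($base * $base) % $mod;               # square for the next bit
        $e >>= 1;
    }
    # With a positive right operand, Perl's % never returns a negative value.
    return ($result - 1) % $mod;
}
```

For instance `mersenne_tail(13, 4)` is 8191 (that is 2^13 - 1 itself), and for any odd exponent the final digit comes out as 1 or 7, an example of a pattern every such number shares.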

Hugo


In reply to Re: Longest repeated string... by hv
in thread Longest repeated string... by Yzzyx