This could be done in one pass through the string.

Conceptually it would work like so.

use strict;

my @nums = split('', $number);   # the number to work on
my (@most_common, %once, %common);
my $mcc = 2;   # if there are no common substrings don't store

while (@nums > 4) {
    my $key = join('', @nums[0..4]);
    $common{$key}++;
    if ($common{$key} > $mcc) {
        $mcc = $common{$key};
        @most_common = ($key);   # new max: set entire array to $key
    }
    elsif ($common{$key} == $mcc) {
        push(@most_common, $key);   # tack $key onto largest
    }
    if ($common{$key} == 1) {
        $once{$key} = 1;
    }
    elsif ($once{$key}) {
        delete $once{$key};
    }
    shift @nums;   # slide down 1 digit
}

print "Most ($mcc): ", join(', ', @most_common), "\n";
print "Once: ", join(', ', (keys %once)), "\n";

I have added a bit of code to track the most common values and save them in an array. I would expect the least common substrings to be the ones used only once, unless you define "least common" as the lowest count that is still more than one. In that case much more bookkeeping code would be needed for the low end. On the high end, if the current count is larger than the previous largest, we simply forget about the list and start a new list with the current substring as its only member. I also started $mcc at 2 to avoid a lot of needless bookkeeping for substrings seen only once.
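If you do want "lowest but more than once", one way to avoid the extra bookkeeping inside the loop is a second pass over the counts once the scan is finished. This is only a rough sketch, not part of the code above: the sub name least_common_repeated and its arguments are made up for illustration, and it uses substr for the sliding window instead of split and shift.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count every
# $len-character window of $digits, then return the smallest count
# that is greater than one and the substrings that have that count.
sub least_common_repeated {
    my ($digits, $len) = @_;
    my %count;

    # Slide a window of $len characters across the string, one step at a time.
    $count{ substr($digits, $_, $len) }++ for 0 .. length($digits) - $len;

    my ($low, @least);
    for my $key (sort keys %count) {
        my $n = $count{$key};
        next if $n < 2;                      # seen once: not "repeated"
        if (!defined $low || $n < $low) {    # new minimum: restart the list
            $low   = $n;
            @least = ($key);
        }
        elsif ($n == $low) {
            push @least, $key;               # ties with the current minimum
        }
    }
    return ($low, \@least);
}

my ($low, $least) = least_common_repeated('1212121234343', 2);
print "Least repeated ($low): @$least\n";    # Least repeated (2): 34 43
```

The trade-off is that this is no longer strictly one pass: it walks the hash a second time, but the hash is usually much smaller than the input string. $low comes back undefined when nothing repeats, so check it before printing on real data.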

I ran the code on sample data of 2^9999 and got

Most (2): 96655, 84403, 66114, 11748, 17484, 40380, 74169, 41696, 41844, 47194, 71162, 92065, 54736, 28703, 84689, 22165, 92292, 47369, 41891, 87379, 37954, 04224, 42244, 08257, 35778, 23461, 29741, 19795, 79549, 78117, 56688, 58090, 43252, 32528, 42018, 98726, 03714, 41492, 24440, 01363, 40657, 90170, 41347, 48935, 89357

Once: every other substring, which is a long list.
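For anyone wanting to try this on the same kind of input, here is one possible way to build the digit string. It is a sketch that assumes the sample data was the decimal expansion of 2**9999, using the core Math::BigInt module; the result would go in $number before the loop runs.

```perl
use strict;
use warnings;
use Math::BigInt;

# Assumption: the sample input is the decimal expansion of 2**9999.
my $number = Math::BigInt->new(2)->bpow(9999)->bstr();

print length($number), " digits\n";   # 3010 digits
```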


In reply to Re: Most common substring by dga
in thread Most common substring by Anonymous Monk
