Unless I misunderstood the question, this may be NP complete, but it doesn't take long. Leastwise not if all your numbers are small integers, < 255.

What I did was to encode the arrays of integers into strings, and then build an index to all the N-char length substrings in the 1000 x 1000-char strings. This latter process only takes a few seconds.

Less time infact than generating and encoding the test data. Reading it from a file and encoding it directly to strings would make it quicker still, by saving on the ram for all the arrays.

It takes around 27 seconds (total: Generation, encoding, indexing, counting and printing to nul!) to process 1000 x 1000-number sets on my mid-range machine.

#! perl -slw use strict; use Data::Dumper; $| = 1; $" = ', '; our $LEN ||= 1000; our $COUNT ||= 1000; our $M ||= 2; our $N ||= 2; my @data = map { [ map{ int rand 256 } 1 .. $LEN ] } 1 .. $COUNT; warn "Gen'd data"; my @encoded = map { pack 'C*', @$_; } @data; warn 'Encoded data'; my %seq; for my $i ( 0 .. $#encoded ) { for my $o ( 0 .. $LEN - $N ) { push @{ $seq{ substr $encoded[ $i ], $o, $N } }, $i; } } my $count = 0; for my $run ( keys %seq ) { next unless @{ $seq{ $run } } > $M; $count++; printf "Run [%*s] appears in lines: %s\n", $N * 4, join(',', map{ sprintf '%03d', $_ } unpack 'C*', $run ), "[@{ $seq{ $run } }]"; } warn "Found $count $N-number runs appearing in at least $M sets"; __END__ [11:54:20.15] P:\test>412061 -LEN=1000 -COUNT=1000 -N=2 -M=2 >nul Gen'd data at P:\test\412061.pl line 16. Encoded data at P:\test\412061.pl line 21. Found 65533 2-number runs appearing in at least 2 sets at P:\test\4120 +61.pl line 41. [11:54:47.06] P:\test>412061 -LEN=1000 -COUNT=1000 -N=3 -M=2 >nul Gen'd data at P:\test\412061.pl line 16. Encoded data at P:\test\412061.pl line 21. Found 553 3-number runs appearing in at least 2 sets at P:\test\412061 +.pl line 41. [11:55:16.46] P:\test> [11:55:23.76] P:\test>412061 -LEN=1000 -COUNT=1000 -N=3 -M=3 >nul Gen'd data at P:\test\412061.pl line 16. Encoded data at P:\test\412061.pl line 21. Found 9 3-number runs appearing in at least 3 sets at P:\test\412061.p +l line 41. [11:55:55.84] P:\test> [11:55:55.84] P:\test>412061 -LEN=1000 -COUNT=1000 -N=3 -M=3 Gen'd data at P:\test\412061.pl line 16. Encoded data at P:\test\412061.pl line 21. Run [ 215,017,145] appears in lines: [546, 657, 671, 986] Run [ 205,013,147] appears in lines: [326, 360, 385, 975] Run [ 134,174,216] appears in lines: [335, 512, 539, 592] Run [ 114,015,201] appears in lines: [470, 544, 557, 989] Run [ 065,233,176] appears in lines: [533, 656, 789, 930] Run [ 003,013,171] appears in lines: [17, 169, 460, 979] Run [ 191,241,173] appears in lines: [25, 88, 442, 782] Run [ 245,165,229] appears in lines: [373, 405, 599, 697] Run [ 072,156,003] appears in lines: [86, 153, 464, 544] Run [ 055,130,150] appears in lines: [73, 127, 157, 384, 868] Run [ 044,035,182] appears in lines: [3, 402, 734, 764] Run [ 187,058,006] appears in lines: [171, 487, 924, 944] Run [ 060,153,213] appears in lines: [72, 404, 708, 976] Run [ 057,200,093] appears in lines: [221, 246, 514, 908] Found 14 3-number runs appearing in at least 3 sets at P:\test\412061. +pl line 41. [12:04:02.68] P:\test>

Examine what is said, not who speaks.
"But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
"Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo         "Efficiency is intelligent laziness." -David Dunham
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

In reply to Re: Seeking algorithm for finding common continous sub-patterns by BrowserUk
in thread Seeking algorithm for finding common continous sub-patterns by johnnywang

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.