in reply to Search for identical substrings

Are there useful constraints on the substring match, such as a minimum interesting length, a rule that a match may only start at every nth character, or a maximum match length?

Note that it would help people copying the data from your scratchpad if the data were between <code> </code> tags.

Update: I guess the comment in the OP code means that 200 characters is the minimum useful match length?

Perl is Huffman encoded by design.

Re^2: Search for identical substrings
by bioMan (Beadle) on Aug 22, 2005 at 17:27 UTC

    I have put <code> tags around the data on my scratchpad.

    Yes, a match below 200 characters doesn't have much meaning. Matches shorter than 200 characters occur by random chance too frequently.

    I would also like to know how many different identical substrings greater than 200 characters exist between pairs of the 3k strings. I hope this statement makes sense.
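
To make the counting question above concrete, here is a minimal sketch for a single pair of strings, written in Python for brevity. It is not the OP's code. The names `common_substrings` and `min_len` are hypothetical, and it assumes "identical substring" means a maximal run of matching characters, one that cannot be extended to the left or right. It uses the standard longest-common-suffix table, so it takes time proportional to len(a) * len(b) per pair, which is probably too slow for every pair among 3k long strings without a faster index.

```python
def common_substrings(a, b, min_len=200):
    """Return the set of distinct maximal substrings shared by a and b
    that are at least min_len characters long (hypothetical helper)."""
    found = set()
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the run that ended on the previous diagonal cell
                cur[j] = prev[j - 1] + 1
                # the run is finished if either string ends here
                # or the next characters differ
                at_end = i == len(a) or j == len(b) or a[i] != b[j]
                if at_end and cur[j] >= min_len:
                    found.add(a[i - cur[j]:i])
        prev = cur
    return found


# Tiny demonstration with a threshold of 4 instead of 200.
shared = common_substrings("ABCD--EFGH", "EFGH++ABCD", min_len=4)
print(len(shared), sorted(shared))
```

The threshold is inclusive, so pass `min_len=201` if only matches strictly greater than 200 characters should count. The length of the returned set is the number of different shared substrings for that pair.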