For your original question, I'd try something like the following (it is pretty rough, but its heart is in the right place):

#!/usr/bin/perl -w
use strict;

my @packets = ('abcdef', '123456', 'abc123');

for my $size (2 .. 4) {
    print "size=$size\n";
    my %substrs;    # substring => number of packets containing it

    foreach my $packet (@packets) {
        # Collect the distinct substrings of this length, so each
        # packet counts a given substring at most once.
        my %data;
        for (0 .. (length($packet) - $size)) {
            $data{ substr($packet, $_, $size) } = 1;
        }
        $substrs{$_}++ foreach keys %data;
    }

    # Print the six most common substrings (fewer if there aren't six).
    my @top = sort { $substrs{$b} <=> $substrs{$a} } keys %substrs;
    $#top = 5 if $#top > 5;
    foreach (@top) {
        print "$_ $substrs{$_}\n";
    }
}

It isn't really efficient, but it will tell you which substrings of a particular length are most common across packets. Answering your actual question, "most common, longest substrings", is harder since you're trying to optimize two criteria at the same time. Which is better: a 5-character string that happens 20 times or a 20-character string that happens 5 times?
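
One way to sidestep the two-criteria question is to fold both into a single score and rank on that. The sketch below is only an illustration: the score (length times the number of packets containing the substring) and the names count_substrings and rank_by_score are my own assumptions, not something your data dictates. Substrings that show up in only one packet are skipped, since they aren't really "common".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count, for one substring length, how many packets contain each substring.
sub count_substrings {
    my ($size, @packets) = @_;
    my %count;
    for my $packet (@packets) {
        my %seen;    # distinct substrings within this packet
        for my $i (0 .. length($packet) - $size) {
            $seen{ substr($packet, $i, $size) } = 1;
        }
        $count{$_}++ for keys %seen;
    }
    return %count;
}

# Rank substrings of lengths $min..$max by an assumed score:
# length * number of packets. Ties are broken alphabetically.
sub rank_by_score {
    my ($min, $max, @packets) = @_;
    my @rows;
    for my $size ($min .. $max) {
        my %count = count_substrings($size, @packets);
        for my $s (keys %count) {
            next if $count{$s} < 2;    # ignore substrings seen in only one packet
            push @rows, [ $s, $count{$s}, $size * $count{$s} ];
        }
    }
    return sort { $b->[2] <=> $a->[2] or $a->[0] cmp $b->[0] } @rows;
}

my @packets = ('abcdef', '123456', 'abc123');
for my $row (rank_by_score(2, 4, @packets)) {
    printf "%-6s packets=%d score=%d\n", @$row;
}
```

With the three sample packets, the 3-character strings "123" and "abc" rank above the 2-character ones, because they appear in just as many packets. A different weighting (say, length squared) would favor longer matches more strongly; which is right depends on what you expect the protocol to look like.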

However, in thinking about your problem in general I'd do an analysis something like this:

  1. Do a statistical analysis of the raw data to determine whether it is encrypted, and if so, how well. Well-encrypted data will be statistically indistinguishable from random noise; poorly encrypted data will be somewhat distinguishable from it. I used to have a good reference to some algorithms for performing this kind of analysis, but I can't find it right now.
  2. Compare (by hand) the same transaction done several times from several different hosts. Can you pick anything out?
  3. Since you said these are UDP packets, can you "replay" them from a different host to cause the same event?
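
For the first step, one crude test is the Shannon entropy of the byte values: well-encrypted (or compressed) data should come close to 8 bits per byte on a large enough sample, while plain text and structured binary usually score noticeably lower. This is a minimal sketch, not the reference I mentioned; the name byte_entropy is made up for illustration. A short packet can't get near 8 bits no matter how random it is, so concatenate many payloads before reading anything into the number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shannon entropy of a byte string, in bits per byte (0 to 8).
sub byte_entropy {
    my ($data) = @_;
    my $len = length $data;
    return 0 unless $len;

    # Tally how often each byte value occurs.
    my %freq;
    $freq{$_}++ for unpack 'C*', $data;

    # Sum -p * log2(p) over the observed byte values.
    my $entropy = 0;
    for my $n (values %freq) {
        my $p = $n / $len;
        $entropy -= $p * log($p) / log(2);
    }
    return $entropy;
}

# Sample data only; substitute your captured payloads.
my @packets = ('abcdef', '123456', 'abc123');
printf "%.3f bits/byte\n", byte_entropy(join '', @packets);
```

A high score alone doesn't prove encryption (compressed data looks the same), but a low score is a good sign that the payload has structure worth picking apart by hand.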

In reply to Re: Finding patterns in packet data? by lhoward
in thread Finding patterns in packet data? by Guildenstern
