For your original question, I'd try something like
the following. (It is pretty rough, but its heart
is in the right place.)
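#!/usr/bin/perl
use strict;
use warnings;

# Sample packets; substitute your real packet payloads.
my @packets = ('abcdef', '123456', 'abc123');

for my $size (2 .. 4) {
    print "size=$size\n";

    # %substrs maps each substring to the number of packets containing it.
    my %substrs;
    foreach my $packet (@packets) {
        # Collect the distinct substrings of this length in one packet,
        # so a substring repeated within a packet is only counted once.
        my %data;
        for my $i (0 .. length($packet) - $size) {
            $data{ substr($packet, $i, $size) } = 1;
        }
        $substrs{$_}++ for keys %data;
    }

    # Sort by count (descending), breaking ties alphabetically, and print
    # at most the top six (the original [0..5] slice warned on short lists).
    my @ranked = sort { $substrs{$b} <=> $substrs{$a} || $a cmp $b } keys %substrs;
    my $last = $#ranked < 5 ? $#ranked : 5;
    print "$_ $substrs{$_}\n" for @ranked[0 .. $last];
}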
It isn't very efficient, but for each length it will tell you
which substrings occur in the most packets (a substring is
counted at most once per packet).
Answering your actual question, "most common, longest substrings",
is harder, since you're trying to optimize two criteria at the same
time. Which is better: a 5-character string that occurs 20 times,
or a 20-character string that occurs 5 times?
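One way to make that trade-off concrete is to collapse both criteria into a single score. The sketch below scores each substring as its length times the number of packets containing it. That rule is purely an illustrative assumption, and note that it rates the two examples above identically (5 × 20 = 20 × 5 = 100), which is exactly why there is no single right answer. `packet_counts` and `rank_substrings` are hypothetical helper names, and substrings seen in only one packet are dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# For each substring of length $min..$max, count how many packets contain it
# (a substring repeated inside one packet is counted once for that packet).
sub packet_counts {
    my ($min, $max, @packets) = @_;
    my %count;
    for my $packet (@packets) {
        my %seen;
        for my $size ($min .. $max) {
            for my $i (0 .. length($packet) - $size) {
                $seen{ substr($packet, $i, $size) } = 1;
            }
        }
        $count{$_}++ for keys %seen;
    }
    return %count;
}

# Hypothetical scoring rule: score = length * number of packets containing it.
# Substrings found in only one packet are dropped as uninteresting.
# Ties are broken alphabetically so the output is deterministic.
sub rank_substrings {
    my ($min, $max, @packets) = @_;
    my %count  = packet_counts($min, $max, @packets);
    my @shared = grep { $count{$_} > 1 } keys %count;
    my %score  = map { ($_ => length($_) * $count{$_}) } @shared;
    return sort { $score{$b} <=> $score{$a} || $a cmp $b } @shared;
}

my @sample = ('abcdef', '123456', 'abc123');
print "$_\n" for rank_substrings(2, 4, @sample);
```

For the three sample packets this prints `123` and `abc` (score 6) ahead of `12`, `23`, `ab` and `bc` (score 4). Like the script above it is brute force, so it is only practical for modest packet sizes; a different weighting (for example, squaring the length) would favour longer strings.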
However, thinking about your problem more generally, I'd
approach the analysis something like this:
- Do a statistical analysis of the raw data to determine
whether it is encrypted, and if so, how well. Well-encrypted
data is statistically indistinguishable from random noise;
poorly encrypted data will be somewhat distinguishable from
it. I used to have a good reference to some algorithms for
performing this kind of analysis, but I can't find it right now.
- Compare (by hand) the same transaction done several times
from several different hosts. Can you pick anything out?
- Since you said these are UDP packets, can you "replay" them
from a different host to cause the same event?
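The post doesn't name the algorithms it had in mind, so as a stand-in here is a sketch of two common first checks for "does this look like random noise": the Shannon entropy of the byte distribution and a chi-square test of the byte counts against a uniform distribution. The helper names `byte_entropy` and `chi_square_uniform` and the command-line usage are assumptions for illustration. A high score only shows the bytes look random; compressed data looks the same, so treat the result as a hint rather than proof.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shannon entropy of the byte distribution, in bits per byte (0 to 8).
# Good ciphertext (and well-compressed data) should come out close to 8.
sub byte_entropy {
    my ($data) = @_;
    my $n = length $data;
    return 0 unless $n;
    my %freq;
    $freq{$_}++ for unpack 'C*', $data;
    my $h = 0;
    for my $c (values %freq) {
        my $p = $c / $n;
        $h -= $p * log($p) / log(2);
    }
    return $h;
}

# Pearson chi-square statistic of the byte counts against a uniform
# distribution over all 256 values (255 degrees of freedom). With a few
# thousand bytes or more, random data should land roughly around 255;
# values far above that mean the byte frequencies are skewed.
sub chi_square_uniform {
    my ($data) = @_;
    my $n = length $data;
    return 0 unless $n;
    my @count = (0) x 256;
    $count[$_]++ for unpack 'C*', $data;
    my $expected = $n / 256;
    my $chi = 0;
    $chi += ($_ - $expected) ** 2 / $expected for @count;
    return $chi;
}

# Hypothetical usage: perl randomness.pl payloads.bin
# where payloads.bin holds many packet payloads concatenated together.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $data = do { local $/; <$fh> };
    $data = '' unless defined $data;
    close $fh;
    printf "bytes=%d entropy=%.3f bits/byte chi2=%.1f\n",
        length($data), byte_entropy($data), chi_square_uniform($data);
}
```

Individual UDP packets are usually too short for either statistic to mean much, which is why the sketch assumes many payloads pooled into one sample.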