Re^2: Useful heuristics for analyzing arrays of data to determine column header

I'm documenting some more thoughts on this:

Once each column is analyzed, I can get an overall probability that a particular row is a header row by multiplying the probabilities that each of its cells, taken in isolation, is a header: probability_col1_is_header * probability_col2_is_header * probability_col3_is_header, etc. (Multiplying like this treats the columns as independent of one another.) When, or if, I get to a row whose overall probability is significantly lower than that of the previous rows, I can be pretty sure that row starts the data and that the previous row or rows were headers.
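Here's a rough sketch of that idea, just to make it concrete. Everything in it is an assumption made for illustration: the per-column scores are invented numbers, row_probability and first_data_row are hypothetical helper names, and "significantly lower" is arbitrarily taken to mean "less than a quarter of the previous row's probability".

```perl
use strict;
use warnings;
use List::Util qw(product);

# Hypothetical per-column scores: $scores[$row][$col] is the estimated
# probability that the cell in that row and column is a header.
# These numbers are made up for illustration.
my @scores = (
    [0.90, 0.80, 0.95],   # row 0
    [0.85, 0.90, 0.80],   # row 1
    [0.20, 0.10, 0.30],   # row 2
    [0.25, 0.15, 0.20],   # row 3
);

# Overall header probability for one row: the product of its column scores.
# Multiplying treats the columns as independent, which is an assumption.
sub row_probability {
    my ($row) = @_;
    return product(@$row);
}

# Return the index of the first row whose overall probability falls below
# $ratio times the previous row's, or undef if there is no such drop.
# The 0.25 default is an arbitrary stand-in for "significantly lower".
sub first_data_row {
    my ($rows, $ratio) = @_;
    $ratio //= 0.25;
    my $prev;
    for my $i (0 .. $#$rows) {
        my $p = row_probability($rows->[$i]);
        return $i if defined $prev && $p < $prev * $ratio;
        $prev = $p;
    }
    return;
}

my $start = first_data_row(\@scores);
print defined $start
    ? "data starts at row $start; rows 0.." . ($start - 1) . " look like headers\n"
    : "no clear drop found\n";
```

With the made-up scores above, the row products are 0.684, 0.612, 0.006 and 0.0075, so the drop is detected at row 2. The threshold would need tuning against real files, and with many columns it may be safer to sum logarithms than to multiply tiny numbers.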

$PM = "Perl Monk's";
$MCF = "Most Clueless ~~Friar~~ ~~Abbot~~ ~~Bishop~~ ~~Pontiff~~ ~~Deacon~~ ~~Curate~~ ~~Priest~~ Vicar";
$nysus = $PM . ' ' . $MCF;
