I'm documenting some more thoughts on this:
Once each column is analyzed, I can get an overall probability that a particular row is a header row by multiplying the probabilities that each column in isolation is a header. So: probabilty_col1_is_header * probability_col2_is_header * probability_col3_is_header, etc. When, or if, I get to a row that has a significantly lower overall probability than the previous rows, I can be pretty sure that that row starts the data and that the previous row or rows were headers.
$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;
Click here if you love Perl Monks
In reply to Re^2: Useful heuristics for analyzing arrays of data to determine column header
by nysus
in thread Useful heuristics for analyzing arrays of data to determine column header
by nysus
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |