There's a lot of ambiguity here, so I'd second the suggestion of just loading everything into a database to analyze. Even indexing will probably take patience, so consider a staging table per column so you can make the initial data-type determinations in isolation. Once those decisions are made, dump the results into the final schema. Normalization may dictate multiple tables at the end of the day anyway, and the dump/restore might also benefit from having the columns already separated at that point.
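To make the per-column staging idea concrete, here's a minimal sketch using SQLite. It assumes the input is a CSV (the sample data, the `stg_` table prefix, and the function names are all made up for illustration); each column gets its own one-column TEXT staging table, and type inference runs against each staging table independently:

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the real file; note the bad
# value "oops" in the amount column, which forces it to stay TEXT.
SAMPLE_CSV = """id,amount,label
1,3.50,widget
2,7.25,gadget
3,oops,gizmo
"""

def stage_per_column(conn, reader, columns):
    """Load each CSV column into its own one-column TEXT staging table."""
    for col in columns:
        conn.execute(f'CREATE TABLE "stg_{col}" (val TEXT)')
    for row in reader:
        for col in columns:
            conn.execute(f'INSERT INTO "stg_{col}" (val) VALUES (?)',
                         (row[col],))
    conn.commit()

def infer_type(conn, col):
    """Guess a type for one staged column: INTEGER if every value
    parses as an int, else REAL if every value parses as a float,
    else fall back to TEXT."""
    vals = [r[0] for r in conn.execute(f'SELECT val FROM "stg_{col}"')]
    def all_cast(cast):
        try:
            for v in vals:
                cast(v)
            return True
        except ValueError:
            return False
    if all_cast(int):
        return "INTEGER"
    if all_cast(float):
        return "REAL"
    return "TEXT"

conn = sqlite3.connect(":memory:")
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
columns = reader.fieldnames
stage_per_column(conn, reader, columns)
types = {c: infer_type(conn, c) for c in columns}
print(types)  # → {'id': 'INTEGER', 'amount': 'TEXT', 'label': 'TEXT'}
```

Because each staging table stands alone, you can re-run or refine the inference for one column without touching the others, and the `types` mapping becomes the spec for building the final table.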