While I wrote SQL::Type::Guess, it really, really wants to have all the data in memory already, which is asking a bit much for a 5 TB file.
If you are content with a quick approach, consider taking either the first 100 GB or a random sample of rows from your input file(s) and using those with SQL::Type::Guess to determine good types for your columns (a sketch of this follows below).
Alternatively, a more "manual" approach of reading the file line by line and feeding the information to SQL::Type::Guess could also work:
    while (my $hashref = read_next_row_from_file_as_hashref( $fh )) {
        $sqltypes->guess( $hashref );
    }
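Untested, but assuming the input is CSV-ish with a header line, Text::CSV_XS can do the row reading, and the same loop also covers the sampling idea from above. The file name, table name and sampling rate are placeholders; set $keep_every to 1 to look at every row, or to something like 10_000 for a rough random sample:

    use strict;
    use warnings;
    use Text::CSV_XS;
    use SQL::Type::Guess;

    my $filename   = 'input.csv';   # placeholder: your 5 TB file
    my $keep_every = 10_000;        # placeholder: 1 = every row, N = roughly every Nth row

    my $csv      = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    my $sqltypes = SQL::Type::Guess->new();

    open my $fh, '<', $filename
        or die "Couldn't read '$filename': $!";

    # The first row is assumed to be the header; use it as the column names
    $csv->column_names( @{ $csv->getline( $fh ) } );

    while ( my $hashref = $csv->getline_hr( $fh ) ) {
        next if $keep_every > 1 and int rand $keep_every;   # keep roughly 1 in $keep_every rows
        $sqltypes->guess( $hashref );                       # accumulate type information row by row
    }

    # Turn the guessed column types into a CREATE TABLE statement to start from
    print $sqltypes->as_sql( table => 'my_table' ), "\n";

Since guess() only looks at one row per call, memory use stays flat no matter how big the file is.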
Have you looked at how long it takes to read the 5 TB file without doing any processing? Maybe two days isn't all that bad.
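To get that baseline, simply counting lines tells you the raw read speed (again untested, the file name is a placeholder):

    use strict;
    use warnings;
    use Time::HiRes 'time';

    my $filename = 'input.csv';   # placeholder: your 5 TB file
    my $start    = time;

    open my $fh, '<', $filename
        or die "Couldn't read '$filename': $!";

    my $lines = 0;
    $lines++ while <$fh>;

    printf "%d lines read in %.1f seconds\n", $lines, time() - $start;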