in reply to Possible faster way to do this?
While I wrote SQL::Type::Guess, it really, really wants to have all the data in memory already, which is asking a bit much for a 5 TB file.
If you are content with a quick approach, consider either taking the first 100 GB or a random sampling of rows from your input file(s), and using these with SQL::Type::Guess to determine good types for your columns.
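For illustration, here is a rough sketch of the sampling idea. The file name, the tab separator and the sampling rate are placeholders, and it assumes the first line of the file holds the column names:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV_XS;
    use SQL::Type::Guess;

    my $file        = 'big_input.tsv';   # placeholder file name
    my $sample_rate = 0.001;             # keep roughly 1 row in 1000

    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, sep_char => "\t" });
    open my $fh, '<', $file or die "Can't open '$file': $!";

    # Use the header line as column names so getline_hr() returns hashrefs
    $csv->column_names( @{ $csv->getline($fh) } );

    # Collect a small random sample of rows
    my @sample;
    while ( my $row = $csv->getline_hr($fh) ) {
        push @sample, $row if rand() < $sample_rate;
    }

    my $guess = SQL::Type::Guess->new();
    $guess->guess(@sample);
    print $guess->as_sql( table => 'my_table' ), "\n";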
Alternatively, a more "manual" approach of reading the file line by line and feeding the information to SQL::Type::Guess could also work:
    while (my $hashref = read_next_row_from_file_as_hashref( $fh )) {
        $sqltypes->guess( $hashref );
    }
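In case it helps, one way that loop could look with Text::CSV_XS is sketched below; the file name, the tab separator and the final as_sql() call are assumptions about your data, not part of the original question:

    use strict;
    use warnings;
    use Text::CSV_XS;
    use SQL::Type::Guess;

    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, sep_char => "\t" });
    open my $fh, '<', 'big_input.tsv' or die "Can't open input: $!";   # placeholder name
    $csv->column_names( @{ $csv->getline($fh) } );   # header row supplies column names

    my $sqltypes = SQL::Type::Guess->new();
    while ( my $hashref = $csv->getline_hr($fh) ) {
        # Feed one row at a time instead of slurping the whole file into memory
        $sqltypes->guess( $hashref );
    }
    print $sqltypes->as_sql( table => 'my_table' ), "\n";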
Have you looked at how long it takes to read the 5 TB file without doing any processing? Maybe two days isn't all that bad.
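A quick way to measure that is a read-only pass that pulls the file through in large blocks and discards the data (the file name is again a placeholder):

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $file = 'big_input.tsv';
    open my $fh, '<:raw', $file or die "Can't open '$file': $!";

    my $start = time();
    my ( $buf, $bytes ) = ( '', 0 );
    while ( my $read = read( $fh, $buf, 8 * 1024 * 1024 ) ) {   # 8 MB blocks
        $bytes += $read;
    }
    my $elapsed = time() - $start;
    printf "Read %.1f GB in %.0f s (%.1f MB/s)\n",
        $bytes / 2**30, $elapsed, $bytes / $elapsed / 2**20;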
Re^2: Possible faster way to do this?
by Eily (Monsignor) on Jun 25, 2019 at 09:44 UTC