in reply to Re^3: Speed up file write taking weeks
in thread Speed up file write taking weeks
Let us move from here to the original problem.
Two tables used as input to an SQL statement:
1. T_1_DAT ( t_1_id* , t_1_data_1 ) - 65 million rows - * = primary key
2. T_2_KEY ( t_2_id@ , t_2_data_2 ) - 72 million rows - @ = part of primary key and an index on this field alone
The SQL:

    INSERT INTO T_3_XYZ ( t_3_id , t_3_data_1 , t_3_data_2 )
    SELECT DISTINCT t_1_id , t_1_data_1 , t_2_data_2
    FROM T_1_DAT , T_2_KEY
    WHERE t_1_id = t_2_id
This works on databases that support multi-column DISTINCT. Without the DISTINCT, the join generates 1.7 trillion rows; with it, we are guessing around 100 million.
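The row counts above imply a heavy duplication of join keys; a quick back-of-envelope check (the figures are the ones quoted in the post, the variable names are mine) shows what DISTINCT has to absorb:

```python
# Back-of-envelope arithmetic implied by the figures in the post.
t1_rows = 65_000_000             # T_1_DAT row count
t2_rows = 72_000_000             # T_2_KEY row count
join_rows = 1_700_000_000_000    # join output without DISTINCT
distinct_rows = 100_000_000      # guessed output with DISTINCT

# Average number of T_2_KEY rows matched per T_1_DAT id:
avg_matches = join_rows / t1_rows       # roughly 26,000 matches per key
# Duplication factor that DISTINCT must remove:
dup_factor = join_rows / distinct_rows  # 17,000x
```

So each key matches tens of thousands of rows on average, and DISTINCT collapses the result by four orders of magnitude, which is why materialising the raw join first is so expensive.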
Now that we are doing it the flat-file way, we are "joining" the two files, generating the 1.7-trillion-row output in instalments. After that we will sort it and "coalesce" it, in stages I guess!
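If both extract files are first sorted on the key column, the flat-file "join" described above can be done in a single streaming pass (a sort-merge join), so neither file ever has to fit in memory. A minimal sketch, assuming each file is already key-sorted and laid out as tab-separated "key<TAB>data" lines (the file layout and function name are assumptions for illustration, not the poster's actual format):

```python
def merge_join(path_1, path_2, out_path):
    """Sort-merge join of two key-sorted, tab-separated files.

    Writes "key<TAB>data_1<TAB>data_2" for every matching pair, so a key
    duplicated in file 2 multiplies the output (the 1.7-trillion-row case).
    Keys are compared as strings, so both files must be sorted the same
    lexical way (e.g. plain `sort`, not `sort -n`, or zero-padded ids).
    """
    with open(path_1) as f1, open(path_2) as f2, open(out_path, "w") as out:
        line_1 = f1.readline()
        line_2 = f2.readline()
        while line_1 and line_2:
            k1, d1 = line_1.rstrip("\n").split("\t", 1)
            k2, d2 = line_2.rstrip("\n").split("\t", 1)
            if k1 < k2:
                line_1 = f1.readline()
            elif k1 > k2:
                line_2 = f2.readline()
            else:
                # Collect the whole run of file-2 rows sharing this key,
                # then pair it with every file-1 row carrying the same key.
                run = []
                while line_2:
                    k2, d2 = line_2.rstrip("\n").split("\t", 1)
                    if k2 != k1:
                        break
                    run.append(d2)
                    line_2 = f2.readline()
                while line_1:
                    ka, da = line_1.rstrip("\n").split("\t", 1)
                    if ka != k1:
                        break
                    for d2 in run:
                        out.write(f"{ka}\t{da}\t{d2}\n")
                    line_1 = f1.readline()
```

The pre-sorting itself can be left to an external tool such as GNU sort, which already does disk-based merge sorting; the same idea then handles the final sort-and-coalesce pass over the joined output.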
Replies are listed 'Best First'.
Re^5: Speed up file write taking weeks by dave_the_m (Monsignor) on Jul 02, 2019 at 14:12 UTC
Re^5: Speed up file write taking weeks by Corion (Patriarch) on Jul 02, 2019 at 13:40 UTC