Generating trillions of records from initial data containing millions of records, then deduplicating that back down to millions of records, sounds like a really bad idea. If you gave us a better idea of what data you start with (how many rows and columns, how many unique keys, the average size of a record, etc.) and what final result you want, we might be able to come up with some suggestions.
Dave.