in reply to associations and sorting

I'd split off the large strings of tsv2 very early in the process and substitute their line numbers for them. The large strings seem to be just dead weight during processing, but your unstated final goal might be to compute links from tsv1.col1 to tsv2.col2. Mapping a line number back to its string is then an additional lookup, and that is the price you have to pay for reduced memory requirements.
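
A rough sketch of that idea in Python, in case it helps. The function names, the file paths and the `big_col` index are all made up for illustration, since I don't know your actual layout. It also assumes plain tab-separated lines with no quoted tabs or embedded newlines. It writes a slim copy of tsv2 with the large column replaced by the line number, and remembers the byte offset of each line so the string can be fetched again later.

```python
def split_off_column(src_path, slim_path, big_col):
    """Write a slim copy of the TSV at src_path in which column big_col
    (0-based, hypothetical) is replaced by the 1-based line number.
    Returns a list holding the byte offset of every line in src_path."""
    offsets = []
    pos = 0
    with open(src_path, "rb") as src, open(slim_path, "wb") as slim:
        for lineno, raw in enumerate(src, start=1):
            offsets.append(pos)          # where this line starts in the original
            pos += len(raw)
            fields = raw.rstrip(b"\r\n").split(b"\t")
            fields[big_col] = str(lineno).encode("ascii")
            slim.write(b"\t".join(fields) + b"\n")
    return offsets


def fetch_string(src_path, offsets, lineno, big_col):
    """The extra lookup: seek to line lineno of the original file and
    return the large string that was split off."""
    with open(src_path, "rb") as src:
        src.seek(offsets[lineno - 1])
        fields = src.readline().rstrip(b"\r\n").split(b"\t")
    return fields[big_col].decode("utf-8")
```

You would then sort and join on the slim file only, and call `fetch_string` for just the rows that survive to the final output. The offsets list costs one integer per line, which should be far less than the strings themselves; if even that is too much, it could be written to a file instead.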

Stating the file sizes would make your problem more comprehensible.