Re^4: How can I keep the first occurrence from duplicated strings?

For a file large enough to be a problem, Perl should be reading in one line at a time and loading it into a database

IME you are seriously underestimating the time it would take to perform those insertions. I would not take this approach; I would instead use the all-in-Perl approach proposed by other respondents. It will be more robust, quicker to develop and faster to run to completion.

There are of course other tasks where the time penalty of loading into a database is outweighed by other advantages, but a single pass through the data that discards the majority of rows, like this one, isn't one of them.
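
For illustration, here is a minimal sketch of what a single-pass, all-in-Perl solution might look like. It assumes the whole line (minus its trailing newline) is the key that decides what counts as a duplicate; the real data may need a different key, such as one field. The sub name `dedupe_stream` is hypothetical and not taken from any other reply.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, keeping only the
# first occurrence of each line. Assumption: the whole line is the key.
sub dedupe_stream {
    my ($in, $out) = @_;
    my %seen;    # one entry per distinct key seen so far
    while (my $line = <$in>) {
        chomp(my $key = $line);    # compare without the trailing newline
        # $seen{$key}++ is false only the first time a key appears
        print {$out} $line unless $seen{$key}++;
    }
    return;
}

# Usage as a filter: perl dedupe.pl < input.txt > output.txt
dedupe_stream(\*STDIN, \*STDOUT) unless caller;
```

The hash holds one entry per distinct key, so memory grows with the number of unique lines rather than with the size of the file. Since most rows are discarded here, that should stay manageable.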

🦛
