Re^5: Possible faster way to do this?

If you want to stay with a shell-based solution, you are stuck with cut. In Perl, though, you can easily avoid cut by using either split (if your input data is well-formed enough) or Text::CSV_XS->getline to read the tab-separated input.
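As a rough sketch of the two approaches, here is how one column could be pulled out of tab-separated data without cut. The sample data and the sub names column_with_split and column_with_csv are made up for illustration, and I am assuming Text::CSV_XS is installed from CPAN.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample input: two tab-separated columns.
my $data = "id\tname\n1\tfoo\n2\tbar\n";

# Variant 1: plain split. Only safe if fields never contain
# embedded tabs, quotes or newlines.
sub column_with_split {
    my ($text, $col) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @values;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        push @values, $fields[$col];
    }
    return @values;
}

# Variant 2: Text::CSV_XS with a tab as separator. Copes with
# quoted fields that split would mangle.
sub column_with_csv {
    my ($text, $col) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $csv = Text::CSV_XS->new({ sep_char => "\t", binary => 1, auto_diag => 1 });
    my @values;
    while (my $row = $csv->getline($fh)) {
        push @values, $row->[$col];
    }
    return @values;
}

print join(",", column_with_split($data, 1)), "\n";
print join(",", column_with_csv($data, 1)), "\n";
```

For real input you would open the file itself instead of an in-memory string; the loops stay the same.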

Personally, I wouldn't waste time (and RAM) on making the input data unique. Instead, I would calculate the best input type directly for each input value and remember only the resulting type rather than the values themselves. That reduces the amount of data you need to keep far more than making the input data unique does.
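A minimal sketch of that idea, assuming the goal is to pick one type per column. The type ladder (int, float, text), the regexes and the sub names type_of and column_types are my own illustrative choices, not something from your code, so adjust them to whatever types you actually need.

```perl
use strict;
use warnings;

# Hypothetical type ladder, narrowest to widest.
my %rank = (int => 0, float => 1, text => 2);

# Classify a single value. Anything that is not clearly numeric
# (including the empty string) counts as text here.
sub type_of {
    my ($value) = @_;
    return 'int'   if $value =~ /\A[+-]?\d+\z/;
    return 'float' if $value =~ /\A[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?\z/;
    return 'text';
}

# Read tab-separated lines and keep only the widest type seen
# so far for each column. No input values are stored.
sub column_types {
    my ($fh) = @_;
    my @best;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;
        for my $i (0 .. $#fields) {
            my $type = type_of($fields[$i]);
            $best[$i] = $type
                if !defined $best[$i] || $rank{$type} > $rank{ $best[$i] };
        }
    }
    return @best;
}

my $data = "1\t2.5\tfoo\n42\t3\tbar\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print join(",", column_types($fh)), "\n";   # prints int,float,text
```

The memory needed is one short string per column, no matter how many rows or distinct values the input has.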
