in reply to Re^3: Possible faster way to do this?
in thread Possible faster way to do this?

So, to understand this properly, the cut command cannot be avoided, right? The file is from a public database, so I can't really find out who made it...

Re^5: Possible faster way to do this?
by Corion (Patriarch) on Jun 25, 2019 at 12:11 UTC

    If you want to stay with a shell-based solution, you will have to keep cut; but in Perl you can easily avoid it by using either split (if your input data is well-formed enough) or Text::CSV_XS->getline to read the tab-separated input.
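    A minimal sketch of both alternatives; the file name (input.tsv) and the choice of the first column are assumptions, so adjust them to the real data:

        use strict;
        use warnings;
        use Text::CSV_XS;

        # Read tab-separated input without shelling out to cut.
        my $csv = Text::CSV_XS->new({ sep_char => "\t", binary => 1, auto_diag => 1 });

        open my $fh, '<', 'input.tsv' or die "input.tsv: $!";
        my %count;
        while (my $row = $csv->getline($fh)) {
            $count{ $row->[0] }++;            # tally each value of the first column
        }
        close $fh;

        # If the data is well-formed enough (no embedded tabs or quotes),
        # a plain split is even cheaper:
        #   while (<$fh>) {
        #       chomp;
        #       my ($first) = split /\t/, $_, 2;   # stop after the first field
        #       $count{$first}++;
        #   }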

    Personally, I wouldn't waste time (and RAM) on making the input data unique; instead, I'd calculate the best input type directly for each input value. That reduces the amount of data you need to keep in memory far more than making the input unique does.
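    For illustration only, a sketch of that streaming approach: keep one small summary per column instead of a hash of every distinct value. What counts as the "best type" here (integer vs. float vs. text, plus a maximum width) is an assumption about the task, so substitute the real check:

        use strict;
        use warnings;

        my (@type, @width);                         # one small summary per column
        my %rank = ( int => 0, float => 1, text => 2 );

        while (my $line = <STDIN>) {
            chomp $line;
            my @fields = split /\t/, $line;
            for my $i (0 .. $#fields) {
                my $v = $fields[$i];
                my $t = $v =~ /\A-?\d+\z/       ? 'int'
                      : $v =~ /\A-?\d*\.\d+\z/  ? 'float'
                      :                           'text';
                # widen the column's type/width only when this value demands it
                $type[$i]  = $t
                    if !defined $type[$i] or $rank{$t} > $rank{ $type[$i] };
                $width[$i] = length $v
                    if !defined $width[$i] or length $v > $width[$i];
            }
        }
        print "column $_: $type[$_] (max width $width[$_])\n" for 0 .. $#type;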

Re^5: Possible faster way to do this?
by bliako (Abbot) on Jun 25, 2019 at 13:28 UTC

    I think the benefits of using Perl will become apparent later, when you expand your pipeline. However, just for trying out ideas, there is also awk, which does what cut does (and more) and also has hashmaps (associative arrays), so:

    Edit: N=1 tells awk to use the first column of the input.

    awk -vN=1 '{ uniq[$N]++ } END { for (k in uniq) print k, " => ", uniq[k] }'
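
    For comparison (and since the rest of the pipeline is Perl anyway), roughly the same tally as a Perl one-liner; the file name is a placeholder, and -F'\t' pins the separator to tabs, whereas awk's default field splitting also breaks on spaces:

        perl -F'\t' -lane '$uniq{$F[0]}++; END { print "$_ => $uniq{$_}" for keys %uniq }' input.tsv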