in reply to select strings with biggest length
<proteins.txt sed 'N;s/\n/ /' | perl -wpe 'print length, " ";' | sort -nr | sort -suk2,2 | cut -d\ -f2- | sed 's/ /\n/' >output.txt
This assumes that the protein names don't contain whitespace, that the order of the output doesn't matter, and that it doesn't matter which of two sequences of equal length gets chosen. It might also need GNU sed, since the final s/ /\n/ relies on \n in the replacement being turned into a newline. Any of these could be fixed easily.
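As a rough sketch of how those assumptions could be lifted, here is an awk variant rather than the pipeline above. It assumes the file alternates name lines and sequence lines (which is what the sed 'N' pairing implies), and the function name longest_per_name is made up for illustration. It allows whitespace in names, keeps names in first-seen order, and keeps the first of several equally long sequences.

```shell
# Hypothetical helper, not part of the original pipeline.
# Assumes input alternates: name line, sequence line, name line, ...
longest_per_name() {
    awk '
        NR % 2 == 1 { name = $0; next }              # odd lines hold the protein name
        {
            if (!(name in best)) order[++n] = name   # remember first-seen order
            if (!(name in best) || length($0) > length(best[name]))
                best[name] = $0                      # strictly longer wins, so ties keep the first
        }
        END { for (i = 1; i <= n; i++) print order[i] "\n" best[order[i]] }
    ' "$@"
}
```

It would be called as: longest_per_name proteins.txt >output.txt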