in reply to merge lines removing duplicates in a file
In general:
I leave to you the task of extracting the protein (?) and NC_ info from all records and associating the latter (after uniq-ification) with the former. See List::MoreUtils::uniq().

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use List::MoreUtils qw(uniq);
     ;;
     my @data = qw(
       NC_009565:0 NC_017524:0 NC_017522:0 NC_018143:0
       NC_017026:0 NC_017523:0 NC_016934:1 NC_018078:0
       NC_017026:0 NC_017523:0 NC_016934:1 NC_018078:0
       NC_999999:0 NC_999999:1
       NC_021193:0 NC_016768:0 NC_021251:0 NC_021192:0
       NC_012943:0 NC_002755:0 NC_020559:0 NC_020089:0
       NC_999999:1 NC_999999:0
       );
     ;;
     my @uniq = uniq @data;
     ;;
     printf qq{%d in \@data \n}, scalar @data;
     printf qq{%d in \@uniq \n}, scalar @uniq;
     print qq{'$_'} for @uniq;
    "
    24 in @data
    18 in @uniq
    'NC_009565:0'
    'NC_017524:0'
    'NC_017522:0'
    'NC_018143:0'
    'NC_017026:0'
    'NC_017523:0'
    'NC_016934:1'
    'NC_018078:0'
    'NC_999999:0'
    'NC_999999:1'
    'NC_021193:0'
    'NC_016768:0'
    'NC_021251:0'
    'NC_021192:0'
    'NC_012943:0'
    'NC_002755:0'
    'NC_020559:0'
    'NC_020089:0'
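If List::MoreUtils is not available, the same order-preserving de-duplication can be had from core Perl with a %seen hash. The sketch below also shows one possible way to associate the de-duplicated NC_ ids with a protein. It is only an illustration: the protein names (protA, protB), the sub name uniq_in_order, and the assumption that the records have already been parsed into [ protein, NC_id ] pairs are all made up here, since the real record format is not shown in this reply.

```perl
use strict;
use warnings;

# Order-preserving de-duplication in core Perl. For plain strings this
# keeps the first occurrence of each value, as List::MoreUtils::uniq does.
sub uniq_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# ASSUMPTION: records already parsed into [ protein_name, NC_id ] pairs.
# The protein names are hypothetical placeholders.
my @pairs = (
    [ 'protA', 'NC_009565:0' ],
    [ 'protA', 'NC_017524:0' ],
    [ 'protB', 'NC_016934:1' ],
    [ 'protA', 'NC_009565:0' ],   # duplicate for protA
    [ 'protB', 'NC_016934:1' ],   # duplicate for protB
);

# Collect every NC_ id under its protein, then de-duplicate each list.
my %ids_for;   # protein name => array ref of NC_ ids
push @{ $ids_for{ $_->[0] } }, $_->[1] for @pairs;
@$_ = uniq_in_order(@$_) for values %ids_for;

print "$_: @{ $ids_for{$_} }\n" for sort keys %ids_for;
```

The hash of arrays keeps one merged list per protein; how you get from your actual lines to the pairs depends on your file layout.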
Give a man a fish: <%-{-{-{-<