Re: merge lines removing duplicates in a file

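In general, uniq() from List::MoreUtils drops duplicates while keeping the first occurrence of each item in its original order: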

c:\@Work\Perl\monks>perl -wMstrict -le
"use List::MoreUtils qw(uniq);
 ;;
 my @data = qw(
   NC_009565:0  NC_017524:0  NC_017522:0  NC_018143:0
   NC_017026:0  NC_017523:0  NC_016934:1  NC_018078:0
   NC_017026:0  NC_017523:0  NC_016934:1  NC_018078:0
   NC_999999:0  NC_999999:1
   NC_021193:0  NC_016768:0  NC_021251:0  NC_021192:0
   NC_012943:0  NC_002755:0  NC_020559:0  NC_020089:0
   NC_999999:1  NC_999999:0
   );
 ;;
 my @uniq = uniq @data;
 ;;
 printf qq{%d in \@data \n}, scalar @data;
 printf qq{%d in \@uniq \n}, scalar @uniq;
 print qq{'$_'} for @uniq;
"
24 in @data
18 in @uniq
'NC_009565:0'
'NC_017524:0'
'NC_017522:0'
'NC_018143:0'
'NC_017026:0'
'NC_017523:0'
'NC_016934:1'
'NC_018078:0'
'NC_999999:0'
'NC_999999:1'
'NC_021193:0'
'NC_016768:0'
'NC_021251:0'
'NC_021192:0'
'NC_012943:0'
'NC_002755:0'
'NC_020559:0'
'NC_020089:0'
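If List::MoreUtils isn't installed, the usual grep-with-a-%seen-hash idiom does the same job using only core Perl. A minimal sketch; the sub name uniq_in_order and the shortened sample list are mine, for illustration only:

```perl
use strict;
use warnings;

# Core-only stand-in for List::MoreUtils::uniq: keep the first occurrence
# of each value and preserve the original order.
# (uniq_in_order is an illustrative name, not a library function.)
sub uniq_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Shortened sample list, made up from ids in the example above.
my @data = qw(NC_999999:0 NC_999999:1 NC_017026:0 NC_999999:1 NC_999999:0);
my @uniq = uniq_in_order(@data);

printf "%d in \@data\n", scalar @data;
printf "%d in \@uniq\n", scalar @uniq;
print "'$_'\n" for @uniq;
```

The post-increment makes $seen{$_} false only the first time a value is met, so grep lets through exactly one copy of each item.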

I leave to you the task of extracting the protein (?) and NC_ info from all records and associating the latter (after uniq-ification) with the former. See List::MoreUtils::uniq().
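For what it's worth, here is one way the association step might look. This is only a sketch under a big assumption: I don't know the real record layout, so I've assumed each record is a whitespace-separated line whose first field is the protein name and whose remaining fields are NC_ ids. The records and the sub name merge_records are made up for illustration.

```perl
use strict;
use warnings;

# ASSUMED record layout (not taken from the original data):
#   first field = protein name, remaining fields = NC_ ids.
my @records = (
    'protA NC_009565:0 NC_017524:0 NC_999999:0',
    'protB NC_017026:0 NC_017523:0',
    'protA NC_999999:0 NC_999999:1 NC_009565:0',
);

# Merge records that share a protein name, keeping each NC_ id once
# per protein, in first-seen order. (merge_records is a hypothetical name.)
sub merge_records {
    my @lines = @_;
    my (%ids_for, %seen, @order);
    for my $line (@lines) {
        my ($protein, @ids) = split ' ', $line;
        next unless defined $protein;          # skip blank lines
        push @order, $protein unless exists $ids_for{$protein};
        $ids_for{$protein} ||= [];
        push @{ $ids_for{$protein} }, grep { !$seen{$protein}{$_}++ } @ids;
    }
    return map { join ' ', $_, @{ $ids_for{$_} } } @order;
}

print "$_\n" for merge_records(@records);
```

With the made-up records above, the two protA lines collapse into a single line carrying four distinct ids, and protB comes through unchanged. Adjust the split to whatever actually separates the fields in your file.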

Give a man a fish: <%-{-{-{-<
