in reply to Removing partially duplicated lines from a file
>perl -ane "next if $seen{$F[1].$F[2]}++;print" InputFile.txt
-----------------------------------------------------------------------------------
 Pos  HLA          Peptide    Core       Of  Gp  Gl  Ip  Il  Icore      Identity         Score    Aff(nM)  %Rank  BindLevel
 117  HLA-A*11:01  YVNVNMGLK  YVNVNMGLK   0   0   0   0   0  YVNVNMGLK  GQ924620_HBe_C_  0.62268     59.3   0.40  <= SB
  28  HLA-A*11:01  WGMDIDPYK  WGMDIDPYK   0   0   0   0   0  WGMDIDPYK  GQ924620_HBe_C_  0.44617    400.4   1.60  <= WB
 133  HLA-A*11:01  HISCLTFGR  HISCLTFGR   0   0   0   0   0  HISCLTFGR  GQ924620_HBe_C_  0.43660    444.0   1.70  <= WB
  47  HLA-A*02:05  YVNVNMGLK  FLPSDFFPS   0   0   0   0   0  FLPSDFFPS  X02763_HBe_A_po  0.77090     11.9   0.08  <= SB
  40  HLA-A*02:05  ATVELLSFL  ATVELLSFL   0   0   0   0   0  ATVELLSFL  X02763_HBe_A_po  0.75279     14.5   0.10  <= SB
   1  HLA-A*02:05  MQLFHLCLI  MQLFHLCLI   0   0   0   0   0  MQLFHLCLI  X02763_HBe_A_po  0.66669     36.8   0.30  <= SB
   9  HLA-A*02:05  IISCTCPTV  IISCTCPTV   0   0   0   0   0  IISCTCPTV  X02763_HBe_A_po  0.52206    176.1   1.40  <= WB
 147  HLA-A*02:05  YLVSFGVWI  YLVSFGVWI   0   0   0   0   0  YLVSFGVWI  X02763_HBe_A_po  0.51724    185.5   1.40  <= WB
  55  HLA-A*02:05  SVRDLLDTA  SVRDLLDTA   0   0   0   0   0  SVRDLLDTA  X02763_HBe_A_po  0.49966    224.4   1.70  <= WB
 114  HLA-A*02:05  VVNYVNTNV  VVNYVNTNV   0   0   0   0   0  VVNYVNTNV  X02763_HBe_A_po  0.48729    256.6   1.80  <= WB
  93  HLA-A*02:05  ELMTLATWV  ELMTLATWV   0   0   0   0   0  ELMTLATWV  X02763_HBe_A_po  0.46686    320.0   2.50
   8  HLA-A*02:05  LIISCTCPT  LIISCTCPT   0   0   0   0   0  LIISCTCPT  X02763_HBe_A_po  0.45053    381.9   2.50
 117  HLA-A*11:01  IISCTCPTV  YVNVNMGLK   0   0   0   0   0  YVNVNMGLK  AB219428_HBe_B_  0.62268     59.3   0.40  <= SB

UPDATE: If you want to maintain the headers, try this:

perl -ane "next unless not $seen{$F[1].$F[2]}++ or m/^\s*\D/;print" InputFile.txt
"Software interprets lawyers as damage, and routes around them" - Larry Wall