in reply to How to match duplicate lines in a text file and extract only one of those lines to a new file

This does the job:

my %data;

while (<DATA>) {
    chomp;
    my ($firstnum, $secondnum, $thingy, @bits) = split /\s/;

    # Join the three identifying columns into one hash key; "\x00" is a
    # separator that will not occur in the data.
    my $key = sprintf("%s\x00%s\x00%s", $firstnum, $secondnum, $thingy);

    # Collect the remaining columns position by position.
    for my $i (0 .. $#bits) {
        $data{$key}[$i] = [] unless exists $data{$key}[$i];
        push @{ $data{$key}[$i] }, $bits[$i];
    }
}

foreach my $key (sort keys %data) {
    print join q[ ], split "\x00", $key;
    print q[ ];
    print join q[ ], map { join '/', @$_ } @{ $data{$key} };
    print "\n";
}

__DATA__
1 51 Brahui A C A A T
1 51 Brahui A C A G T
3 51 Brahui A C A G C
3 51 Brahui A C G A T
5 51 Brahui A C G A T
5 51 Brahui A C G G C
7 51 Brahui A C G A T
7 51 Brahui A C G G T
9 51 Brahui A C G G T
9 51 Brahui A C G G T

But don't just copy that as-is. Try to understand how it works. What you want to look at is:

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Re^2: How to match duplicate lines in a text file and extract only one of those lines to a new file
by danica (Initiate) on Apr 04, 2012 at 13:26 UTC
    Hiya, thank you so much for your help. I ran your code to see how it works. Looking at the output, I noticed that the first column doesn't seem to get transformed, and some duplicates seem to have been missed.

    Like so:

    1 Brahui A C/C A/A A/G T/T

    100 Hazara A C G A T C C

    100 Hazara G C A A T C T

    102 Hazara A C/C G/G A/G

      In your original sample data, every line began with two integers and then a text string. Now you seem to be running it on lines that begin with a single integer and a text string, so his code is picking up the first allele as part of the duplicated section.
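One way to handle both layouts is to make the number of identifying columns a parameter. The following is only a sketch, not the code from the post above: `merge_lines` and its `$nkeys` argument are hypothetical names introduced here, and it assumes fields are separated by whitespace. Unlike the original, it keeps groups in the order they first appear instead of sorting the keys.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): merge lines that share
# their first $nkeys whitespace-separated fields, joining the remaining
# columns position by position with '/'.
sub merge_lines {
    my ($nkeys, @lines) = @_;
    my (%data, @order);

    for my $line (@lines) {
        my @fields = split ' ', $line;      # split ' ' also skips leading whitespace
        next unless @fields > $nkeys;       # ignore blank or too-short lines

        # Remove the identifying columns from @fields and build the key.
        my $key = join "\x00", splice @fields, 0, $nkeys;
        push @order, $key unless exists $data{$key};

        # Autovivification creates the inner arrays on first use.
        push @{ $data{$key}[$_] }, $fields[$_] for 0 .. $#fields;
    }

    return map {
        my $key = $_;
        join ' ', split(/\x00/, $key), map { join '/', @$_ } @{ $data{$key} };
    } @order;
}

# Data with one integer and one name per line, so two key columns.
print "$_\n" for merge_lines(2,
    "102 Hazara A C G A",
    "102 Hazara A C G G",
);
```

For the original sample data, which has two integers and a name in front, the same call with `3` as the first argument should group the lines the way the posted code does.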

      Aaron B.
      My Woefully Neglected Blog, where I occasionally mention Perl.

        Oh yes of course! Thank you for pointing out such an obvious mistake!