in reply to How to match duplicate lines in a text file and extract only one of those lines to a new file

Here is my solution:
#!/usr/bin/perl
use strict;
use warnings;

my $input_file  = 'HGDP.txt';
my $output_file = 'output.txt';

open my $fh, '<', $input_file
    or die "Unable to open for read $input_file: $!";
open my $out_fh, '>', $output_file
    or die "Unable to open for write $output_file: $!";

local $, = q{ };     # output field separator
local $\ = "\n";     # output record separator

my @rows;
my $static_i = 3;    # number of first unjoinable columns

sub print_rows {
    print {$out_fh} @{$rows[0]}[0 .. $static_i - 1], map {
        my @columns;
        foreach my $x (0 .. $#rows) {
            push @columns, $rows[$x][$_];
        }
        join q{/}, @columns;
    } $static_i .. $#{$rows[0]};
}

while (defined(my $line_1 = <$fh>)) {
    my ($x) = $line_1 =~ /^(\d+)/ or next;
    push @rows, [split q{ }, $line_1];
    while (defined(my $line_2 = <$fh>)) {
        next unless $line_2 =~ /^\d/;
        if ($line_2 =~ /^$x\b/) {
            push @rows, [split q{ }, $line_2];
        }
        else {
            print_rows();
            @rows = [split q{ }, $line_2];
            last;
        }
    }
}
print_rows();

close $fh;
close $out_fh;
__END__

Output from your example:
1 51 Brahui A/A C/C A/A A/G T/T
3 51 Brahui A/A C/C A/G G/A C/T
5 51 Brahui A/A C/C G/G A/G T/C
7 51 Brahui A/A C/C G/G A/G T/T
9 51 Brahui A/A C/C G/G G/G T/T
Note that the code is not very efficient and not very well written, but you can try to improve it. Good luck!
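As one possible direction for that improvement, here is a minimal sketch that does the same merge in a single loop. It assumes, like the posted code, that rows sharing a first column are adjacent and that data rows start with a digit. The helper name `merge_rows` and the sample rows are made up for illustration; they are not the original HGDP.txt input. Because each group is flushed when the key changes, a key that appears on only one line is kept on its own, a case the posted nested loop appears to merge with the following key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge consecutive rows that share the same first column.
# $static is the number of leading columns copied as-is (3 in the post above).
# merge_rows is a hypothetical helper name, not part of the original code.
sub merge_rows {
    my ($lines, $static) = @_;
    my (@out, @group);

    # Emit the rows collected so far as one merged line, then reset the group.
    my $flush = sub {
        return unless @group;
        my @first  = @{ $group[0] };
        my @merged = @first[0 .. $static - 1];
        for my $col ($static .. $#first) {
            push @merged, join '/', map { $_->[$col] } @group;
        }
        push @out, join ' ', @merged;
        @group = ();
    };

    for my $line (@$lines) {
        next unless $line =~ /^\d/;    # skip headers and blank lines
        my @cols = split ' ', $line;
        # A new key closes the previous group.
        $flush->() if @group && $group[0][0] ne $cols[0];
        push @group, \@cols;
    }
    $flush->();    # do not forget the last group
    return @out;
}

# Illustrative rows only (assumed layout: id, population id, name, genotypes).
my @sample = (
    "1 51 Brahui A C A A T",
    "1 51 Brahui A C A G T",
    "3 51 Brahui A C A G C",
    "3 51 Brahui A C G A T",
);
print "$_\n" for merge_rows(\@sample, 3);
```

To use it on a file, read the lines from the input handle into an array and print the returned list to the output handle.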

Re^2: How to match duplicate lines in a text file and extract only one of those lines to a new file
by danica (Initiate) on Apr 05, 2012 at 09:27 UTC
    Thank you very much! Though you said it needs improving, I felt like I understood your code!

      Hi guys, I am new to Perl. I have a situation very similar to this one: my input rows are given below, and I have to find the duplicates in the first column.

      green apple
      green grapes
      blue blueberries
      orange pappaya
      orange orange

      Output:
      green apple/grapes
      blue blueberries
      orange pappaya/orange

      Can one of you please explain this code? Thanks.

        What have you tried so far? Please post it here, inside <code></code> tags.

        The way forward always starts with a minimal test.
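A minimal test for the two-column question might look like the sketch below. It is only one possible approach, not the code from the earlier reply: it assumes each line is a key followed by a value, collects the values per key in a hash, and uses a separate array to remember the order in which keys first appear. The name `group_by_first` is hypothetical. Unlike the earlier solution, duplicates do not need to be on adjacent lines.

```perl
use strict;
use warnings;

# Group values by the first word of each line, keeping first-seen key order.
# group_by_first is a hypothetical helper name used for illustration.
sub group_by_first {
    my @lines = @_;
    my (%values, @order);
    for my $line (@lines) {
        my ($key, @rest) = split ' ', $line;
        next unless defined $key;                      # skip blank lines
        push @order, $key unless exists $values{$key}; # remember first appearance
        push @{ $values{$key} }, "@rest";              # rest of the line as one value
    }
    return map { "$_ " . join('/', @{ $values{$_} }) } @order;
}

print "$_\n" for group_by_first(
    'green apple',
    'green grapes',
    'blue blueberries',
    'orange pappaya',
    'orange orange',
);
```

With the five sample rows from the question, this prints `green apple/grapes`, `blue blueberries` and `orange pappaya/orange`, one per line.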