in reply to delete redundant data

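use strict;
use warnings;

open(my $INPUT,  '<', 'bodo.txt')   or die "Can't open bodo.txt: $!";
open(my $OUTPUT, '>', 'output.txt') or die "Can't open output.txt: $!";

my %seen;
while (my $line = <$INPUT>) {
    # Lines without two numbers (e.g. blank lines) are passed through untouched.
    unless ($line =~ /\A\D+(\d+)\D+(\d+)/) {
        print $OUTPUT $line;
        next;
    }
    # Tab-separated key, so the pairs (83, 90) and (8, 390) stay distinct.
    print $OUTPUT $line unless $seen{"$1\t$2"}++;
}
close($OUTPUT) or die "Can't close output.txt: $!";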
Not really clever, but pretty clean and hopefully easy to follow. I used "$1\t$2" as the key for %seen because I didn't think you wanted Blah 83 Blah 90 to prevent Blah 8 Blah 390 from printing: without a separator, both pairs would collapse to the same key, 8390. I also wasn't sure whether you wanted to keep the blank lines, so I kept them. The regex could probably be optimized, but that would need better information about the dataset.
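To make the separator point concrete, here is a small sketch. The helper names key_joined and key_tabbed are made up for this illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helpers (illustration only): build a %seen key
# without and with a separator between the two numbers.
sub key_joined { my ($first, $second) = @_; return "$first$second"; }
sub key_tabbed { my ($first, $second) = @_; return "$first\t$second"; }

# Number pairs from "Blah 83 Blah 90" and "Blah 8 Blah 390".
my @pair_a = (83, 90);
my @pair_b = (8, 390);

# Plain concatenation gives the same string for both pairs,
# so the second line would wrongly be treated as a duplicate.
print "joined keys collide\n" if key_joined(@pair_a) eq key_joined(@pair_b);

# With a tab between the numbers the keys stay distinct.
print "tabbed keys differ\n" if key_tabbed(@pair_a) ne key_tabbed(@pair_b);
```

Any character that can't appear inside a number would work as the separator; a tab is just a convenient choice.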
Update: I also wanted to point out that you are assuming each line of data is uniquely identified by the numerical values in it. If that isn't the case, you could store the lines seen for each unique set of values and then check each new line against them to see whether that variation has already been added. If that is how you need to go, let us know, because we'll have to tweak our solutions for you.
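A rough sketch of that variation check, assuming the same two-number regex as above. The function name dedupe_with_variations and the %lines_for hash are hypothetical, and the sample lines are made up, so treat this as one possible shape rather than a finished solution.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): given a list of lines, return the
# ones to keep. A line is dropped only when the exact same text was already
# stored under its number pair; a new pair or a new variation is kept.
sub dedupe_with_variations {
    my @lines = @_;
    my %lines_for;    # "n1\tn2" => { line text => times seen }
    my @kept;
    for my $line (@lines) {
        if ($line =~ /\A\D+(\d+)\D+(\d+)/) {
            my $key = "$1\t$2";
            next if $lines_for{$key}{$line}++;    # this variation is already stored
        }
        push @kept, $line;    # new pair, new variation, or a line without two numbers
    }
    return @kept;
}

my @kept = dedupe_with_variations(
    "Blah 83 Blah 90\n",
    "Blah 83 Blah 90\n",      # exact duplicate: dropped
    "Other 83 Other 90\n",    # same numbers, different text: kept
    "\n",                     # blank line: kept
);
print @kept;
```

Grouping by number pair first also makes it easy to report which pairs have more than one variation, for example by counting the keys of each inner hash.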