Re: delete redundant data

Another way to do it would be to read the data in paragraph mode, so that each readline returns a whole record (a block of lines separated by blank lines) rather than a single line. You can then extract all the numeric fields from the record with a global regex and join them with a separator character. The separator avoids false positives: without it, 9 and 87 would give the same string as 98 and 7, namely "987". Finally, use the joined string as a hash key to eliminate duplicates.

use strict;
use warnings;

my %seen = ();
{
    local $/ = q{};     # Paragraph mode
    while ( <DATA> )
    {
        my $key = join q{:}, m{(\d+)}g;
        print unless $seen{ $key } ++;
    }
}

__DATA__
A  83    GLU       A  90    GLU^?     
A 163    ARG       A  83    ARG^?     
A 222    ARG       A   5    ARG^?

A 229    ALA       A 115    ALA~?     
A 257    ALA       A 118    ALA~?     
A 328    ASP       A  95    ASP~?

A  83    GLU       A  90    GLU^?     
A 163    ARG       A  83    ARG^?     
A 222    ARG       A   5    ARG^?

A  83    GLU       B  90    GLU^?     
A 163    ARG       B  83    ARG^?     
A 222    ARG       B  5     ARG^?

The output.

A  83    GLU       A  90    GLU^?     
A 163    ARG       A  83    ARG^?     
A 222    ARG       A   5    ARG^?

A 229    ALA       A 115    ALA~?     
A 257    ALA       A 118    ALA~?     
A 328    ASP       A  95    ASP~?
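To make the separator point concrete, here is a small sketch using two made-up records whose numeric fields are (9, 87) and (98, 7). It extracts the digits with the same m{(\d+)}g regex as the loop above, then builds one key with no separator and one joined with q{:}. The sample strings and the @bare/@joined names are illustrative only.

```perl
use strict;
use warnings;

# Two made-up records whose numeric fields differ: (9, 87) versus (98, 7).
my @records = (
    'A  9    GLU       A 87    GLU',
    'A 98    GLU       A  7    GLU',
);

# Build each key the same way as the loop above: all digit runs, in order.
my ( @bare, @joined );
for my $record (@records) {
    my @numbers = $record =~ m{(\d+)}g;
    push @bare,   join q{},  @numbers;    # no separator
    push @joined, join q{:}, @numbers;    # colon separator
}

print "no separator:   @bare\n";      # both keys come out as 987
print "with separator: @joined\n";    # keys are 9:87 and 98:7
```

Without the separator both records produce the key 987, so the second one would be wrongly dropped as a duplicate; with q{:} the keys become 9:87 and 98:7 and both records survive.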

I hope this is helpful.

Cheers,

JohnGG


Re^2: delete redundant data by nurulnad (Acolyte) on Aug 22, 2010 at 04:49 UTC
Thank you. I already read my data as paragraphs by setting `$/ = " ";`, which is sort of dumb, but it works. Thanks for pointing out the regex and for presenting a different way from roboticus's to use a hash.
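A note on the setting quoted above: paragraph mode is $/ set to an empty string (q{} in the code above, or ""), whereas $/ = " " (a single space) makes readline stop after every space. If the " " is literal rather than a typo for "", the following sketch shows the difference. It reads a small made-up sample from an in-memory filehandle; the $text sample and the count_records helper are hypothetical.

```perl
use strict;
use warnings;

# Two small blank-line-separated records (made-up sample text).
my $text = "A 83 GLU\nA 90 GLU\n\nA 229 ALA\nA 115 ALA\n";

# Count how many records readline returns for a given value of $/.
sub count_records {
    my ($separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records = <$fh>;      # list context reads every record
    close $fh;
    return scalar @records;
}

my $paragraph = count_records(q{});     # paragraph mode: one record per block
my $space     = count_records(q{ });    # one record per space-terminated chunk

print 'paragraph mode ($/ = ""):  ', $paragraph, " records\n";
print 'single space   ($/ = " "): ', $space,     " records\n";
```

With $/ = "" the two blocks come back as two records; with $/ = " " each record is a fragment such as "83 ", so the blocks are no longer kept together.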