Re: delete redundant data

I don't think I understand exactly what you wish to delete, but if I understood you correctly, this is what you want to achieve:

convert this:

A  83    GLU       A  90    GLU^?     
A 163    ARG       A  83    ARG^?     
A 222    ARG       A   5    ARG^?

A 229    ALA       A 115    ALA~?     
A 257    ALA       A 118    ALA~?     
A 328    ASP       A  95    ASP~?

A  83    GLU       A  90    GLU^?     
A 163    ARG       A  83    ARG^?     
A 222    ARG       A   5    ARG^?

A  83    GLU       B  90    GLU^?     
A 163    ARG       B  83    ARG^?     
A 222    ARG       B  5     ARG^?

into this:


A  83    GLU       B  90    GLU
A 163    ARG       B  83    ARG
A 222    ARG       B  5     ARG

A 229    ALA       A 115    ALA
A 257    ALA       A 118    ALA
A 328    ASP       A  95    ASP

Right?

Code:

#!/usr/bin/perl

use strict;

my (%hash, %hash_key);
my $x = 0;
while (<DATA>){
my @array = split(' ', $_);
$x++;
$hash{"$array[1]-$array[4]"} = $_;
$hash_key{$x} = "$array[1]-$array[4]";
}

foreach my $i (sort {$a <=> $b} keys %hash_key){
  (exists $hash{$hash_key{$i}}) ? (print "$hash{$hash_key{$i}}") : (print "deleted\n");
  delete($hash{$hash_key{$i}}) if (exists $hash{$hash_key{$i}});
}

__DATA__
A  83    GLU       A  90    GLU
A 163    ARG       A  83    ARG
A 222    ARG       A   5    ARG

A 229    ALA       A 115    ALA
A 257    ALA       A 118    ALA
A 328    ASP       A  95    ASP

A  83    GLU       A  90    GLU
A 163    ARG       A  83    ARG
A 222    ARG       A   5    ARG

A  83    GLU       B  90    GLU
A 163    ARG       B  83    ARG
A 222    ARG       B  5     ARG
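
To see what the key in the first script actually looks like, here is a small sketch. record_key is a hypothetical helper, not part of the script above, and unlike the script it returns nothing for blank lines (the script builds the key "-" for every blank line). Note that the A and B lines get the same key, because the letter in the 4th column is not part of the key; that is why the later B line replaces the earlier A line in %hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper that builds the same key the first script uses:
# the 2nd and 5th whitespace-separated fields, joined with '-'.
sub record_key {
    my ($line) = @_;
    my @fields = split ' ', $line;   # ' ' splits on runs of whitespace and drops leading blanks
    return unless @fields >= 5;      # blank lines have no fields, so no key
    return "$fields[1]-$fields[4]";
}

my @examples = (
    "A  83    GLU       A  90    GLU\n",
    "A  83    GLU       B  90    GLU\n",
    "\n",
);

for my $line (@examples) {
    chomp(my $shown = $line);
    my $key = record_key($line);
    print "'$shown' -> ", (defined $key ? $key : 'no key (blank line)'), "\n";
}
```

Both data lines should map to 83-90, and the blank line to "no key".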

baxy

UPDATE:

Sorry, I had to leave as soon as I posted the reply (reason: girlfriend).

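Here is a more elegant solution. The first one has some bugs and limitations because I was in a hurry ;) For example, it keeps only the first blank line and prints "deleted" for every blank line after it.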

Code:


#!/usr/bin/perl

use strict;

my (%hash, %hash_key); # hashes
my $x = 0;             # line counter
while (<DATA>){                           # read the data line by line
  my @array = split(' ', $_);             # split the line on whitespace
  $x++;                                   # number the line
  $hash{$array[1]}->{$array[4]} = $_;     # primary database (a later duplicate overwrites an earlier one)
  $hash_key{$x} = [$array[1],$array[4]];  # key database (remembers the line order)
}


foreach my $i (sort {$a <=> $b} keys %hash_key){
  my ($k1, $k2) = @{ $hash_key{$i} };

  # if the record is still in the database (hash), print it out, otherwise print 'deleted'
  (exists $hash{$k1}->{$k2}) ? (print "$hash{$k1}->{$k2}") : (print "deleted\n");

  # you need the empty lines, so if you reached an empty line, skip the deleting part
  next if ($k1 eq '');

  # once an entry has been printed, delete it, you don't need duplicates;
  # also delete the reversed pair, so both the "80 90" and the "90 80" situation are covered
  delete $hash{$k1}->{$k2};
  delete $hash{$k2}->{$k1};
}

__DATA__
A  83    GLU       A  90    GLU
A 163    ARG       A  83    ARG
A 222    ARG       A   5    ARG

A 229    ALA       A 115    ALA
A 257    ALA       A 118    ALA
A 328    ASP       A  95    ASP

A  83    GLU       A  90    GLU
A 163    ARG       A  83    ARG
A 222    ARG       A   5    ARG

A  83    GLU       B  90    GLU
A 163    ARG       B  83    ARG
A 222    ARG       B  5     ARG
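
About the "80 90" versus "90 80" situation: one common way to treat those as the same record (a different technique from the nested hash used in the script) is to sort the two numbers before building the key. This is a minimal sketch, assuming the numbers are always in the 2nd and 5th columns; pair_key is a hypothetical helper, and unlike the scripts in this thread it keeps the first occurrence of each pair, using a %seen counter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a key that ignores the order of the two numbers,
# so "83 ... 90" and "90 ... 83" produce the same key.
sub pair_key {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields >= 5;                          # blank line: no key
    my ($lo, $hi) = sort { $a <=> $b } @fields[1, 4];    # numeric sort, smaller number first
    return "$lo-$hi";
}

my %seen;
for my $line ("A  83    GLU       A  90    GLU\n",
              "A  90    GLU       A  83    GLU\n",
              "A 229    ALA       A 115    ALA\n") {
    my $key = pair_key($line);
    # keep the first occurrence of each pair, flag the later ones
    my $out = $seen{$key}++ ? "duplicate: $line" : $line;
    print $out;
}
```

The second line should be reported as a duplicate of the first.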

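So what happens? When you think about removing duplicates, think about hashes. The first hash, %hash, is the actual database that holds all the data, keyed by the two numbers in the 2nd and 5th columns; the second one, %hash_key, preserves the original line order. Because a later line with the same pair of numbers overwrites the earlier one in %hash, each record is printed at the position where its pair first appeared, but with the content of its last occurrence; that is why the lines containing B come out first. Once your data is hashed, all you have to do is print it in the order in which you saved it, using %hash_key. The deletion that follows makes sure you don't print duplicates, except for blank lines, which are kept. You can remove the 'deleted' note if you don't need it.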
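
If you don't need the 'deleted' notes or the blank lines at all, the same hash idea can be written a bit more compactly. This is only a sketch under a few assumptions: the key is still the 2nd and 5th columns, the order is kept in an array instead of a hash with numbered keys (so no numeric sort is needed), blank lines are simply dropped, and dedup_last_wins is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same idea as the script above: a hash holds the records, and a second
# structure (here an array) remembers the order in which each key first appeared.
# Blank lines are dropped in this sketch; the script above keeps them.
sub dedup_last_wins {
    my (%record, @order);
    for my $line (@_) {
        my @fields = split ' ', $line;
        next unless @fields >= 5;                         # skip blank lines
        my $key = "$fields[1]-$fields[4]";
        push @order, $key unless exists $record{$key};    # first position of this key
        $record{$key} = $line;                            # later duplicates overwrite earlier ones
    }
    return map { $record{$_} } @order;
}

my @input = (
    "A  83    GLU       A  90    GLU\n",
    "A 229    ALA       A 115    ALA\n",
    "\n",
    "A  83    GLU       B  90    GLU\n",
);
my @unique = dedup_last_wins(@input);
print @unique;
```

With this input it should print the B version of the 83/90 line followed by the 229/115 line: the last version of each record, at the position where it first appeared.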


Also, if you have any questions about the code, just shoot; for example, if you are not familiar with the ternary (conditional) operator:

($a ==1) ? (print "yes") : (print "no");

since you stated that you are new to Perl and all...
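
A short sketch of how the conditional (ternary) operator behaves, next to the equivalent if/else. The sub names are made up for illustration, and the sketch uses $n and $x rather than $a, since $a and $b are the special variables that sort blocks such as sort {$a <=> $b} rely on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# COND ? VALUE_IF_TRUE : VALUE_IF_FALSE is an expression: Perl evaluates
# COND, then evaluates and returns only one of the two branches.
sub yes_or_no {
    my ($n) = @_;
    return $n == 1 ? "yes" : "no";
}

# The same decision written with if/else.
sub yes_or_no_long {
    my ($n) = @_;
    if ($n == 1) {
        return "yes";
    }
    else {
        return "no";
    }
}

my $answer = yes_or_no(1);
print "$answer\n";            # prints "yes"
my $other = yes_or_no_long(2);
print "$other\n";             # prints "no"

# The form used in the scripts above: the ternary picks which print runs.
my $x = 1;
($x == 1) ? (print "yes\n") : (print "no\n");   # prints "yes"
```

Both subs return the same value for any number; the ternary just says it in one line.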


Re^2: delete redundant data by Anonymous Monk on Aug 21, 2010 at 12:01 UTC
Yes. To be honest, your code seems a little intimidating to me. If you have free time, would you care to explain it a little? Of course I'll google all this too. Anyway, thanks a lot for your time.