in reply to delete redundant data

i don't think i understand exactly what you wish to delete, but if i understood you correctly, this is what you wish to achieve:

convert this:

A 83 GLU A 90 GLU^?
A 163 ARG A 83 ARG^?
A 222 ARG A 5 ARG^?
A 229 ALA A 115 ALA~?
A 257 ALA A 118 ALA~?
A 328 ASP A 95 ASP~?
A 83 GLU A 90 GLU^?
A 163 ARG A 83 ARG^?
A 222 ARG A 5 ARG^?
A 83 GLU B 90 GLU^?
A 163 ARG B 83 ARG^?
A 222 ARG B 5 ARG^?

into this:

A 83 GLU B 90 GLU
A 163 ARG B 83 ARG
A 222 ARG B 5 ARG
A 229 ALA A 115 ALA
A 257 ALA A 118 ALA
A 328 ASP A 95 ASP

right?

code :

#!/usr/bin/perl
use strict;

my (%hash, %hash_key);
my $x = 0;

while (<DATA>) {
    my @array = split(' ', $_);
    $x++;
    $hash{"$array[1]-$array[4]"} = $_;
    $hash_key{$x} = "$array[1]-$array[4]";
}

foreach my $i (sort {$a <=> $b} keys %hash_key) {
    (exists $hash{$hash_key{$i}}) ? (print "$hash{$hash_key{$i}}") : (print "deleted\n");
    delete($hash{$hash_key{$i}}) if (exists $hash{$hash_key{$i}});
}

__DATA__
A 83 GLU A 90 GLU
A 163 ARG A 83 ARG
A 222 ARG A 5 ARG
A 229 ALA A 115 ALA
A 257 ALA A 118 ALA
A 328 ASP A 95 ASP
A 83 GLU A 90 GLU
A 163 ARG A 83 ARG
A 222 ARG A 5 ARG
A 83 GLU B 90 GLU
A 163 ARG B 83 ARG
A 222 ARG B 5 ARG

baxy

UPDATE:

sorry i had to go as soon as i posted the reply (reason: girlfriend)

here is a more elegant solution. the first has some bugs and limitations due to me being in a hurry ;)

code :

#!/usr/bin/perl
use strict;

my (%hash, %hash_key);    # hashes
my $x = 0;                # counter

while (<DATA>) {                              # read the data line by line
    my @array = split(' ', $_);               # split the line on whitespace
    $x++;                                     # global line counter
    $hash{$array[1]}->{$array[4]} = $_;       # primary database
    $hash_key{$x} = [$array[1], $array[4]];   # key database (keeps the order)
}

foreach my $i (sort {$a <=> $b} keys %hash_key) {
    # if the record still exists in the database (hash) print it out,
    # otherwise print 'deleted'
    (exists $hash{$hash_key{$i}->[0]}->{$hash_key{$i}->[1]})
        ? (print "$hash{$hash_key{$i}->[0]}->{$hash_key{$i}->[1]}")
        : (print "deleted\n");

    # you need the empty lines, so if you reached an empty line,
    # skip the deleting part
    next if ($hash_key{$i}->[0] eq '');

    # if you printed the entry from the database, delete it; you don't
    # need duplicates. the second test is meant to cover a record that
    # shows up as 80 90 in one place and 90 80 in another
    delete($hash{$hash_key{$i}->[0]}->{$hash_key{$i}->[1]})
        if (exists $hash{$hash_key{$i}->[0]}->{$hash_key{$i}->[1]}
            || $hash{$hash_key{$i}->[1]}->{$hash_key{$i}->[0]});
}

__DATA__
A 83 GLU A 90 GLU
A 163 ARG A 83 ARG
A 222 ARG A 5 ARG
A 229 ALA A 115 ALA
A 257 ALA A 118 ALA
A 328 ASP A 95 ASP
A 83 GLU A 90 GLU
A 163 ARG A 83 ARG
A 222 ARG A 5 ARG
A 83 GLU B 90 GLU
A 163 ARG B 83 ARG
A 222 ARG B 5 ARG

so what happens... when you think about removing duplicates, think about hashes. the first hash (%hash) is the actual database that holds all the data, and the second one (%hash_key) is the database that preserves the order. once you hash your data, all you have to do is print it in the order in which you saved it, using %hash_key. the deletion that follows is there so you don't print duplicates, except if the line is blank. you can remove the 'deleted' note if you don't need it.
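
to make the idea easier to see, here is a minimal sketch of the same "one hash for the data, something else for the order" approach. it is not the code above: it swaps the numbered %hash_key for a plain array, and it simply skips blank lines instead of keeping them. the name dedup_pairs and the variables %record and @order are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): keep one record per
# pair of residue numbers, in order of first appearance. When a pair
# shows up again, the later record replaces the earlier one.
sub dedup_pairs {
    my @lines = @_;
    my (%record, @order);
    for my $line (@lines) {
        my @field = split ' ', $line;
        next unless @field >= 5;            # assumption: skip blank or short lines
        my $key = "$field[1]-$field[4]";    # e.g. "83-90"
        push @order, $key unless exists $record{$key};    # remember first position
        $record{$key} = $line;              # later duplicates overwrite
    }
    return map { $record{$_} } @order;
}

my @input = (
    "A 83 GLU A 90 GLU",
    "A 163 ARG A 83 ARG",
    "A 229 ALA A 115 ALA",
    "A 83 GLU B 90 GLU",
    "A 163 ARG B 83 ARG",
);
print "$_\n" for dedup_pairs(@input);
```

with this input the two B records take the place of the two A records that share their numbers, and the ALA line comes last. if you need the blank lines kept, or the 'deleted' markers, the longer version above is the one to adapt.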

baxy

ps

also, if you have any questions about the code, just shoot, for example if you are not familiar with the ternary form:

($a ==1) ? (print "yes") : (print "no");
since you stated that you are new to Perl and all...
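
in case it helps, here is a small sketch of how that ternary line relates to an ordinary if/else. the variable $count and the sub yes_no are made up for illustration ($a is left alone here because sort uses it for its own comparisons).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 1;    # hypothetical value, only for illustration

# Ternary form, as in the line above: CONDITION ? IF_TRUE : IF_FALSE
($count == 1) ? (print "yes\n") : (print "no\n");

# The same decision written as a plain if/else
if ($count == 1) {
    print "yes\n";
}
else {
    print "no\n";
}

# The ternary can also just pick a value, which is then used elsewhere
sub yes_no {
    my ($n) = @_;
    return ($n == 1) ? "yes" : "no";
}
print yes_no($count), "\n";
```

all three forms make the same choice; the ternary is just the compact way to write it on one line.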

Re^2: delete redundant data
by Anonymous Monk on Aug 21, 2010 at 12:01 UTC
    Yes. To be honest your code seems a little intimidating to me. If you have free time, would you care to explain a little? Of course I'll google all these too. Anyway, thanks a lot for your time.