
i don't think i understand exactly what you wish to delete, but if i understood you correctly, this is what you wish to achieve:

convert this:

A  83    GLU       A  90    GLU^?     
A 163    ARG       A  83    ARG^?     
A 222    ARG       A   5    ARG^?

A 229    ALA       A 115    ALA~?     
A 257    ALA       A 118    ALA~?     
A 328    ASP       A  95    ASP~?

A  83    GLU       A  90    GLU^?     
A 163    ARG       A  83    ARG^?     
A 222    ARG       A   5    ARG^?

A  83    GLU       B  90    GLU^?     
A 163    ARG       B  83    ARG^?     
A 222    ARG       B  5     ARG^?

into this :


A  83    GLU       B  90    GLU
A 163    ARG       B  83    ARG
A 222    ARG       B  5     ARG

A 229    ALA       A 115    ALA
A 257    ALA       A 118    ALA
A 328    ASP       A  95    ASP

right ??

code :

#!/usr/bin/perl

use strict;

my (%hash, %hash_key);
my $x = 0;
while (<DATA>){
my @array = split(' ', $_);
$x++;
$hash{"$array[1]-$array[4]"} = $_;
$hash_key{$x} = "$array[1]-$array[4]";
}

foreach my $i (sort {$a <=> $b} keys %hash_key){
  (exists $hash{$hash_key{$i}}) ? (print "$hash{$hash_key{$i}}") : (print "deleted\n");
  delete($hash{$hash_key{$i}}) if (exists $hash{$hash_key{$i}});
}

__DATA__
A  83    GLU       A  90    GLU
A 163    ARG       A  83    ARG
A 222    ARG       A   5    ARG

A 229    ALA       A 115    ALA
A 257    ALA       A 118    ALA
A 328    ASP       A  95    ASP

A  83    GLU       A  90    GLU
A 163    ARG       A  83    ARG
A 222    ARG       A   5    ARG

A  83    GLU       B  90    GLU
A 163    ARG       B  83    ARG
A 222    ARG       B  5     ARG

baxy

UPDATE:

sorry i had to go as soon as i posted the reply (reason: girlfriend)

here is a more elegant solution. the first has some bugs and limitations due to me being in a hurry ;)

code :


#!/usr/bin/perl

use strict;

my (%hash, %hash_key); # hashes
my $x = 0;    # counters
while (<DATA>){        #starts reading the data line by line
my @array = split(' ', $_);  # split the data using spaces
$x++;                        # global counter
$hash{$array[1]}->{$array[4]} = $_;   # primary database
$hash_key{$x}= [$array[1],$array[4]]; # key database
}


foreach my $i (sort {$a <=> $b} keys %hash_key){

  # if the record in the database (hash) exists print it out, otherwise print 'deleted'
  (exists $hash{$hash_key{$i}->[0]}->{$hash_key{$i}->[1]}) ? (print "$hash{$hash_key{$i}->[0]}->{$hash_key{$i}->[1]}") : (print "deleted\n");

  # you need the empty lines, so if you reached an empty line, skip the deleting part
  next if ($hash_key{$i}->[0] eq '');

  # if you printed the entry from the database, delete it; you don't need duplicates.
  # this goes if your record has an 80 90 situation or a 90 80 situation
  delete($hash{$hash_key{$i}->[0]}->{$hash_key{$i}->[1]}) if (exists $hash{$hash_key{$i}->[0]}->{$hash_key{$i}->[1]} || $hash{$hash_key{$i}->[1]}->{$hash_key{$i}->[0]});
}

__DATA__
A  83    GLU       A  90    GLU
A 163    ARG       A  83    ARG
A 222    ARG       A   5    ARG

A 229    ALA       A 115    ALA
A 257    ALA       A 118    ALA
A 328    ASP       A  95    ASP

A  83    GLU       A  90    GLU
A 163    ARG       A  83    ARG
A 222    ARG       A   5    ARG

A  83    GLU       B  90    GLU
A 163    ARG       B  83    ARG
A 222    ARG       B  5     ARG

so what happens... when you think about removing duplicates, think about hashes. the first hash is the actual database that holds all the data, and the second one preserves the order. once you have hashed your data, all you have to do is print it in the order in which you saved it, using the second hash (%hash_key). the deletion that follows is there so you don't print duplicates, except for the blank lines. you can remove the 'deleted' note if you don't need it.
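if the two-hash bookkeeping feels heavy, the same idea can be sketched with a single "seen" hash. this is only a rough sketch and not the code above: dedupe_lines is a made-up helper name, it assumes the residue numbers are the 2nd and 5th whitespace-separated fields, and it keeps the first occurrence of each pair (the code above ends up printing the last stored line instead). the two numbers are sorted before building the key, so an 83 90 record and a 90 83 record count as the same pair.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper (not part of the code above): keep the first line seen
# for each pair of residue numbers, and pass blank lines through untouched
sub dedupe_lines {
    my @lines = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        my @f = split ' ', $line;
        if (@f < 5) {              # blank or short line: keep it as a separator
            push @kept, $line;
            next;
        }
        # sort the two numbers so "83 90" and "90 83" produce the same key
        my $key = join '-', sort { $a <=> $b } @f[1, 4];
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

my @input = (
    "A  83    GLU       A  90    GLU\n",
    "A 163    ARG       A  83    ARG\n",
    "\n",
    "A  90    GLU       B  83    GLU\n",   # same pair as the first line, reversed
    "A 229    ALA       A 115    ALA\n",
);
print dedupe_lines(@input);
```

to run it on a file you could replace @input with the lines read from a filehandle. if the chain letters (A/B) should also count as part of the key, add those fields to the join.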

baxy

also, if you have any questions about the code, just shoot, for example if you are not familiar with this:

($a ==1) ? (print "yes") : (print "no");

since you stated that you are new to Perl and all...
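in case it helps, here is a small sketch of what that ternary form does, next to the same decision written as a plain if/else. the variable $n is just a name picked for the illustration; $a and $b are special variables used by sort, so it is usually safer to pick another name in your own code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $n = 1;

# ternary form: condition ? value-if-true : value-if-false
my $answer = ($n == 1) ? "yes" : "no";
print "$answer\n";

# the same decision written as an ordinary if/else
if ($n == 1) {
    print "yes\n";
} else {
    print "no\n";
}
```

both forms do the same thing here; the ternary is handy when you only need to choose between two values on one line.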

In reply to Re: delete redundant data by baxy77bax
in thread delete redundant data by nurulnad
