Hi PerlMonks,
I have four samples (sample1 to sample4), each with a different name. Samples 1 and 2 share the same sequence, ATGC; likewise, samples 3 and 4 share the sequence CCGG. I want to retain only sample 1 and reject sample 2, because sample 2 duplicates the sequence of sample 1. The same applies to samples 3 and 4: I wish to retain sample 3 and reject sample 4. I am at my wit's end trying to solve this problem and look forward to suggestions from the Perl monks.
I have written a script t2.pl (given below) to separate the header and the sequence. Here goes the script:
#!/usr/bin/perl
use warnings;
use strict;

my $a=">sample1 ..sequence ATGC fun >sample2 ..sequence ATGC fun >sample3 ..sequence CCGG fun >sample4 ..sequence CCGG fun";

while ($a=~ />.*?fun/gs) {
    my $trial1=$&;
    my $trial2=$&;
    while ($trial1=~ />.*sequence/gs) {
        my $header=$&;
        $trial2=~ s/($header)//gs;
        my $seq=$trial2;
        $seq=~ s/\s//;
        $seq=~ s/fun//;
        print "\n Header: $header Sequence: $seq\n";
    }
}
# code??
exit;
I got the following results:
C:\Users\x\Desktop>t2.pl

 Header: >sample1 ..sequence Sequence: ATGC

 Header: >sample2 ..sequence Sequence: ATGC

 Header: >sample3 ..sequence Sequence: CCGG

 Header: >sample4 ..sequence Sequence: CCGG
But the expected results should look like this:
>sample1 ..sequence ATGC
>sample3 ..sequence CCGG
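One possible way to get this result is to remember each sequence in a hash and keep only the first record in which it appears. The sketch below is not a tested answer for real data. It assumes that every record has the shape "header ending in the word sequence, then the sequence, then fun", and that the sequence itself contains no whitespace. The names unique_records, %seen and @kept are made up for illustration, and the comparison is case-sensitive, so ATGC and atgc would count as different sequences.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample data as in t2.pl; "fun" marks the end of each record.
my $data = ">sample1 ..sequence ATGC fun >sample2 ..sequence ATGC fun "
         . ">sample3 ..sequence CCGG fun >sample4 ..sequence CCGG fun";

# Return [header, sequence] pairs, keeping only the first record
# seen for each distinct sequence (hypothetical helper).
sub unique_records {
    my ($text) = @_;
    my %seen;    # sequences already encountered
    my @kept;    # first record for each distinct sequence
    # Capture the header (up to the word "sequence") and the sequence
    # (assumed to be one whitespace-free run before "fun").
    while ($text =~ /(>.*?sequence)\s+(\S+)\s+fun/gs) {
        my ($header, $seq) = ($1, $2);
        next if $seen{$seq}++;    # skip repeats of a known sequence
        push @kept, [$header, $seq];
    }
    return @kept;
}

print "$_->[0] $_->[1]\n" for unique_records($data);
```

Using captures ($1, $2) instead of $& also avoids interpolating the header back into a second regex, where the dots in "..sequence" would be treated as metacharacters. The variable is named $data rather than $a because $a and $b are special variables used by sort.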
In reply to How does one get only the non-redundant (non-repeating) entries with header? by supriyoch_2008