supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:
Hi PerlMonks,
There are four samples (sample1 to sample4), each with a different name. Samples 1 and 2 have the same sequence, ATGC; likewise, samples 3 and 4 share the sequence CCGG. I want to keep sample 1 with its sequence ATGC and reject sample 2, because sample 2 repeats sample 1's sequence. In the same way, I want to keep sample 3 and reject sample 4. I am at my wit's end with this problem and would welcome suggestions from the monks.
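The rule described here (keep the first sample for each sequence, reject later repeats) is what Perl's common `%seen` hash idiom does. Below is a minimal sketch, assuming the headers and sequences have already been split into `[header, sequence]` pairs; the `@records` data layout and the `first_by_sequence` name are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: [header, sequence] pairs in their original order.
my @records = (
    [ '>sample1 ..sequence', 'ATGC' ],
    [ '>sample2 ..sequence', 'ATGC' ],
    [ '>sample3 ..sequence', 'CCGG' ],
    [ '>sample4 ..sequence', 'CCGG' ],
);

# Keep a record only the first time its sequence appears.
# $seen{$seq}++ returns 0 (false) on the first visit and a true count afterwards,
# so the negation lets the first occurrence through and rejects the rest.
sub first_by_sequence {
    my %seen;
    return grep { !$seen{ $_->[1] }++ } @_;
}

my @kept = first_by_sequence(@records);
print "$_->[0] $_->[1]\n" for @kept;
```

Because `grep` walks the list in order, "first" means first in input order, so sample1 and sample3 survive.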
I have written a script, t2.pl (below), to separate each header from its sequence:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $a = ">sample1 ..sequence ATGC fun >sample2 ..sequence ATGC fun >sample3 ..sequence CCGG fun >sample4 ..sequence CCGG fun";

# Pull out each record: from ">" up to the next "fun".
while ($a =~ />.*?fun/gs) {
    my $trial1 = $&;
    my $trial2 = $&;
    # The header runs from ">" to the word "sequence".
    while ($trial1 =~ />.*sequence/gs) {
        my $header = $&;
        # Remove the header; \Q...\E makes its "." characters match literally.
        $trial2 =~ s/\Q$header\E//gs;
        my $seq = $trial2;
        $seq =~ s/\s//;     # drop the space after the header
        $seq =~ s/fun//;    # drop the "fun" terminator
        print "\n Header: $header Sequence: $seq\n";
    }
}
# code??
exit;
```
Running it gives:
```
C:\Users\x\Desktop>t2.pl

 Header: >sample1 ..sequence Sequence: ATGC

 Header: >sample2 ..sequence Sequence: ATGC

 Header: >sample3 ..sequence Sequence: CCGG

 Header: >sample4 ..sequence Sequence: CCGG
```
But the output I want is:
```
>sample1 ..sequence ATGC
>sample3 ..sequence CCGG
```
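One way to get that output, sketched under the assumption that every record has the form `>header ...sequence SEQ fun` as in the string above, is to capture the header and the sequence with a single regex and skip any sequence already stored in a `%seen` hash. The `unique_records` name is hypothetical, and the string is held in `$data` rather than `$a` because `$a` and `$b` are Perl's special sort variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same records as in t2.pl.
my $data = ">sample1 ..sequence ATGC fun >sample2 ..sequence ATGC fun "
         . ">sample3 ..sequence CCGG fun >sample4 ..sequence CCGG fun";

# Returns "header sequence" strings, keeping only the first record per sequence.
sub unique_records {
    my ($text) = @_;
    my ( %seen, @out );
    # Assumed record shape: header up to "sequence", whitespace,
    # the sequence (no internal spaces), whitespace, then "fun".
    while ( $text =~ /(>.*?sequence)\s+(\S+)\s+fun/gs ) {
        my ( $header, $seq ) = ( $1, $2 );
        next if $seen{$seq}++;    # reject a sequence that was already kept
        push @out, "$header $seq";
    }
    return @out;
}

print "$_\n" for unique_records($data);
```

With the sample string this prints the two expected lines. Capture groups (`$1`, `$2`) replace the `$&` plus substitution steps, and because `%seen` is keyed on the sequence alone, the first header carrying a given sequence wins.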
Re: How does one get only the non-redundant (non-repeating) entries with header?
by roboticus (Chancellor) on Jul 17, 2014 at 11:37 UTC
by supriyoch_2008 (Monk) on Jul 19, 2014 at 09:14 UTC
Re: How does one get only the non-redundant (non-repeating) entries with header?
by ww (Archbishop) on Jul 17, 2014 at 12:45 UTC
by supriyoch_2008 (Monk) on Jul 19, 2014 at 09:18 UTC