The posting appears to have been updated considerably... and the specification has shifted. Plus ça change :-(
OK. I give up. I really hope this is something useful and not just homework. The program below gives:
>s1: A G C T T T T C G G G C A A T
23 43 45 65 76 54 3 34 54 65 7 45 56 87 56
-> A G C T T T C G G C A A T -- dropped 6 10
23 43 45 65 76 54 34 54 65 45 56 87 56
>s2: G C T G C C C C C C A T C T T
23 43 23 45 65 45 76 78 34 8 12 32 65 23 25
-> G C T G C C C C C A T C T T -- dropped 9
23 43 23 45 65 45 76 78 34 12 32 65 23 25
>s3: T C G T A G C T G A A A A T C
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
-> T C G T A G C T G A A A A T C -- all kept
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
#!/usr/bin/perl use strict; use warnings; my $seq_txt = <<FILE1; >s1 AGCTTTTCGGGCAAT >s2 GCTGCCCCCCATCTT >s3 TCGTAGCTGAAAATC FILE1 my $num_txt = <<FILE2; >s1 23 43 45 65 76 54 3 34 54 65 7 45 56 87 56 >s2 23 43 23 45 65 45 76 78 34 8 12 32 65 23 25 >s3 12 23 34 45 56 54 43 32 65 43 12 34 75 76 45 FILE2 open(my $FH1, '<', \$seq_txt) or die "$!" ; open(my $FH2, '<', \$num_txt) or die "$!" ; while (!eof($FH1)) { # Fetch sequence numbers and make sure we are in sync # # For tidyness, trim all leading and trailing whitespace -- which al +so takes care of # the newline. # # Skip blank lines in both files -- mostly so we're not foxed by tra +iling blankness. my $sid_1 = <$FH1> ; $sid_1 =~ s/^\s*// ; $sid_1 =~ s/\s*$// ; next if ($sid_1 eq '') ; # Comfortable with blank lines in file 1 my $sid_2 = '' ; while ($sid_2 eq '') { # Allow blank lines in file 2 also defined($sid_2 = <$FH2>) or die "file 2 -- hit eof expecting '$si +d_1'" ; $sid_2 =~ s/^\s*// ; $sid_2 =~ s/\s*$// ; } ; if ($sid_1 ne $sid_2) { die "File 1 sequence '$sid_1' ne file 2 '$sid_2'" ; } ; # Fetch the sequence -- discard all whitespace and check validity (m +ay not be empty) defined(my $seq = <$FH1>) or die "file 1 -- unexpected eof" ; $seq =~ s/\s+//g ; $seq =~ m/^[ACGT]+$/ or die "file 1 -- invalid sequence" ; my @seq = split(//, $seq) ; # Fetch the scores -- check valid form and length defined(my $num = <$FH2>) or die "file 2 -- unexpected eof" ; $num =~ s/^\s*// ; $num =~ s/\s*$// ; $num =~ m/^(?:\d+(?:\s+|\z))*$/ or die "file 2 -- invalid scores" ; my @num = split(/\s+/, $num) ; if (@num != @seq) { die "sequence $sid_1: sequence and scores don't +match" ; } ; # We pass everything except letters (and the related score) where th +e letter is: # # a. last of repeated sequence of 3 or more. # # b. its score is < 10 my @exam = map { ($num[$_] > 10) # must be OK if scor +e > 10 || ($_ < 2) # too early for repe +at of 3 or more ! || ($seq[$_] ne $seq[$_-1]) # not a repeat || ($seq[$_] ne $seq[$_-2]) # not a long enough +repeat || ( ($_ != $#seq) && ($seq[$_] eq $seq[$_+1]) ) # not the end of a r +epeat } (0..$#seq) ; my @keep = grep { $exam[$_] } (0..$#seq) ; my @drop = grep { !$exam[$_] } (0..$#seq) ; # Crunch printf "%6s%s\n", "$sid_1:", show_seq(@seq) ; printf "%6s%s\n", "" , show_num(@num) ; printf "%6s%s", "->", show_seq(@seq[@keep]) ; if (@drop) { print " -- dropped @drop\n" ; } else { print " -- all kept\n" ; } ; printf "%6s%s\n", "", show_num(@num[@keep]) ; $seq = join(' ', @seq[@keep]) ; $num = join(' ', @num[@keep]) ; # Now do whatever's required with the results .... } ; # Make sure that the two files have finished together. while (<$FH2>) { m/^\s*$/ or die "File 2 contains more than file 1" ; } ; #_____________________________________________________________________ +_ sub show_seq { return ' '.join(' ', @_) ; } ; sub show_num { return join('', map sprintf('%3d', $_), @_) ; } ;
In reply to Re: deleting a particular character and its coresponding score in 2 different files!!!
by gone2015
in thread deleting a particular character and its coresponding score in 2 different files!!!
by heidi
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |