comment on

The posting appears to have been updated considerably... and the specification has shifted. Plus ça change :-(

OK. I give up. I really hope this is something useful and not just homework. The program below gives:

  >s1:  A  G  C  T  T  T  T  C  G  G  G  C  A  A  T
       23 43 45 65 76 54  3 34 54 65  7 45 56 87 56
    ->  A  G  C  T  T  T  C  G  G  C  A  A  T -- dropped 6 10
       23 43 45 65 76 54 34 54 65 45 56 87 56
  >s2:  G  C  T  G  C  C  C  C  C  C  A  T  C  T  T
       23 43 23 45 65 45 76 78 34  8 12 32 65 23 25
    ->  G  C  T  G  C  C  C  C  C  A  T  C  T  T -- dropped 9
       23 43 23 45 65 45 76 78 34 12 32 65 23 25
  >s3:  T  C  G  T  A  G  C  T  G  A  A  A  A  T  C
       12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
    ->  T  C  G  T  A  G  C  T  G  A  A  A  A  T  C -- all kept
       12 23 34 45 56 54 43 32 65 43 12 34 75 76 45

#!/usr/bin/perl 
use strict; 
use warnings; 

my $seq_txt = <<FILE1;
>s1
AGCTTTTCGGGCAAT
>s2
GCTGCCCCCCATCTT
>s3
TCGTAGCTGAAAATC
FILE1

my $num_txt = <<FILE2;
>s1
23 43 45 65 76 54 3 34 54 65 7 45 56 87 56 
>s2
23 43 23 45 65 45 76 78 34 8 12 32 65 23 25
>s3
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
FILE2

open(my $FH1, '<', \$seq_txt) or die "$!" ; 
open(my $FH2, '<', \$num_txt) or die "$!" ; 

while (!eof($FH1)) {

  # Fetch sequence numbers and make sure we are in sync
  #
  # For tidyness, trim all leading and trailing whitespace -- which al
+so takes care of
  # the newline.
  #
  # Skip blank lines in both files -- mostly so we're not foxed by tra
+iling blankness.

  my $sid_1 = <$FH1> ;
  $sid_1 =~ s/^\s*// ; $sid_1 =~ s/\s*$// ;

  next if ($sid_1 eq '') ;    # Comfortable with blank lines in file 1

  my $sid_2 = '' ;
  while ($sid_2 eq '') {    # Allow blank lines in file 2 also
    defined($sid_2 = <$FH2>)  or die "file 2 -- hit eof expecting '$si
+d_1'" ;
    $sid_2 =~ s/^\s*// ; $sid_2 =~ s/\s*$// ;
  } ;

  if ($sid_1 ne $sid_2)
    { die "File 1 sequence '$sid_1' ne file 2 '$sid_2'" ; } ;

  # Fetch the sequence -- discard all whitespace and check validity (m
+ay not be empty)

  defined(my $seq = <$FH1>)  or die "file 1 -- unexpected eof" ;
  $seq =~ s/\s+//g ;

  $seq =~ m/^[ACGT]+$/       or die "file 1 -- invalid sequence" ;

  my @seq = split(//, $seq) ;

  # Fetch the scores -- check valid form and length

  defined(my $num = <$FH2>)  or die "file 2 -- unexpected eof" ;
  $num =~ s/^\s*// ; $num =~ s/\s*$// ;
  $num =~ m/^(?:\d+(?:\s+|\z))*$/  or die "file 2 -- invalid scores" ;

  my @num = split(/\s+/, $num) ;

  if (@num != @seq) { die "sequence $sid_1: sequence and scores don't 
+match" ; } ;

  # We pass everything except letters (and the related score) where th
+e letter is:
  #
  #    a. last of repeated sequence of 3 or more.
  #
  #    b. its score is < 10 

  my @exam = map {    ($num[$_] > 10)             # must be OK if scor
+e > 10
                   || ($_ < 2)                    # too early for repe
+at of 3 or more !
                   || ($seq[$_] ne $seq[$_-1])    # not a repeat
                   || ($seq[$_] ne $seq[$_-2])    # not a long enough 
+repeat
                   || ( ($_ != $#seq) &&
                       ($seq[$_] eq $seq[$_+1]) ) # not the end of a r
+epeat
                 } (0..$#seq) ;

  my @keep = grep {  $exam[$_] } (0..$#seq) ;
  my @drop = grep { !$exam[$_] } (0..$#seq) ;

  # Crunch

  printf "%6s%s\n", "$sid_1:", show_seq(@seq) ;
  printf "%6s%s\n", ""       , show_num(@num) ;
  printf "%6s%s",   "->",      show_seq(@seq[@keep]) ;
  if (@drop) { print " -- dropped @drop\n" ; }
        else { print " -- all kept\n" ;      } ;
  printf "%6s%s\n", "",        show_num(@num[@keep]) ;

  $seq = join(' ', @seq[@keep]) ;
  $num = join(' ', @num[@keep]) ;

  # Now do whatever's required with the results ....

} ;

# Make sure that the two files have finished together.

while (<$FH2>) {
  m/^\s*$/ or die "File 2 contains more than file 1" ;
} ;

#_____________________________________________________________________
+_

sub show_seq { return '  '.join('  ', @_) ; } ;

sub show_num { return join('', map sprintf('%3d', $_), @_) ; } ;
[download]

In reply to Re: deleting a particular character and its coresponding score in 2 different files!!! by gone2015
in thread deleting a particular character and its coresponding score in 2 different files!!! by heidi

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.