
Greetings Monks, I’m a newbie to the monastery and have what on the surface appears to be a very easy question, but it has had me flummoxed for over a day, so I decided to come to the light side to seek wisdom! I am trying to parse the results from a program that generates an output file like the one below.

Results:1582 1640 6 9.8 6 90 0 69 55 16 13 13 1.68 GACAAT GACAATGACAAT
+GACAATGACAATGACAGAGACAGTAACAATAACAATAACAATAACAA
Results:5184 5214 6 5.2 6 96 0 55 16 0 45 38 1.47 TGGTGA TGGTGATGGTGA
+TGGTGATGGTGATGTTGAT

The problem is that the code I have written to pick out lines beginning with “R” and place the results into arrays seems to skip either 1, 2, or 3 Results lines depending on how it feels! As a result, out of 137 Results lines it only ever picks out 69 or 74. Below is a section of the code, taken from a larger program that I wrote to do the job, hence the commented-out sections.

$TRID=0;
$SEQID=0;
#$PID=0;
$i=0;
#$line=<TR_INFILE>;
chomp $line;


while ($line =<TR_INFILE>) {
  
    if ($line =~/^R.*/) {
      $line=~s/^Results://g;
      
      #print "making TR arrays\n";
      print OUTFILE3 "$line";
      
      $trstart[$i] =  (split(/\s*/,$line))[0];
      $trend[$i] =    (split(/\s*/,$line))[1];
      $period[$i] =   (split(/\s*/,$line))[2];
      $copy[$i] =     (split(/\s*/,$line))[3];
      $consize[$i] =  (split(/\s*/,$line))[4];
      $matches[$i] =  (split(/\s*/,$line))[5];
      $indels[$i] =   (split(/\s*/,$line))[6];
      $score[$i] =    (split(/\s*/,$line))[7];
      $numa[$i] =     (split(/\s*/,$line))[8];
      $numc[$i] =     (split(/\s*/,$line))[9];
      $numg[$i] =     (split(/\s*/,$line))[10];
      $numt[$i] =     (split(/\s*/,$line))[11];
      $entropy[$i] =  (split(/\s*/,$line))[12];
      #$TR_consensus[$i]= (split(/\s*/,$line))[13];
      #$TR_sequence[$i]=  (split(/\s*/,$line))[14];
      $TRID++;
   }
   # elsif ($line =~/^P.*/){ 
   # print "Making Parameter  arrays\n";
   # $line =~s/\s/\./g;
   # $line =~s/^Parameters:\.//g;
   # $trparameters[$i] = ($line)[0];
   # $PID++;
  # } 
    elsif ($line =~ /^S.*/) {
     # print "Making seqeunce arrays \n";
      $line =~s/^Sequence:\s*//;
      $TR_Accession[$i] =  ($line)[0];
      $SEQID++;
      }
    else {
      }

    $i++;
    $line=<TR_INFILE>;
    chomp $line;
}
close TR_INFILE;

I will be grateful for all advice! I am sure it has something to do with the RegEx. Apologies for the bad layout. Thank you in advance, PC.

In reply to RegEx misbehaving? by pdotcdot
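
Going only by the code shown, the regex may not be the culprit. The loop reads a line in the while condition and then reads another with $line=<TR_INFILE> at the bottom of the body, so the second line is thrown away when the condition reads again. That would skip every other line and fits getting roughly half of the 137 Results lines. Two further things look suspect: split(/\s*/, ...) can match the empty string, so it breaks the line into single characters rather than fields, and $i++ runs on every line, which leaves gaps in the arrays. Below is a minimal sketch of one way to restructure the loop. It assumes the input looks like the sample in the post; the in-memory filehandle and the @results array of arrays are illustrative choices of mine, not part of the original program, which used one array per column.

```perl
use strict;
use warnings;

# Sample input copied from the post. Reading it through an in-memory
# filehandle is an assumption made to keep the sketch self-contained;
# a real run would open the program's output file instead.
my $data = <<'END';
Results:1582 1640 6 9.8 6 90 0 69 55 16 13 13 1.68 GACAAT GACAATGACAAT
+GACAATGACAATGACAGAGACAGTAACAATAACAATAACAATAACAA
Results:5184 5214 6 5.2 6 96 0 55 16 0 45 38 1.47 TGGTGA TGGTGATGGTGA
+TGGTGATGGTGATGTTGAT
END

open my $in, '<', \$data or die "Cannot open in-memory data: $!";

# Hypothetical container: one array of fields per Results line.
my @results;

while ( my $line = <$in> ) {    # the only read inside the loop
    chomp $line;

    # Strip the prefix; skip the line if the prefix is not there.
    next unless $line =~ s/^Results://;

    # split ' ' breaks on runs of whitespace. split /\s*/ can match the
    # empty string, so it would break the line into single characters.
    my @fields = split ' ', $line;

    push @results, \@fields;    # push keeps the indices contiguous
}
close $in;

printf "%d Results lines parsed\n", scalar @results;
printf "start=%s end=%s entropy=%s\n", @{ $results[1] }[ 0, 1, 12 ];
```

With the two sample records this should report 2 Results lines and print start=5184 end=5214 entropy=1.47 for the second one. If separate arrays per column are preferred, the same loop works with one push per array, for example push @trstart, $fields[0].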
