Re: Howto correct my program -- help

I gather that the Perl code was added to the OP after the previous replies were posted. Now that we have both the data and the code (and the intent is sort of clear, maybe), it looks like you've got the wrong logic for the task.

Your "for" loop on the contents of File2 is taking only one line at a time, but it looks like the data is supposed to be handled one record at a time, where a record is the concatenation of lines of protein letters. You can't use an index value of 200 or 300 (characters) when your loop is only seeing 60 or 70 characters (one line) at a time.
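To make the line-versus-record problem concrete, here is a small sketch with made-up data: four 60-character lines standing in for one wrapped sequence record. The real files in the OP may wrap at a different width. Offset 200 is out of range for every individual line, but valid once the lines are joined into one string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up record: four 60-character lines, as a sequence file might wrap them.
my @lines = ('ACGT' x 15) x 4;

# Looking at one line at a time: offset 200 is never reachable.
for my $i ( 0 .. $#lines ) {
    printf "line %d: %d chars, offset 200 %s\n", $i + 1, length $lines[$i],
        length( $lines[$i] ) > 200 ? "reachable" : "out of range";
}

# Joining the lines into one record makes offset 200 valid.
my $record = join '', @lines;
printf "record: %d chars, substr at 200 = %s\n",
    length $record, substr( $record, 200, 3 );
```

Every line reports "out of range", while the joined 240-character record gives ACG at offset 200.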

What you should be doing instead is reading all of File2 into another hash, also keyed by the ID numbers (just like the hash loaded from File1); then you can loop over the keys from File1, and do what needs to be done with the full-record data from File2.

Since the actual specs for your task aren't entirely clear to me, I can't tell whether the following version of your code does what needs to be done, but at least it shows how you should handle your second input file, and how you should check for (and report) conditions that go against expectations.

#!/usr/bin/perl

use strict;
use warnings;

my $qfn1 = "File1.txt";
my $qfn2 = "File2.txt";

my %positions;
{
   open(my $fh, '<', $qfn1)
      or die("Cannot open file \"$qfn1\": $!\n");

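   while (<$fh>) {
      # split ' ' splits on whitespace and ignores leading blanks
      my ($key, $pos) = split ' ';
      next unless defined $pos;    # skip blank or malformed lines
      $positions{$key} = $pos;
   }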
}

my %sequences;
{
   open(my $fh, '<', $qfn2)
      or die("Cannot open file \"$qfn2\": $!\n");

   my $key;
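   while (<$fh>) {
       chomp;
       if ( s/^>// ) {
           # header line: the ID is the second "|"-separated field
           $key = ( split /\|/ )[1];
       }
       elsif ( defined $key ) {
           # sequence line: append it to the current record
           $sequences{$key} .= $_;
       }
   }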
}

for my $key ( sort {$a<=>$b} keys %positions ) {
    if ( ! exists( $sequences{$key} )) {
        warn "KEY $key in File1 not found in File2\n";
        next;
    }
    if ( length( $sequences{$key} ) < $positions{$key} ) {
        warn "KEY $key: File2 string length too short for File1 position value\n";
        next;
    }
    my $index = rindex( $sequences{$key}, "ATG", $positions{$key} );
    if ( $index < 0 ) {
        warn sprintf( "KEY %s: No ATG in File2 string prior to position %d\n",
                      $key, $positions{$key} );
        next;
    }
    $index += 3 while ( ($index + 3) < $positions{$key} );
    print "$key $positions{$key} " . substr($sequences{$key}, $index, 3) . "\n";
}

For the data files shown in the OP, that version prints:

255369268 300 ACT
269212695 200 TAT

How would you confirm that this is what it should be printing?
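One way to start answering that: pull the codon-selection step out into a small function and run it on a short string whose answer you can work out by hand. The sub name codon_at and the toy sequence below are made up for illustration; the body is the same rindex-then-step-by-3 logic as the loop above. On this reading it returns the codon, in the reading frame of the last ATG found by rindex, that covers the character at position $pos when the string is counted from 1. That interpretation of File1's positions is an assumption, not something the OP has confirmed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same logic as the loop body above.
# Returns the 3-letter codon, in the frame of the last "ATG" starting at or
# before offset $pos, that covers 1-based position $pos; undef if none.
sub codon_at {
    my ($seq, $pos) = @_;
    return if length($seq) < $pos;          # string too short
    my $index = rindex( $seq, "ATG", $pos );
    return if $index < 0;                   # no ATG at or before $pos
    $index += 3 while ( ($index + 3) < $pos );
    return substr( $seq, $index, 3 );
}

# Toy sequence (made up): ATG at offset 2, then codons GCC TTA CGG
my $seq = "CCATGGCCTTACGG";
for my $pos ( 1, 3, 7, 9, 12 ) {
    my $codon = codon_at( $seq, $pos );
    printf "%2d %s\n", $pos, defined $codon ? $codon : "(none)";
}
```

For that toy string it should print (none) for position 1, since the only ATG starts after it, and then ATG, GCC, TTA and CGG for positions 3, 7, 9 and 12. One edge case worth checking against the real specs: rindex with a position argument can return an ATG that starts exactly at offset $pos, one character past the 1-based position, and in that case the returned codon does not cover the position at all.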


Re^2: Howto correct my program -- help by ashnator (Sexton) on Dec 22, 2008 at 00:49 UTC
Thanks a lot. I promise I will do better next time.