I gather that the Perl code was added to the OP after the previous replies were posted. Now that we have both the data and the code (and the intent is sort of clear, maybe), it looks like you've got the wrong logic for the task.

Your "for" loop on the contents of File2 is taking only one line at a time, but it looks like the data is supposed to be handled one record at a time, where a record is the concatenation of all the lines of sequence letters that follow a header line. You can't use an index value of 200 or 300 (characters) when your loop is only seeing 60 or 70 characters (one line) at a time.
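
To make that concrete, here is a small sketch with made-up data (the three 60-letter lines and the position value of 100 are my own inventions, not taken from your files). It only shows why the position has to be applied to the joined record rather than to any single line:

```perl
use strict;
use warnings;

# Hypothetical record: three 60-letter lines, as a wrapped sequence file might store them.
my @lines = ( 'A' x 60, 'C' x 60, 'G' x 60 );
my $pos   = 100;    # made-up position, larger than any single line

# One line at a time: the position is past the end of every line.
my @too_short = grep { length($_) < $pos } @lines;
print scalar(@too_short), " of ", scalar(@lines), " lines are shorter than $pos\n";

# Whole record: join the lines first, then index into the result.
my $record = join '', @lines;
print "record length: ", length($record), "\n";
print "letter at offset $pos: ", substr( $record, $pos, 1 ), "\n";
```

Every individual line is too short for the position, but the joined record is 180 letters long and offset 100 lands in what was the second line.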

What you should be doing instead is reading all of File2 into another hash, also keyed by the ID numbers (just like the hash loaded from File1); then you can loop over the keys from File1, and do what needs to be done with the full-record data from File2.

Since the actual specs for your task are not entirely clear to me, I have no way of knowing whether the following version of your code will do what needs to be done, but at least it will show you how you should be handling your second input file, and how you should be checking for (and reporting) conditions that go against expectations.

#!/usr/bin/perl

use strict;
use warnings;

my $qfn1 = "File1.txt";
my $qfn2 = "File2.txt";

my %positions;
{
   open(my $fh, '<', $qfn1)
      or die("Cannot open file \"$qfn1\": $!\n");

   while (<$fh>) {
      my ($key, $pos) = split /\s+/;
      $positions{$key} = $pos;
   }
}

my %sequences;
{
   open(my $fh, '<', $qfn2)
      or die("Cannot open file \"$qfn2\": $!\n");

   my $key;
   while (<$fh>) {
       if ( s/^>// ) {
           $key = ( split /\|/ )[1];
       }
       else {
           chomp;
           $sequences{$key} .= $_;
       }
   }
}

for my $key ( sort {$a<=>$b} keys %positions ) {
    if ( ! exists( $sequences{$key} )) {
        warn "KEY $key in File1 not found in File2\n";
        next;
    }
    if ( length( $sequences{$key} ) < $positions{$key} ) {
        warn "KEY $key: File2 string length too short for File1 position value\n";
        next;
    }
    my $index = rindex( $sequences{$key}, "ATG", $positions{$key} );
    if ( $index < 0 ) {
        warn sprintf( "KEY %s: No ATG in File2 string prior to position %d\n",
                      $key, $positions{$key} );
        next;
    }
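    # Step forward from the ATG one codon (3 letters) at a time, stopping
    # when the next step would reach or pass the File1 position.
    $index += 3 while ( ($index + 3) < $positions{$key} );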
    print "$key $positions{$key} " . substr($sequences{$key}, $index, 3) . "\n";
}

For the data files shown in the OP, that version prints:

255369268 300 ACT
269212695 200 TAT

How would you confirm that this is what it should be printing?
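
One way to start on that question is to run the same steps on a sequence small enough to check by hand. The following is only a sketch: the sub name codon_at and the test sequence are made up for illustration, and it checks what the logic does, not whether that is what your task actually requires.

```perl
use strict;
use warnings;

# Same steps as the loop in the script above, wrapped in a (hypothetical) sub
# so it can be called on made-up input. Returns undef where the script would
# warn and skip the key.
sub codon_at {
    my ( $seq, $pos ) = @_;
    return if length($seq) < $pos;
    my $index = rindex( $seq, "ATG", $pos );
    return if $index < 0;
    $index += 3 while ( $index + 3 ) < $pos;
    return substr( $seq, $index, 3 );
}

# Made-up sequence: two filler letters, then ATG, then four codons.
#   offsets: 0-1 "CC", 2-4 "ATG", 5-7 "AAA", 8-10 "CCC", 11-13 "GGG", 14-16 "TTT"
my $seq = "CC" . "ATG" . "AAA" . "CCC" . "GGG" . "TTT";

for my $pos ( 6, 8, 9, 12 ) {
    my $codon = codon_at( $seq, $pos );
    print "$pos ", ( defined $codon ? $codon : "(none)" ), "\n";
}
```

For positions 6, 8, 9 and 12 this prints AAA, AAA, CCC and GGG. Note that position 8 gives AAA (offsets 5-7) rather than CCC (which starts at offset 8), so the stepping logic behaves as if File1 positions count from 1; whether that matches your data is something only you can say.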

In reply to Re: Howto correct my program -- help by graff
in thread Debug help by ashnator
