
Ok, let me start off by apologizing for my other thread, which I posted as an anonymous user. I thought I was signed in; perhaps a moderator can delete it (or just let it get buried behind the other posts).

Regarding my problem: I have two files, one of base positions and one of regions. My objective is to match every position to the region that contains it on the same chromosome.

Here is one input file, testReg.txt:

chr1    100    159    0
chr1    200    260    0
chr1    500    750    0
chr3    450    700    0
chr4    100    300    0
chr7    350    600    0
chr9    100    125    0
chr11    679    687    0
chr24    100    200    0
chr24    300    400    0

The 1st column is the chromosome, the 2nd column is the start of the region, and the 3rd column is the end of the region; all columns are separated by tabs.

Here is the other input file, testPos.txt:

chr1    104    104    0    0    +
chr1    145    145    0    0    +
chr1    205    205    0    0    +
chr1    600    600    0    0    +
chr3    500    500    0    0    +
chr4    150    150    0    0    +
chr4    175    175    0    0    +
chr7    400    400    0    0    +
chr7    550    550    0    0    +
chr9    100    100    0    0    +
chr11    680    680    0    0    +
chr11    681    681    0    0    +
chr24    105    105    0    0    +
chr24    110    110    0    0    +
chr24    350    350    0    0    +

The 1st column is the chromosome and the 2nd column is the base position; all columns are separated by tabs as well.

Here is my code that I've completed so far:

#!/usr/bin/perl 
use warnings; use strict; 

my $region = 'testReg.txt';
my $position = 'testPos.txt';
my $writeOut = '>>testOut.txt';

open(R,$region) or die "error reading file";
open(OUT,$writeOut) or die "error writing to the file ";
open(P, $position) or die "error reading file ";

my $rline; 
my $pline; 

while ($rline=<R>) {    
    chomp($rline);
    my @r_arr=split("\t",$rline);
        
        chomp($r_arr[0]);
        my @rID = split("r",$r_arr[0]);
        $r_arr[0] = $rID[1]; #this removes the "chr" portion of the first element and leaves number
        #i.e. instead of [0] -> "chr24"; [0] -> "24"

    while($pline=<P>) {            
            if(!$rline) {
                last;
            } #end if
        
        chomp($pline);
        my @p_arr=split("\t",$pline);
        
            chomp($p_arr[0]);
            my @pID = split("r",$p_arr[0]);
            $p_arr[0] = $pID[1];
        
            if($p_arr[1]>$r_arr[2]) {
              $rline=<R>; 
              redo;
            } #end if
        
            else {                
                  if($p_arr[0] == $r_arr[0] && $p_arr[1] >= $r_arr[1] && $p_arr[1] <= $r_arr[2]) {
                    #NOTE: [0] element in each array now corresponds to chr number
                    # r[1] is start of region and r[2] is end of region
                    # p[1] is the position of the base pair
                    shift(@p_arr);
                    print (OUT "chr$r_arr[0]\t$r_arr[1]\t$r_arr[2]\t$r_arr[3]\t"); 
                        print OUT join ("\t", @p_arr), "\n"; 
                        #essentially I'm joining the two files with matching lines
                        #w/ columns separated by tab
                   } #end if
            } #end else        
    } # end while <P>    
} #end while <R>
close R;
close P;
close OUT;

Below is the output that my code produces. As you can see, only the first 2 lines are produced and then the output stops:

chr1    100    159    0    104    104    0    0    +
chr1    100    159    0    145    145    0    0    +

Below is the output that I would like my code to produce, which is basically just testReg.txt joined with testPos.txt for each match:

chr1    100    159    0    chr1    104    104    0    0    +
chr1    100    159    0    chr1    145    145    0    0    +
chr1    200    260    0    chr1    205    205    0    0    +
chr1    500    750    0    chr1    600    600    0    0    +
chr3    450    700    0    chr3    500    500    0    0    +
chr4    100    300    0    chr4    150    150    0    0    +
chr4    100    300    0    chr4    175    175    0    0    +
chr7    350    600    0    chr7    400    400    0    0    +
chr7    350    600    0    chr7    550    550    0    0    +
chr9    100    125    0    chr9    100    100    0    0    +
chr11    679    687    0    chr11    680    680    0    0    +
chr11    679    687    0    chr11    681    681    0    0    +
chr24    100    200    0    chr24    105    105    0    0    +
chr24    100    200    0    chr24    110    110    0    0    +
chr24    300    400    0    chr24    350    350    0    0    +

I've tried changing my code in a few ways and just end up with inconsistent results. With my current version, the conditional statement appears to be correct, but something seems to be wrong with my while loops, because the output stops prematurely.

Any help would be appreciated,

a217

In reply to While loop problem with filereading by a217
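
One likely reason the output stops after two lines: when a position lies past the end of the current region, the posted code reads the next line with `$rline=<R>` but never re-splits it into `@r_arr`. The `redo` then compares the same position against the same old region end and reads yet another line from R, so the whole region file is consumed before the third position is ever tested. Below is a minimal sketch of one way around that, not the only one. It assumes the region file is small enough to hold in memory, and the sub name `match_positions` and the hash `%regions_by_chr` are made up for illustration. It indexes the regions by chromosome name and then checks each position against the regions of its own chromosome.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): takes two array refs of
# tab-separated lines and returns one joined line per position that falls
# inside a region on the same chromosome.
sub match_positions {
    my ($region_lines, $position_lines) = @_;

    # Index regions by chromosome name so each position only scans
    # the regions of its own chromosome.
    my %regions_by_chr;
    for my $line (@$region_lines) {
        chomp(my $copy = $line);
        my @r = split /\t/, $copy;
        push @{ $regions_by_chr{ $r[0] } }, \@r;
    }

    my @out;
    for my $line (@$position_lines) {
        chomp(my $copy = $line);
        my @p = split /\t/, $copy;
        for my $r (@{ $regions_by_chr{ $p[0] } || [] }) {
            # $r->[1] is the region start, $r->[2] the region end (both inclusive)
            if ($p[1] >= $r->[1] && $p[1] <= $r->[2]) {
                push @out, join("\t", @$r, @p);
            }
        }
    }
    return @out;
}

# Small in-memory sample taken from the data in the question.
my @regions   = ("chr1\t100\t159\t0\n", "chr1\t200\t260\t0\n", "chr24\t300\t400\t0\n");
my @positions = ("chr1\t104\t104\t0\t0\t+\n", "chr1\t205\t205\t0\t0\t+\n", "chr24\t350\t350\t0\t0\t+\n");
print "$_\n" for match_positions(\@regions, \@positions);
```

To run it on the real files, the two arrays could be filled with something like `open(my $fh, '<', 'testReg.txt') or die $!; my @regions = <$fh>;`. Comparing chromosome names as strings also avoids having to strip the "chr" prefix before the `==` test.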
