comment on

Hello, I am relatively new to perl code and I have what should be a simple question for the perl monks. Basically my problem is that I have to match a base position to the corresponding region in the corresponding chromosome. I have two input files: a file for the base position that looks like

pos.txt

chr1 415 0 0 +

chr1 600 0 0 +

chr3 205 0 0 +

chr4 681 0 0 +

chr7 110 0 0 +

chr7 350 0 0 +

where the 1st col. is the chromosome and 2nd col. is the position. The cols. are separated by tabs as well.

The second file is reg.txt:

chr1 400 500 0 0 +

chr1 600 700 0 0 +

chr3 200 225 0 0 +

chr4 650 700 0 0 +

chr7 100 120 0 0 +

chr7 300 400 0 0 +

where 1st col. is chromosome, 2nd col. is start of region, and 3rd col. is end of region. Separated by tabs as well.

Essentially, I have to find the region in reg.txt for the corresponding position in pos.txt

Here is the code I'm working on:

#!/usr/bin/perl 
use warnings; use strict; 

my $region = 'testReg.txt';
my $position = 'testPos.txt';
my $writeOut = '>>testOut.txt';

open(R,$region) or die "error reading file";
open(OUT,$writeOut) or die "error writing to the file ";
open(P, $position) or die "error reading file ";

my $rline; 
my $pline; 

while ($rline=<R>) {    
    chomp($rline);
    my @r_arr=split("\t",$rline);
        
        chomp($r_arr[0]);
        my @rID = split("r",$r_arr[0]);
        $r_arr[0] = $rID[1]; #this removes the "chr" portion of the fi
+rst element and leaves number
        #i.e. instead of [0] -> "chr24"; [0] -> "24"

    while($pline=<P>) {            
            if(!$rline) {
                last;
            } #end if
        
        chomp($pline);
        my @p_arr=split("\t",$pline);
        
            chomp($p_arr[0]);
            my @pID = split("r",$p_arr[0]);
            $p_arr[0] = $pID[1];
        
            if($p_arr[1]>$r_arr[2]) {
              $rline=<R>; 
              redo;
            } #end if
        
            else {                
                  if($p_arr[0] == $r_arr[0] && $p_arr[1] >= $r_arr[1] 
+&& $p_arr[1] <= $r_arr[2]) {
                    #NOTE: [0] element in each array now corresponds t
+o chr number
                    # r[1] is start of region and r[2] is end of regio
+n
                    # p[1] is the position of the base pair
                    shift(@p_arr);
                    print (OUT "chr$r_arr[0]\t$r_arr[1]\t$r_arr[2]\t$r
+_arr[3]\t"); 
                        print OUT join ("\t", @p_arr), "\n"; 
                        #essentially I'm joining the two files with ma
+tching lines
                        #w/ columns separated by tab
                   } #end if
            } #end else        
    } # end while <P>    
} #end while <R>
close R;
close P;
close OUT;
[download]

I have working code that can find a region for a position in one chromosome, but for multiple chromosomes I am having a difficult time getting an answer. My problem seems to be the loops as the code only outputs answers for the first chromosome (i.e. chr1). Any help would be appreciated, a217

In reply to Help with locating bp region in chromosome by Anonymous Monk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.