Hello, I am relatively new to perl code and I have what should be a simple question for the perl monks. Basically my problem is that I have to match a base position to the corresponding region in the corresponding chromosome. I have two input files: a file for the base position that looks like
pos.txt
chr1 415 0 0 +
chr1 600 0 0 +
chr3 205 0 0 +
chr4 681 0 0 +
chr7 110 0 0 +
chr7 350 0 0 +
where the 1st col. is the chromosome and 2nd col. is the position. The cols. are separated by tabs as well.
The second file is reg.txt:
chr1 400 500 0 0 +
chr1 600 700 0 0 +
chr3 200 225 0 0 +
chr4 650 700 0 0 +
chr7 100 120 0 0 +
chr7 300 400 0 0 +
where 1st col. is chromosome, 2nd col. is start of region, and 3rd col. is end of region. Separated by tabs as well.
Essentially, I have to find the region in reg.txt for the corresponding position in pos.txt
Here is the code I'm working on:
#!/usr/bin/perl use warnings; use strict; my $region = 'testReg.txt'; my $position = 'testPos.txt'; my $writeOut = '>>testOut.txt'; open(R,$region) or die "error reading file"; open(OUT,$writeOut) or die "error writing to the file "; open(P, $position) or die "error reading file "; my $rline; my $pline; while ($rline=<R>) { chomp($rline); my @r_arr=split("\t",$rline); chomp($r_arr[0]); my @rID = split("r",$r_arr[0]); $r_arr[0] = $rID[1]; #this removes the "chr" portion of the fi +rst element and leaves number #i.e. instead of [0] -> "chr24"; [0] -> "24" while($pline=<P>) { if(!$rline) { last; } #end if chomp($pline); my @p_arr=split("\t",$pline); chomp($p_arr[0]); my @pID = split("r",$p_arr[0]); $p_arr[0] = $pID[1]; if($p_arr[1]>$r_arr[2]) { $rline=<R>; redo; } #end if else { if($p_arr[0] == $r_arr[0] && $p_arr[1] >= $r_arr[1] +&& $p_arr[1] <= $r_arr[2]) { #NOTE: [0] element in each array now corresponds t +o chr number # r[1] is start of region and r[2] is end of regio +n # p[1] is the position of the base pair shift(@p_arr); print (OUT "chr$r_arr[0]\t$r_arr[1]\t$r_arr[2]\t$r +_arr[3]\t"); print OUT join ("\t", @p_arr), "\n"; #essentially I'm joining the two files with ma +tching lines #w/ columns separated by tab } #end if } #end else } # end while <P> } #end while <R> close R; close P; close OUT;
I have working code that can find a region for a position in one chromosome, but for multiple chromosomes I am having a difficult time getting an answer. My problem seems to be the loops as the code only outputs answers for the first chromosome (i.e. chr1). Any help would be appreciated, a217
In reply to Help with locating bp region in chromosome by Anonymous Monk
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |