Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello, I am relatively new to perl code and I have what should be a simple question for the perl monks. Basically my problem is that I have to match a base position to the corresponding region in the corresponding chromosome. I have two input files: a file for the base position that looks like
pos.txt
chr1 415 0 0 +
chr1 600 0 0 +
chr3 205 0 0 +
chr4 681 0 0 +
chr7 110 0 0 +
chr7 350 0 0 +
where the 1st col. is the chromosome and 2nd col. is the position. The cols. are separated by tabs as well.
The second file is reg.txt:
chr1 400 500 0 0 +
chr1 600 700 0 0 +
chr3 200 225 0 0 +
chr4 650 700 0 0 +
chr7 100 120 0 0 +
chr7 300 400 0 0 +
where 1st col. is chromosome, 2nd col. is start of region, and 3rd col. is end of region. Separated by tabs as well.
Essentially, I have to find the region in reg.txt for the corresponding position in pos.txt
Here is the code I'm working on:
#!/usr/bin/perl use warnings; use strict; my $region = 'testReg.txt'; my $position = 'testPos.txt'; my $writeOut = '>>testOut.txt'; open(R,$region) or die "error reading file"; open(OUT,$writeOut) or die "error writing to the file "; open(P, $position) or die "error reading file "; my $rline; my $pline; while ($rline=<R>) { chomp($rline); my @r_arr=split("\t",$rline); chomp($r_arr[0]); my @rID = split("r",$r_arr[0]); $r_arr[0] = $rID[1]; #this removes the "chr" portion of the fi +rst element and leaves number #i.e. instead of [0] -> "chr24"; [0] -> "24" while($pline=<P>) { if(!$rline) { last; } #end if chomp($pline); my @p_arr=split("\t",$pline); chomp($p_arr[0]); my @pID = split("r",$p_arr[0]); $p_arr[0] = $pID[1]; if($p_arr[1]>$r_arr[2]) { $rline=<R>; redo; } #end if else { if($p_arr[0] == $r_arr[0] && $p_arr[1] >= $r_arr[1] +&& $p_arr[1] <= $r_arr[2]) { #NOTE: [0] element in each array now corresponds t +o chr number # r[1] is start of region and r[2] is end of regio +n # p[1] is the position of the base pair shift(@p_arr); print (OUT "chr$r_arr[0]\t$r_arr[1]\t$r_arr[2]\t$r +_arr[3]\t"); print OUT join ("\t", @p_arr), "\n"; #essentially I'm joining the two files with ma +tching lines #w/ columns separated by tab } #end if } #end else } # end while <P> } #end while <R> close R; close P; close OUT;
I have working code that can find a region for a position in one chromosome, but for multiple chromosomes I am having a difficult time getting an answer. My problem seems to be the loops as the code only outputs answers for the first chromosome (i.e. chr1). Any help would be appreciated, a217
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Help with locating bp region in chromosome
by Anonymous Monk on Jun 23, 2011 at 02:27 UTC | |
by Anonymous Monk on Jun 23, 2011 at 15:55 UTC | |
by toolic (Bishop) on Jun 23, 2011 at 16:28 UTC | |
by Anonymous Monk on Jun 23, 2011 at 17:09 UTC | |
by Anonymous Monk on Jun 23, 2011 at 17:12 UTC | |
by Anonymous Monk on Jun 25, 2011 at 13:26 UTC | |
| |
by Anonymous Monk on Jun 24, 2011 at 02:07 UTC |