in reply to Help with locating bp region in chromosome

Good start :) You should also put the data in <code></code> aka <c></c> tags

Or better yet, make it part of your program with in-memory filehandles like this

#!/usr/bin/perl --
use warnings;
use strict;
use autodie;

Main( @ARGV );
exit( 0 );

sub Main {
    #~ RPSFiles( @_ );
    #~ RPSFiles( 'testReg.txt', 'testPos.txt', 'testOut.txt' );
    RPSDemo();
}

sub RPSFiles {
    my( $region, $position, $writeOut ) = @_;
    use autodie; # die if open/close..... fails
    open my $Reg, '<', $region;
    open my $Pos, '<', $position;
    open my $Out, '>', $writeOut;
    RegPosOut910992( $Reg, $Pos, $Out );
}

sub RegPosOut910992 {
    my( $Reg, $Pos, $Out ) = @_;
    my $rline;
    my $pline;
    while ($rline=<$Reg>) {
        chomp($rline);
        my @r_arr=split("\t",$rline);
        chomp($r_arr[0]);
        my @rID = split("r",$r_arr[0]);
        $r_arr[0] = $rID[1];
        #this removes the "chr" portion of the first element and leaves number
        #i.e. instead of [0] -> "chr24"; [0] -> "24"
        while($pline=<$Pos>) {
            if(!$rline) {
                last;
            } #end if
            chomp($pline);
            my @p_arr=split("\t",$pline);
            chomp($p_arr[0]);
            my @pID = split("r",$p_arr[0]);
            $p_arr[0] = $pID[1];
            if($p_arr[1] > $r_arr[2]) {
                $rline=<$Reg>;
                redo;
            } #end if
            else {
                if($p_arr[0] == $r_arr[0] && $p_arr[1] >= $r_arr[1] && $p_arr[1] <= $r_arr[2]) {
                    #NOTE: [0] element in each array now corresponds to chr number
                    # r[1] is start of region and r[2] is end of region
                    # p[1] is the position of the base pair
                    shift(@p_arr);
                    print ($Out "chr$r_arr[0]\t$r_arr[1]\t$r_arr[2]\t$r_arr[3]\t");
                    print $Out join ("\t", @p_arr), "\n";
                    #essentially I'm joining the two files with matching lines
                    #w/ columns separated by tab
                } #end if
            } #end else
        } # end while <$Pos>
    } #end while <$Reg>
    close $Reg;
    close $Pos;
    close $Out;
}

sub RPSDemo {
    my $region = <<'__REGION__';
chr1 400 500 0 0 +
chr1 600 700 0 0 +
chr3 200 225 0 0 +
chr4 650 700 0 0 +
chr7 100 120 0 0 +
chr7 300 400 0 0 +
__REGION__
    my $position = <<'__POSITION__';
chr1 415 0 0 +
chr1 600 0 0 +
chr3 205 0 0 +
chr4 681 0 0 +
chr7 110 0 0 +
chr7 350 0 0 +
__POSITION__
    open my $Reg, '<', \$region;
    open my $Pos, '<', \$position;
    RegPosOut910992( $Reg, $Pos, \*STDOUT );
}

The essence of your program and your problem is unchanged, except now it's confined in sub RegPosOut910992.

The next step you should take is to replace @p_arr with meaningful variable names, say $Chromosome, $StartOfRegion, $EndOfRegion...

Also, the data you presented contains no tabs, so split on whitespace instead of on "\t".
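
A minimal sketch of both suggestions together, assuming the region lines look like the ones in the demo data. ParseRegion is a hypothetical helper name of mine, not something from your program, and the variable names are only suggestions.

```perl
#!/usr/bin/perl --
use strict;
use warnings;

# Hypothetical helper: split one region line on whitespace into named fields.
sub ParseRegion {
    my( $line ) = @_;
    # split ' ' (a string holding a single space) skips leading whitespace
    # and splits on any run of spaces, tabs or newlines
    my( $Chromosome, $StartOfRegion, $EndOfRegion, @Rest ) = split ' ', $line;
    return ( $Chromosome, $StartOfRegion, $EndOfRegion, @Rest );
}

# mixed spaces and a tab on purpose, to show that both are handled
my( $chr, $start, $end ) = ParseRegion( "chr1  400\t500 0 0 +\n" );
print "$chr $start $end\n";
```

Because split ' ' also swallows the trailing newline, no chomp is needed before it.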

Next problem: filehandles are iterators. Once you advance the iterator to the end, you stay at the end unless you rewind it. You can rewind filehandles with seek.
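
Here is a small sketch of that behaviour using an in-memory filehandle. CountLines is a hypothetical helper I made up for the demonstration, and the three data lines are borrowed from the demo above.

```perl
#!/usr/bin/perl --
use strict;
use warnings;
use autodie;

# Hypothetical helper: read a filehandle to the end and count the lines seen.
sub CountLines {
    my( $fh ) = @_;
    my $count = 0;
    while( my $line = <$fh> ) {
        $count++;
    }
    return $count;
}

my $position = "chr1 415 0 0 +\nchr1 600 0 0 +\nchr3 205 0 0 +\n";
open my $Pos, '<', \$position;

my $first  = CountLines( $Pos );   # reads all 3 lines
my $second = CountLines( $Pos );   # already at the end, so reads nothing
seek $Pos, 0, 0;                   # rewind to the start of the data
my $third  = CountLines( $Pos );   # reads all 3 lines again

print "$first $second $third\n";   # prints "3 0 3"
```

In your program the inner while <$Pos> loop is in the position of the second call: after the first pass over the positions there is nothing left to read for the following regions.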

Re^2: Help with locating bp region in chromosome
by Anonymous Monk on Jun 23, 2011 at 15:55 UTC

    Sorry about that, but here is the data input as well as the output for my code.

    Below is pos.txt, or where the position of the bp is located (1st col chromosome, 2nd col position).

    chr1 104 104 0 0 +
    chr1 145 145 0 0 +
    chr1 205 205 0 0 +
    chr1 600 600 0 0 +
    chr3 500 500 0 0 +
    chr4 150 150 0 0 +
    chr4 175 175 0 0 +
    chr7 400 400 0 0 +
    chr7 550 550 0 0 +
    chr9 100 100 0 0 +
    chr11 680 680 0 0 +
    chr11 681 681 0 0 +
    chr22 105 105 0 0 +
    chr22 110 110 0 0 +
    chr22 350 350 0 0 +

    Below is reg.txt, or where the region is located (1st col chromosome, 2nd col start of region, 3rd col end of region).

    chr1 100 159 0
    chr1 200 260 0
    chr1 500 750 0
    chr3 450 700 0
    chr4 100 300 0
    chr7 350 600 0
    chr9 100 125 0
    chr11 679 687 0
    chr22 100 200 0
    chr22 300 400 0

    Below is my output, where first 4 col are from reg.txt and last 5 are from pos.txt. As you can see, my code only correctly outputs answers for part of the first chromosome, and it does not continue past that. This is the main problem I face, to understand how I can get a loop to cover all cases.

    chr1 100 159 0 104 104 0 0 +
    chr1 100 159 0 145 145 0 0 +

      The code you posted produces no output for me (try downloading the code you posted yourself).

      You should also post the exact output you expect to get.

        I did download the code I posted, and it worked for me because I saved the input files as testReg.txt and testPos.txt respectively instead of reg.txt and pos.txt. Sorry about that. The beginning of the code is where I specify the file names.

        The correct output should look like:

        chr1 100 159 0 chr1 104 104 0 0 +
        chr1 100 159 0 chr1 145 145 0 0 +
        chr1 200 260 0 chr1 205 205 0 0 +
        chr1 500 750 0 chr1 600 600 0 0 +
        chr3 450 700 0 chr3 500 500 0 0 +
        chr4 100 300 0 chr4 150 150 0 0 +
        chr4 100 300 0 chr4 175 175 0 0 +
        chr7 350 600 0 chr7 400 400 0 0 +
        chr7 350 600 0 chr7 550 550 0 0 +
        chr9 100 125 0 chr9 100 100 0 0 +
        chr11 679 687 0 chr11 680 680 0 0 +
        chr11 679 687 0 chr11 681 681 0 0 +
        chr22 100 200 0 chr22 105 105 0 0 +
        chr22 100 200 0 chr22 110 110 0 0 +
        chr22 300 400 0 chr22 350 350 0 0 +

        but as in my previous reply, I can only get the first two lines to output at all (let alone correctly). Again, the first 4 columns are from testReg.txt and the last 6 columns are from testPos.txt. The "chr" column from testPos.txt isn't necessary in the output, but I was trying to include it as well.

        The Re^4 post should include my reply if you click on the title, but for some reason it doesn't display in the original thread.