Hi Monks,
So I know many monks here aren't biologist but I have posted my question on bioinfo forums aswell but haven't been able to solve the issue. Since it involves perl I thought maybe you all will be able to point out something in the code.
My question, I am trying to find overlap between CNV(a variant basically) which has genomic region(start and stop basepair position), eg: 23487 72875 and genes which also have this (start and stop bp position) eg: BRCA1 41196312 41277500. Overlap can be in different ways: i). either there wont be any overlap i.e the gene region will not fall in the CNV region or ii). it will fall completely within the region or iii) it may fall partially i.e gene region might overlap with the CNV start position or gene region overlapping with the stop postion of the CNV. So if there will be an overlap it will return 1 and if not then it will return 0 and the output file finally will have all the overlap and the gene name with which this cnv region is overlapping. So I am reading in the main CNV file and two other gene list files.
Three tab delim files I have are
file1=cnv file
start stop size(in bp)
5389769 98256008 3567
7452344 871875466 64547
file2=genelist1
name start stop
BRCA1 41196312 41277500
TP53 7571720 7590863
file3=genelist3
name start stop
OMG 29621668 29624380
NR3C1 142657496..142815077
note: I cannot merge the two gene lists as they are different sets and need to be kept separate.
Am I doing this correctly? Any suggestions?
!/usr/bin/perl
#Open the cnv file with start, stop
my $input_gene_file = "path to file";
die "Cannot open $input_gene_file\n" unless (open(IN, $input_gene_
+file));
my %size_CNV_start_CNV_stop;
while (chomp($line = <IN>))
{
my (@columns) = split /\s+/, $line;
my $size = $columns[3];
my $CNV_start = $columns[1];
my $CNV_stop = $columns[2];
$size_and_CNV_start{$size} = $CNV_start;
$size_and_CNV_stop{$size} = $CNV_stop;
}
close(IN);
#Now open files for genelist1 and genelist2 with respective start
+and stops
open (MYFILE, "/Users/pathtofile");
@file1 = <MYFILE>;
close MYFILE;
open (MYFILE2, "/Users/pathtofile");
@file2 = <MYFILE2>;
close MYFILE2;
#Open output file and write the locations/position of each symbol
die "output.txt" unless(open( OUT,"> output.txt"));
while(defined(my $line1 = <$file1>)
and defined(my $line2 = <$file2>)){
sub overlap {
my ($CNV_start, $CNV_stop, $Gene_start, $Gene_stop) = @_;
if ($Gene_stop < $CNV_start || $Gene_start < $CNV_stop) {
return 0;
if (Gene_stop > $CNV_start || $Gene_stop > $CNV_stop){
}
return 1;
}
while (defined($symbol = <IN>))
{
chomp($symbol);
{
my $CNV_start = $size_and_CNV_start{$size};
my $CNV_stop = $size_and_CNV_stop{$size};
print OUT " $name\ $size\ $CNV_start\ $CNV_stop \n";
}
}
}
close(OUT);
I get a blank output file. There is no error or warning that I am getting so I am not sure what is going on and what I should be correcting?
My output file should be if overlapping regions found
gene name size start stop
xyz 5464 83654272 6735353
...
...
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.