I thought I would post a possible solution using the module Set::IntSpan. It avoids hand-writing the overlap-detection logic: instead, it tests for set intersection between the spans in the cnv file and the spans in the genelist* files. Each line in a genelist file is checked against every line in the cnv file.

However, this may not be the output you were looking for. :-(

#!/usr/bin/perl
use strict;
use warnings;
use Set::IntSpan;

# Sample data held in strings and read through in-memory filehandles.
my $cnv = <<EOF;
start stop size(in bp)
5389769 98256008 3567
7452344 871875466 64547
EOF

my $genelist1 = <<EOF;
name start stop
BRCA1 41196312 41277500
TP53 7571720 7590863
EOF

my $genelist3 = <<EOF;
name start stop
OMG 29621668 29624380
NR3C1 142657496..142815077
EOF

# Build one Set::IntSpan object per cnv line.
my @cnv;
open my $fh, "<", \$cnv or die $!;
<$fh>; # toss header
while (<$fh>) {
    chomp;
    # split on any whitespace, so tab- or space-separated data both work
    my ($start, $stop, $size) = split ' ';
    push @cnv, {
        span => Set::IntSpan->new("$start-$stop"),
        size => $size,
    };
}
close $fh or die $!;

printf "%-10s%10s%10s%10s\n", ('gene name', qw/ size start stop /);

for (\$genelist1, \$genelist3) {
    open $fh, "<", $_ or die $!;
    <$fh>; # toss header
    while (<$fh>) {
        chomp;
        # genelist3 may write the range as start..stop
        my ($name, $start, $stop) = split /\s+|(?<=\d)\.\.(?=\d)/;
        my $span = Set::IntSpan->new("$start-$stop");
        for my $href (@cnv) {
            my $intersect = $span->intersect( $href->{span} );
            if ($intersect) { # true when the intersection is not empty
                printf "%-10s%10s%10s%10s\n",
                    $name, $href->{size}, $intersect->min, $intersect->max;
            }
        }
    }
    close $fh or die $!;
}
It produced this output:

gene name       size     start      stop
BRCA1           3567  41196312  41277500
BRCA1          64547  41196312  41277500
TP53            3567   7571720   7590863
TP53           64547   7571720   7590863
OMG             3567  29621668  29624380
OMG            64547  29621668  29624380
NR3C1          64547 142657496 142815077

I hope this is close to what you were asking for. Every gene name overlapped both spans in the cnv file except the last one, NR3C1, which overlapped the span with size=64547 but not the span with size=3567.
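
For comparison, here is a rough sketch of the hand-written overlap test that the module stands in for, assuming each range is a single closed interval of integers. The helper name range_overlap is hypothetical and is not part of the script or of Set::IntSpan.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given two closed integer ranges, return the
# (start, stop) of their overlap, or an empty list if they are disjoint.
sub range_overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    my $start = $s1 > $s2 ? $s1 : $s2;   # larger of the two starts
    my $stop  = $e1 < $e2 ? $e1 : $e2;   # smaller of the two stops
    return $start <= $stop ? ($start, $stop) : ();
}

# NR3C1 checked against the two cnv spans from the sample data
for my $cnv ([5389769, 98256008], [7452344, 871875466]) {
    my @hit = range_overlap(142657496, 142815077, @$cnv);
    print @hit ? "overlap: $hit[0]-$hit[1]\n" : "no overlap\n";
}
```

This only covers one contiguous range per side. Set::IntSpan also handles sets made of several runs, which is one reason to prefer it if the data ever gets more complicated.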

Chris


In reply to Re: overlapping regions by Cristoforo
in thread overlapping regions by perllearner007
