If I understand what you want correctly, this (not very elegant) script should work, provided your rows are sorted in ascending order of contig start site (column 3):

#!/usr/bin/perl
use strict;
use warnings;

my $dataFile = $ARGV[0];
my $get   = 1;
my @array = ();
my $col1  = 0;
my $col2  = 0;
my $low   = 0;
my $high  = 0;

open(my $fh, "<", $dataFile) or die "Cannot open $dataFile: $!";
while (<$fh>) {
    next unless $_ =~ /[0-9]/;    # skip lines without digits (blank lines etc.)
    @array = split(" ", $_);

    if ($get == 1) {
        # A gap before this line: print the region collected so far.
        if (($array[2] > $high) && ($high > 0)) {
            print STDOUT "$col1\t$col2\t$low\t$high\n";
            $low  = $array[2];
            $high = $array[3];
        }
        $col1 = $array[0];
        $col2 = $array[1];
        $low  = $array[2] if (($array[2] < $low) || ($low == 0));
        $high = $array[3] if (($array[3] > $high) || ($high == 0));
        $get  = 0;
    }

    # Overlap: extend the current region if this line ends later.
    if ($array[2] <= $high) {
        $high = $array[3] if $array[3] > $high;
    }

    # Gap: print the finished region and start the next one.
    if ($array[2] > $high) {
        print STDOUT "$col1\t$col2\t$low\t$high\n";
        $low  = $array[2];
        $high = $array[3];
        $get  = 1;
    }
}
close($fh);

# Print the last region, which no later line closes.
print STDOUT "$col1\t$col2\t$low\t$high\n" if $high > 0;

If the chromosome changes partway through your large file, the script would need to be modified to account for that, since it only compares start and end positions.
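A rough sketch of one way to handle a chromosome change, not a tested drop-in replacement. It assumes that column 1 holds the chromosome name (the original post does not say which column does) and that the file is sorted by chromosome and then by start site. The sub name merge_regions is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge overlapping regions. Each row is [col1, col2, start, end].
# Assumption: col1 is the chromosome, and rows are sorted by
# chromosome and then by start site.
sub merge_regions {
    my @merged;
    for my $row (@_) {
        my ($chr, $name, $start, $end) = @$row;
        my $cur = $merged[-1];    # region currently being extended, if any
        if ($cur && $cur->[0] eq $chr && $start <= $cur->[3]) {
            # Same chromosome and overlapping: extend the open region.
            $cur->[3] = $end if $end > $cur->[3];
        }
        else {
            # New chromosome or a gap: start a new region.
            push @merged, [$chr, $name, $start, $end];
        }
    }
    return @merged;
}

# Only read input when a file name is given on the command line.
if (@ARGV) {
    my @rows;
    while (<>) {
        next unless /[0-9]/;                  # skip lines without digits
        push @rows, [ (split ' ')[0 .. 3] ];  # keep the first four columns
    }
    print join("\t", @$_), "\n" for merge_regions(@rows);
}
```

Note that this version keeps all rows in memory before merging, so for a very large file you may prefer to print each region as soon as it is closed, as the script above does.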

There's probably a much prettier way of doing it though...

Michael

In reply to Re: Finding Overlapping Regions on Genome by mtmcc
in thread Finding Overlapping Regions on Genome by oxydeepu
