I need to extract information from the rows that have contiguous values in the first column. The data contains lines reading "THIS IS A BREAK", and a run must be split there. A run must also be split when the difference between consecutive numbers is more than one (for example, between 10000066 and 10000071). A run only counts if it has at least two contiguous rows.
Input tab delimited file:
10000058 3DKG000004283 290.48
10000059 3DKG000004315 290.48
10000060 3DKG000004421 1693.9
10000061 3DKG000004543 3118.77
THIS IS A BREAK
10000062 3DiKG000004569 2372.94
10000063 3DiKG000004681 528.87
10000064 3DiKG000004741 187.54
10000065 3DiKG000004773 327.84
10000066 3DiKG000004879 1301.43
10000071 3DiKG000005165 17.94
10000072 3DiKG000005193 13.45
10000074 3DiKG000005261 14.33
10000076 3DiKG000005331 144
THIS IS A BREAK
10000145 3DKG000007633 10.43
10000146 3DKG000007663 10.43
10000147 3DKG000007693 1224.8
10000148 3DKG000007727 1224.8
10000149 3DKG000007769 1359.73
10000162 3DKG000008189 307.62
10000163 3DKG000008231 307.62
10000164 3DKG000008261 14.69
The output should be the start ID, end ID and row count of each run:
3DKG000004283 3DKG000004543 4
3DiKG000004569 3DiKG000004879 5
3DiKG000005165 3DiKG000005193 2
3DKG000007633 3DKG000007769 5
3DKG000008189 3DKG000008261 3
My code below is not giving me what I want:
#!/usr/bin/perl -w
open(IN, "input.txt") || die "Can't open output1.txt: $!";
open(FILE, ">output.txt") || die "couldn't create the file\n";
while (<IN>) {
    $lines = $_;
    ($n,$p) = $lines =~ /^(\d+)\t(.+)\n/;
    push(@num, $n);
    push(@data, $p);
}
close(IN);
$nlit = scalar(@num);
for($c=0;$c<=$nlit;$c++) {
    $first = $num[$c];
    $second = $num[$c+1];
    $diff= $second-$first;
    if ($diff <= 1) {
        push(@B, $data[$c]);
        push(@N, $num[$c]);
    }
    elsif ($diff > 1) {
        if (scalar(@B) >=2) {
            $si = scalar(@B);
            @firstty = split /\t/, $B[0];
            @lastty = split /\t/, $B[$#B];
            print FILE "$firstty[0]\t$lastty[0]\t$si\n";
        }
        undef @B;
        undef @N;
    }
}
close (IN);
I am a newbie, so please correct my script. Thanks.
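
One possible approach, offered as a minimal sketch rather than a definitive fix: walk the lines once, remember the previous number, and close the current run whenever a non-numeric line (such as "THIS IS A BREAK") or a gap larger than one appears. As far as I can tell, the posted script never adds the row just before a gap to the current run and never prints the final run, which this version avoids by flushing explicitly. The names contiguous_runs, $flush and @sample are hypothetical, and the code assumes whitespace-separated columns (number, ID, value) and treats blank lines as breaks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [first_id, last_id, count] for each run of two or more rows whose
# first-column numbers increase by exactly 1. Any line that does not start
# with a number (e.g. "THIS IS A BREAK") ends the current run.
sub contiguous_runs {
    my @lines = @_;
    my (@runs, @ids, $prev);

    # Record the current run if it is long enough, then start afresh.
    my $flush = sub {
        push @runs, [ $ids[0], $ids[-1], scalar @ids ] if @ids >= 2;
        @ids = ();
    };

    for my $line (@lines) {
        # Assumption: whitespace-separated columns (number, ID, value).
        my ($num, $id) = $line =~ /^\s*(\d+)\s+(\S+)/;
        if (!defined $num) {            # break marker or blank line
            $flush->();
            undef $prev;
            next;
        }
        # A gap (or any non-consecutive number) closes the current run.
        $flush->() if defined $prev && $num != $prev + 1;
        push @ids, $id;
        $prev = $num;
    }
    $flush->();                         # do not lose the final run
    return @runs;
}

# A few rows taken from the sample input in the question.
my @sample = (
    "10000065\t3DiKG000004773\t327.84",
    "10000066\t3DiKG000004879\t1301.43",
    "10000071\t3DiKG000005165\t17.94",
    "10000072\t3DiKG000005193\t13.45",
    "10000074\t3DiKG000005261\t14.33",
    "THIS IS A BREAK",
    "10000145\t3DKG000007633\t10.43",
);
print join("\t", @$_), "\n" for contiguous_runs(@sample);
```

To run it on a real file, the @sample list could be replaced by lines read from a filehandle (chomp is optional, since the regex ignores the trailing newline). Collecting only the IDs keeps memory small, and doing the length check in one place means the "at least two rows" rule is applied the same way for breaks, gaps and end of input.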

In reply to check for contiguous row values by gudluck