So what you want to do is basically:

Remove the first two columns. Then, if the second column of the current line isn't one more than the third column of the previous line, insert a row to fill the gap. (Column numbers here and below are counted after the removal.)

Accommodations need to be made for multiple sequences.

If the current line is the first line of the file, or if its first column differs from the first column of the previous line, then the third column of the previous line is treated as zero.
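To make the rule concrete, here is a minimal sketch of just the gap check, pulled out into a helper. The name find_gap and the sample numbers are made up for illustration; columns are counted after the first two have been removed.

```perl
use strict;
use warnings;

# Hypothetical helper: given the previous line's sequence id and end index,
# plus the current line's sequence id and start index, return the missing
# (from, to) range, or an empty list if there is no gap.
sub find_gap {
    my ($prev_seq, $prev_end, $cur_seq, $cur_start) = @_;

    # First line of the file or a new sequence: treat the previous end as 0.
    $prev_end = 0 if !defined($prev_seq) || $prev_seq != $cur_seq;

    return if $cur_start == $prev_end + 1;
    return ($prev_end + 1, $cur_start - 1);
}

my @gap = find_gap(1, 22, 1, 46);    # previous line ended at 22, this one starts at 46
print "missing @gap\n" if @gap;      # missing 23 45

@gap = find_gap(1, 101, 2, 13);      # new sequence that starts at 13
print "missing @gap\n" if @gap;      # missing 1 12
```

Like the full script, this only handles gaps; it makes no attempt to detect overlapping or out-of-order ranges.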

So what do we insert? It's pretty clear what you want for the first three columns, but not for the rest. They seem to always be 2,2, so that's what the code inserts.

    use strict;
    use warnings;

    my $last_seq_idx = 0;
    my $last_ele_idx = 0;

    while (<>) {
        chomp;
        my @rec = split(/,/);
        splice(@rec, 0, 2);

        # New sequence?
        if ($rec[0] != $last_seq_idx) {
            $last_seq_idx = $rec[0];
            $last_ele_idx = 0;
        }

        # Is there a break in the sequence?
        if ($rec[1] != $last_ele_idx + 1) {
            my @new_rec = ($rec[0], $last_ele_idx+1, $rec[1]-1, 2, 2);
            print(join(',', @new_rec), "\n");
        }

        print(join(',', @rec), "\n");
        $last_ele_idx = $rec[2];
    }

    1,1,22,2,2
    1,23,45,2,2
    1,46,55,2,2
    1,56,78,1,1
    1,79,87,2,2
    1,88,101,2,2
    2,1,12,2,2
    2,13,34,4,3
    2,35,44,2,2
    2,45,56,1,3
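For trying this without an input file, the same logic can be wrapped in a sub that takes lines and returns lines. This is only a sketch: fill_gaps is a name invented here, and the input rows, including the x,y placeholders in the two leading columns, are hypothetical, since the original input isn't shown in this post.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above: takes input lines,
# returns the output lines with filler rows added.
sub fill_gaps {
    my @lines = @_;
    my ($last_seq_idx, $last_ele_idx) = (0, 0);
    my @out;

    for my $line (@lines) {
        chomp(my $copy = $line);
        my @rec = split /,/, $copy;
        splice(@rec, 0, 2);               # drop the first two columns

        # New sequence? Then the previous end index counts as zero.
        if ($rec[0] != $last_seq_idx) {
            $last_seq_idx = $rec[0];
            $last_ele_idx = 0;
        }

        # Break in the sequence? Emit a filler row ending in 2,2.
        if ($rec[1] != $last_ele_idx + 1) {
            push @out, join(',', $rec[0], $last_ele_idx + 1, $rec[1] - 1, 2, 2);
        }

        push @out, join(',', @rec);
        $last_ele_idx = $rec[2];
    }
    return @out;
}

# Made-up input with gaps at 1..22, 46..55 and, in sequence 2, 1..12.
my @input = (
    "x,y,1,23,45,2,2",
    "x,y,1,56,78,1,1",
    "x,y,2,13,34,4,3",
);
print "$_\n" for fill_gaps(@input);
# 1,1,22,2,2
# 1,23,45,2,2
# 1,46,55,2,2
# 1,56,78,1,1
# 2,1,12,2,2
# 2,13,34,4,3
```

Returning a list instead of printing directly makes it easy to compare the result against the output you expect.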

Update: Adjusted spec and code to remove first two columns of the input.


In reply to Re: adding the missing sequence numbers by ikegami
in thread adding the missing sequence numbers by Anonymous Monk
