in reply to adding the missing sequence numbers
So what you want to do is basically:
Remove the first two columns. Then, counting columns after that removal, if the second column of the current line isn't one more than the third column of the previous line, insert a row.
Accommodations need to be made for multiple sequences.
If the current line is the first line of the file, or if the first column of the current line is different from the first column of the previous line, then the third column of the previous line is considered to be zero.
So what do we insert? It's pretty clear what you want for the first three columns, but not for the rest. They seem to always be 2,2, so that's what I used.
use strict;
use warnings;

my $last_seq_idx = 0;
my $last_ele_idx = 0;

while (<>) {
    chomp;
    my @rec = split(/,/);
    splice(@rec, 0, 2);

    # New sequence?
    if ($rec[0] != $last_seq_idx) {
        $last_seq_idx = $rec[0];
        $last_ele_idx = 0;
    }

    # Is there a break in the sequence?
    if ($rec[1] != $last_ele_idx + 1) {
        my @new_rec = ($rec[0], $last_ele_idx+1, $rec[1]-1, 2, 2);
        print(join(',', @new_rec), "\n");
    }

    print(join(',', @rec), "\n");
    $last_ele_idx = $rec[2];
}
1,1,22,2,2
1,23,45,2,2
1,46,55,2,2
1,56,78,1,1
1,79,87,2,2
1,88,101,2,2
2,1,12,2,2
2,13,34,4,3
2,35,44,2,2
2,45,56,1,3
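If you'd rather test the rule without feeding a file through `<>`, here is a rough sketch of the same logic as a sub. The name `fill_gaps` is made up for illustration, it assumes the rows already have the first two columns removed, and the 2,2 filler is still just my guess at what you want. The three sample rows are picked from the listing above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): takes rows of the
# form "seq,start,end,..." and returns them with filler rows inserted
# wherever a range is missing. The filler values 2,2 are an assumption.
sub fill_gaps {
    my @rows = @_;
    my ($last_seq, $last_end) = ('', 0);
    my @out;

    for my $row (@rows) {
        my @rec = split /,/, $row;

        # A new sequence restarts the numbering from zero.
        if ($rec[0] ne $last_seq) {
            ($last_seq, $last_end) = ($rec[0], 0);
        }

        # Insert a filler row covering any missing range.
        if ($rec[1] != $last_end + 1) {
            push @out, join(',', $rec[0], $last_end + 1, $rec[1] - 1, 2, 2);
        }

        push @out, $row;
        $last_end = $rec[2];
    }

    return @out;
}

# Sample rows only; each call is independent of any file input.
my @result = fill_gaps('1,1,22,2,2', '1,46,55,2,2', '2,13,34,4,3');
print "$_\n" for @result;
```

With those three rows, the sub fills in 1,23,45,2,2 between the first two and 2,1,12,2,2 ahead of the last one. Note that neither this sketch nor the script above pads the end of a sequence, since there's nothing in the data that says where a sequence should stop.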
Update: Adjusted spec and code to remove first two columns of the input.