Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am having trouble working out the logic for some code I am writing. All I need to do is insert a line for each gap in a sequence of numbers.
I have 8 columns in a file, of which the last 5 are of interest to me. I have spliced out those 5 columns and put them into a hash.
Now the spliced array looks like this:
1234522
1567811
18810122
2133443
2455613
This continues for thousands of lines.
Comma-separated, it actually looks like this:
1,23,45,2,2
1,56,78,1,1
1,88,101,2,2
2,13,34,4,3
2,45,56,1,3
I use the first column as the key and keep the rest in an array as the value.
What I need is something like this:
1,1,22,2,2
1,23,45,2,2
1,46,55,2,2
1,56,78,1,1
1,79,87,2,2
1,88,101,2,2
2,1,12,2,2
2,13,34,4,3
2,35,44,2,2
2,45,56,1,3
I would like to insert rows so that there are no gaps in the sequence and it is continuous until the end of each key (say, 1 or 2).

Any suggestions or ideas? It doesn't do much yet, but here is my code so far:
#!/software/bin/perl
use strict;
use warnings;

while (<>) {
    my @line   = split(/,/, $_);
    my @splice = splice(@line, 3, 8);
    my $chr    = shift @splice;    # shifts the first element
    my %hash;
    $hash{$chr} = \@splice;
    foreach my $key (sort keys %hash) {
        print $key, ",", $hash{$key}[0], ",", $hash{$key}[1], "\n";
    }
}
The input would be:
1,34,1,23,45,2,2
35,45,1,56,78,1,1
46,56,1,88,101,2,2
57,68,2,13,34,4,3
69,78,2,45,56,1,3
Thanks a lot in advance,

Replies are listed 'Best First'.
Re: adding the missing sequence numbers
by ikegami (Patriarch) on Jan 19, 2011 at 17:49 UTC

    So what you want to do is basically:

    Remove the first two columns. If the second column of the current line isn't one more than the third column of the previous line, insert a row.

    Accommodations need to be made for multiple sequences.

    If the current line is the first line of the file or if the first column of the current line is different than the first column of the previous line, then the third column of the previous line is considered to be zero.

    So what do we insert? It's pretty clear what you want for the first three columns, but not for the rest. Seems to always be 2,2.

    use strict;
    use warnings;

    my $last_seq_idx = 0;
    my $last_ele_idx = 0;
    while (<>) {
        chomp;
        my @rec = split(/,/);
        splice(@rec, 0, 2);

        # New sequence?
        if ($rec[0] != $last_seq_idx) {
            $last_seq_idx = $rec[0];
            $last_ele_idx = 0;
        }

        # Is there a break in the sequence?
        if ($rec[1] != $last_ele_idx + 1) {
            my @new_rec = ($rec[0], $last_ele_idx+1, $rec[1]-1, 2, 2);
            print(join(',', @new_rec), "\n");
        }

        print(join(',', @rec), "\n");
        $last_ele_idx = $rec[2];
    }

    1,1,22,2,2
    1,23,45,2,2
    1,46,55,2,2
    1,56,78,1,1
    1,79,87,2,2
    1,88,101,2,2
    2,1,12,2,2
    2,13,34,4,3
    2,35,44,2,2
    2,45,56,1,3

    Update: Adjusted spec and code to remove first two columns of the input.

Re: adding the missing sequence numbers
by jethro (Monsignor) on Jan 19, 2011 at 18:08 UTC

    You never say what values you want in columns 7 and 8 of your lines. Always 2,2?

    The example input you give at the end has only 7 columns, not 8.

    Also you seem to have duplicate keys in your hash, i.e. the key '1' occurs 3 times in your input. If you store that in a hash, only the last line with a given key will survive. So either you have to append the additional data to the arrays in your hash or use an array-of-arrays structure to store the data. Or, if the data is already sorted, don't store anything (ikegami's solution works under that assumption).
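The "append to the arrays in your hash" option can be sketched roughly as follows. This is only an illustration, not the original poster's or ikegami's code: fill_gaps() is a hypothetical helper name, the filler values 2,2 are an assumption taken from the expected output, and the rows of each key are sorted by their start column so the input does not have to be pre-sorted.

```perl
use strict;
use warnings;

# Hypothetical helper: group rows by key in a hash of arrays of arrays,
# then emit every row plus a filler row for each gap.
sub fill_gaps {
    my @lines = @_;
    my %rows_for;    # declared once, outside the loop, so rows accumulate

    for my $line (@lines) {
        chomp $line;
        my @fields = split /,/, $line;
        splice(@fields, 0, 2);                   # drop the first two columns
        my $key = shift @fields;                 # e.g. 1 or 2
        push @{ $rows_for{$key} }, [@fields];    # append instead of overwrite
    }

    my @out;
    for my $key (sort { $a <=> $b } keys %rows_for) {
        my $prev_end = 0;    # each key starts counting from 1
        for my $row (sort { $a->[0] <=> $b->[0] } @{ $rows_for{$key} }) {
            my ($start, $end) = @{$row}[0, 1];
            if ($start > $prev_end + 1) {
                # Assumption: filler rows always end in 2,2
                push @out, join(',', $key, $prev_end + 1, $start - 1, 2, 2);
            }
            push @out, join(',', $key, @$row);
            $prev_end = $end;
        }
    }
    return @out;
}

my @input = (
    '1,34,1,23,45,2,2',
    '35,45,1,56,78,1,1',
    '46,56,1,88,101,2,2',
    '57,68,2,13,34,4,3',
    '69,78,2,45,56,1,3',
);
print "$_\n" for fill_gaps(@input);
```

Holding everything in memory costs more than a streaming approach, so for large, already sorted files the store-nothing option is probably the better fit.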

      Also you seem to have duplicate keys in your hash

      His hash never has more than one element, so I find that hard to believe.

        Let me rephrase that: Also you seem to try to store duplicate keys in your hash ;-)

        Naturally the 'my %hash' inside the loop prevents any data from accumulating in the hash (this as an explanation to the poster who started the thread), but I'm sure he had something else in mind for that hash. Not to say that I spotted that bug; I didn't.
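A small sketch of the scoping point, for illustration only: the @keys list below simply mirrors the key column of the sample input, and the variable names are made up for this example. A hash declared with my inside the loop body is created afresh on every iteration, while one declared before the loop keeps its contents.

```perl
use strict;
use warnings;

my @keys = (1, 1, 1, 2, 2);    # key column of the sample input (assumed)

# Variant A: hash declared inside the loop, as in the original post.
my $max_inside = 0;
for my $key (@keys) {
    my %hash;                  # fresh, empty hash on every iteration
    $hash{$key} = 1;
    my $n = keys %hash;        # number of keys currently stored
    $max_inside = $n if $n > $max_inside;
}

# Variant B: hash declared once, before the loop.
my %seen;
for my $key (@keys) {
    $seen{$key}++;             # counts survive across iterations
}

print "inside loop: at most $max_inside key at a time\n";
print "outside loop: ", scalar(keys %seen), " keys, key 1 seen $seen{1} times\n";
```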