in reply to Hash_of_Hash_Would do it?

The following modification of the code I posted in reply to your Complex Data Structure thread should probably do the trick:

use strict; use warnings; my $data = <<DATA; CLS_S3_Contig100_st,CLS_S3_Contig100,13,10,0.3717 CLS_S3_Contig100_st,CLS_S3_Contig100,53,10,0.3717 CLS_S3_Contig10031_st,CLS_S3_Contig10031,53,12,0.5548 CLS_S3_Contig10031_st,CLS_S3_Contig10031,57,10,0.4871 CLS_S3_Contig10031_st,CLS_S3_Contig10031,61,12,0.547 CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,62,11,0.5129 CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,64,11,0.5789 DATA my %origins; my $numColumns; open my $inFile, '<', \$data; while (<$inFile>) { chomp; next unless length; my @columns = split ','; $numColumns ||= @columns; # Assume first row has correct column co +unt $origins{$columns[1]}[$columns[2] - 1] = \@columns; } close $inFile; for my $oKey (sort keys %origins) { my $origin = $origins{$oKey}; my $lastAdded = -1; # Fake up present flag for +/- 8 lines around real data for my $pip (0 .. $#$origin) { next if $pip <= $lastAdded or ! defined $origin->[$pip]; my $first = $pip > 8 ? $pip - 8 : 0; my $last = $pip + 8; $last = $#$origin if $last > $#$origin; for my $index ($first .. $last) { next if defined $origin->[$index]; @{$origin->[$index]}[0 .. $numColumns - 1] = ('', $oKey, $index + 1, ('') x $numColumns); } $lastAdded = $last; } # Now output the result for my $pip (0 .. $#$origin) { my $row = $origin->[$pip]; if (defined $row) { # pip exists in original file print join (",", @$row, '1'), "\n"; } else { # pip doesn't exist in original file print ",$oKey,", $pip + 1, ',' x ($numColumns - 2), "0\n"; } } }

Prints:

... ,CLS_S3_Contig100,4,,,0 ,CLS_S3_Contig100,5,,,1 ,CLS_S3_Contig100,6,,,1 ,CLS_S3_Contig100,7,,,1 ,CLS_S3_Contig100,8,,,1 ,CLS_S3_Contig100,9,,,1 ,CLS_S3_Contig100,10,,,1 ,CLS_S3_Contig100,11,,,1 ,CLS_S3_Contig100,12,,,1 CLS_S3_Contig100_st,CLS_S3_Contig100,13,10,0.3717,1 ,CLS_S3_Contig100,14,,,1 ,CLS_S3_Contig100,15,,,1 ,CLS_S3_Contig100,16,,,1 ,CLS_S3_Contig100,17,,,1 ,CLS_S3_Contig100,18,,,1 ,CLS_S3_Contig100,19,,,1 ,CLS_S3_Contig100,20,,,1 ,CLS_S3_Contig100,21,,,1 ,CLS_S3_Contig100,22,,,0 ,CLS_S3_Contig100,23,,,0 ... ,CLS_S3_Contig100,43,,,0 ,CLS_S3_Contig100,44,,,0 ,CLS_S3_Contig100,45,,,1 ,CLS_S3_Contig100,46,,,1 ,CLS_S3_Contig100,47,,,1 ,CLS_S3_Contig100,48,,,1 ,CLS_S3_Contig100,49,,,1 ,CLS_S3_Contig100,50,,,1 ,CLS_S3_Contig100,51,,,1 ,CLS_S3_Contig100,52,,,1 CLS_S3_Contig100_st,CLS_S3_Contig100,53,10,0.3717,1 ,CLS_S3_Contig10031,1,,,0 ... ,CLS_S3_Contig10031,44,,,0 ,CLS_S3_Contig10031,45,,,1 ,CLS_S3_Contig10031,46,,,1 ,CLS_S3_Contig10031,47,,,1 ,CLS_S3_Contig10031,48,,,1 ,CLS_S3_Contig10031,49,,,1 ,CLS_S3_Contig10031,50,,,1 ,CLS_S3_Contig10031,51,,,1 ,CLS_S3_Contig10031,52,,,1 CLS_S3_Contig10031_st,CLS_S3_Contig10031,53,12,0.5548,1 ,CLS_S3_Contig10031,54,,,1 ,CLS_S3_Contig10031,55,,,1 ,CLS_S3_Contig10031,56,,,1 CLS_S3_Contig10031_st,CLS_S3_Contig10031,57,10,0.4871,1 ,CLS_S3_Contig10031,58,,,1 ,CLS_S3_Contig10031,59,,,1 ,CLS_S3_Contig10031,60,,,1 CLS_S3_Contig10031_st,CLS_S3_Contig10031,61,12,0.547,1 CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,62,11,0.5129,1 ,CLS_S3_Contig10031,63,,,1 CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,64,11,0.5789,1

Perl reduces RSI - it saves typing

Replies are listed 'Best First'.
Re^2: Hash_of_Hash_Would do it?
by Anonymous Monk on Sep 16, 2008 at 00:26 UTC
    Thank your very much GrandFather,

    The code works perfect, but when the gap between two adjacent PIP in the original file is large let's say 16 or more it behaves differently.

    Please take a look at the input and output files I have included here. In the output after 949 it has printed three "1"s instead of 8.

    I went through the code and I think a condition should be added here. To see if the gap in pip is greater or equal to 16. is my speculation right?

    my $first = $pip > 8 ? $pip - 8 : 0; my $last = $pip + 8;
    Original CLS_S3_Contig1000 933 10 0.488 CLS_S3_Contig1000 935 9 0.4867 CLS_S3_Contig1000 937 9 0.4989 CLS_S3_Contig1000 939 10 0.503 CLS_S3_Contig1000 943 8 0.3531 CLS_S3_Contig1000 949 8 0.3477 CLS_S3_Contig1000 1060 6 0.1498 CLS_S3_Contig1000 1070 9 0.3885 CLS_S3_Contig1000 1071 8 0.3848 OUTPUT CLS_S3_Contig1000 933 10 0.488 1 CLS_S3_Contig1000 934 1 CLS_S3_Contig1000 935 9 0.4867 1 CLS_S3_Contig1000 936 1 CLS_S3_Contig1000 937 9 0.4989 1 CLS_S3_Contig1000 938 1 CLS_S3_Contig1000 939 10 0.503 1 CLS_S3_Contig1000 940 1 CLS_S3_Contig1000 941 1 CLS_S3_Contig1000 942 1 CLS_S3_Contig1000 943 8 0.3531 1 CLS_S3_Contig1000 944 1 CLS_S3_Contig1000 945 1 CLS_S3_Contig1000 946 1 CLS_S3_Contig1000 947 1 CLS_S3_Contig1000 948 1 CLS_S3_Contig1000 949 8 0.3477 1 CLS_S3_Contig1000 950 1 CLS_S3_Contig1000 951 1 CLS_S3_Contig1000 952 0 CLS_S3_Contig1000 953 0 CLS_S3_Contig1000 954 0 CLS_S3_Contig1000 955 0 CLS_S3_Contig1000 956 0 CLS_S3_Contig1000 957 0

      It needs a little more work than that. Replace the fake up loop and preceding line with:

      my $lastToAdd = -1; # Fake up present flag for +/- 8 lines around real data for my $pip (0 .. $#$origin) { next if $pip > $lastToAdd and ! defined $origin->[$pip]; if ($pip <= $lastToAdd and ! defined $origin->[$pip]) { @{$origin->[$pip]}[0 .. $numColumns - 1] = ('', $oKey, $pip + 1, ('') x $numColumns); next; } my $first = $pip > 8 ? $pip - 8 : 0; for my $index ($first .. $pip) { next if defined $origin->[$index]; @{$origin->[$index]}[0 .. $numColumns - 1] = ('', $oKey, $index + 1, ('') x $numColumns); } $lastToAdd = $pip + 8; }

      The problem of course is that the finish point needs to be reset each time a new real line is found.


      Perl reduces RSI - it saves typing