in reply to Re^2: Complex Data Structure
in thread Complex Data Structure

If there are only two "interesting" columns, ignore the rest. Consider:

use strict;
use warnings;

my $data = <<DATA;
CLS_S3_Contig100_st,CLS_S3_Contig100,53,10,0.3717
CLS_S3_Contig100_at,CLS_S3_Contig100,55,11,0.4321
CLS_S3_Contig100_st,CLS_S3_Contig100,57,10,0.3223
CLS_S3_Contig100_at,CLS_S3_Contig100,59,11,0.4055
CLS_S3_Contig100_st,CLS_S3_Contig100,61,11,0.4511
CLS_S3_Contig100_at,CLS_S3_Contig100,63,11,0.474
CLS_S3_Contig10031_st,CLS_S3_Contig10031,53,12,0.5548
CLS_S3_Contig10031_st,CLS_S3_Contig10031,57,10,0.4871
CLS_S3_Contig10031_st,CLS_S3_Contig10031,61,12,0.547
CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,62,11,0.5129
CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,64,11,0.5789
DATA

my %origins;
my $numColumns;

open my $inFile, '<', \$data;

while (<$inFile>) {
    chomp;
    next unless length;
    my @columns = split ',';
    $numColumns ||= @columns;    # Assume first row has correct column count
    $origins{$columns[1]}[$columns[2] - 1] = \@columns;
}

close $inFile;

for my $oKey (sort keys %origins) {
    my $origin = $origins{$oKey};

    for my $pip (0 .. $#$origin) {
        my $row = $origin->[$pip];

        if (defined $row) {
            # pip exists in original file
            print join (",", @$row, '1'), "\n";
        } else {
            # pip doesn't exist in original file
            print ",$oKey,", $pip + 1, ',' x ($numColumns - 2), "0\n";
        }
    }
}

Prints (with large middle portion skipped):

,CLS_S3_Contig100,1,,,0
,CLS_S3_Contig100,2,,,0
,CLS_S3_Contig100,3,,,0
,CLS_S3_Contig100,4,,,0
,CLS_S3_Contig100,5,,,0
...
CLS_S3_Contig10031_st,CLS_S3_Contig10031,57,10,0.4871,1
,CLS_S3_Contig10031,58,,,0
,CLS_S3_Contig10031,59,,,0
,CLS_S3_Contig10031,60,,,0
CLS_S3_Contig10031_st,CLS_S3_Contig10031,61,12,0.547,1
CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,62,11,0.5129,1
,CLS_S3_Contig10031,63,,,0
CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,64,11,0.5789,1

which demonstrates what I understand you to want.

The key point is using a HoAoA (hash of arrays of arrays): the hash is keyed by the origin, and each origin's array is indexed by PIP - 1. Note that Perl creates the missing array elements for you but leaves them undef, so you can test each element with defined to see whether that PIP was present in the original file.
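To see the gap filling in isolation, here is a minimal sketch with made-up rows (the probe and origin names below are hypothetical and not taken from the data above). It assumes each row is just probe, origin, PIP, and reduces the array for one origin to a list of 0/1 flags:

```perl
use strict;
use warnings;

# Hypothetical rows: [probe, origin, PIP]. The names are invented for illustration.
my @rows = (
    ['probe_a', 'contigX', 2],
    ['probe_b', 'contigX', 5],
);

my %origins;

for my $row (@rows) {
    my ($probe, $origin, $pip) = @$row;

    # Assigning to index $pip - 1 extends the array as needed;
    # any slots that were skipped over read back as undef.
    $origins{$origin}[$pip - 1] = $row;
}

# One flag per PIP from 1 up to the highest PIP seen for this origin.
my @flags = map { defined $_ ? 1 : 0 } @{ $origins{contigX} };

print "@flags\n";    # prints "0 1 0 0 1"
```

PIPs 2 and 5 were seen, so only the second and fifth flags are 1. The array length is set by the highest PIP stored for that origin, which is why nothing is printed beyond the last PIP in the input.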


Perl reduces RSI - it saves typing

Re^4: Complex Data Structure
by sesemin (Beadle) on Sep 15, 2008 at 05:10 UTC
    Dear Grand Father, you are a genius and your code is perfect: it fills in the gaps (taking care of the even and odd numbers). However, what can I do if I also want all "1"s within a range of plus/minus 8 around the data, for example PIP = 53-337 in the case of Contig100? Let's say 45 to 345 all take 1, and everything before 45 takes 0. Thanks again, Pedro
    . . . .
    CLS_S3_Contig100 40 0
    CLS_S3_Contig100 41 0
    CLS_S3_Contig100 42 0
    CLS_S3_Contig100 43 0
    CLS_S3_Contig100 44 0
    CLS_S3_Contig100 45 0
    CLS_S3_Contig100 46 0
    CLS_S3_Contig100 47 0
    CLS_S3_Contig100 48 0
    CLS_S3_Contig100 49 0
    CLS_S3_Contig100 50 0
    CLS_S3_Contig100 51 0
    CLS_S3_Contig100 52 0
    CLS_S3_Contig100_st CLS_S3_Contig100 53 10 0.3717 1
    CLS_S3_Contig100 54 0
    CLS_S3_Contig100_at CLS_S3_Contig100 55 11 0.4321 1
    CLS_S3_Contig100 56 0
    CLS_S3_Contig100_st CLS_S3_Contig100 57 10 0.3223 1
    CLS_S3_Contig100 58 0
    CLS_S3_Contig100_at CLS_S3_Contig100 59 11 0.4055 1
    CLS_S3_Contig100 60 0
    CLS_S3_Contig100_st CLS_S3_Contig100 61 11 0.4511 1
    CLS_S3_Contig100 62 0
    CLS_S3_Contig100_at CLS_S3_Contig100 63 11 0.474 1
    CLS_S3_Contig100 64 0
    . . . .data flow...