
If there are only two "interesting" columns, ignore the rest. Consider:

use strict;
use warnings;

my $data = <<DATA;
CLS_S3_Contig100_st,CLS_S3_Contig100,53,10,0.3717
CLS_S3_Contig100_at,CLS_S3_Contig100,55,11,0.4321
CLS_S3_Contig100_st,CLS_S3_Contig100,57,10,0.3223
CLS_S3_Contig100_at,CLS_S3_Contig100,59,11,0.4055
CLS_S3_Contig100_st,CLS_S3_Contig100,61,11,0.4511
CLS_S3_Contig100_at,CLS_S3_Contig100,63,11,0.474
CLS_S3_Contig10031_st,CLS_S3_Contig10031,53,12,0.5548
CLS_S3_Contig10031_st,CLS_S3_Contig10031,57,10,0.4871
CLS_S3_Contig10031_st,CLS_S3_Contig10031,61,12,0.547
CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,62,11,0.5129
CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,64,11,0.5789
DATA

my %origins;
my $numColumns;

open my $inFile, '<', \$data or die "Cannot open in-memory file: $!";
while (<$inFile>) {
    chomp;
    next unless length;
    
    my @columns = split ',';
    
    $numColumns ||= @columns; # Assume first row has correct column count
    $origins{$columns[1]}[$columns[2] - 1] = \@columns;
}
close $inFile;

for my $oKey (sort keys %origins) {
    my $origin = $origins{$oKey};
    
    for my $pip (0 .. $#$origin) {
        my $row = $origin->[$pip];
        
        if (defined $row) {
            # pip exists in original file
            print join (",", @$row, '1'), "\n";
        } else {
            # pip doesn't exist in original file
            print ",$oKey,", $pip + 1, ',' x ($numColumns - 2), "0\n";
        }
    }
}

Prints (with large middle portion skipped):

,CLS_S3_Contig100,1,,,0
,CLS_S3_Contig100,2,,,0
,CLS_S3_Contig100,3,,,0
,CLS_S3_Contig100,4,,,0
,CLS_S3_Contig100,5,,,0
...
CLS_S3_Contig10031_st,CLS_S3_Contig10031,57,10,0.4871,1
,CLS_S3_Contig10031,58,,,0
,CLS_S3_Contig10031,59,,,0
,CLS_S3_Contig10031,60,,,0
CLS_S3_Contig10031_st,CLS_S3_Contig10031,61,12,0.547,1
CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,62,11,0.5129,1
,CLS_S3_Contig10031,63,,,0
CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,64,11,0.5789,1

which demonstrates what I understand you to want.

The key point is using a HoAoA (hash of arrays of arrays): the hash is keyed by origin, and each origin's array is indexed by PIP - 1, with every element holding one row of columns. Note that Perl creates the missing array elements for you but leaves them undef, so you can test with defined to see whether a given PIP appeared in the original file.
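To see the gap filling in isolation, here is a minimal sketch. The key contigA and the row contents are made up for illustration; only the mechanism (autovivification plus undef padding) is the same as in the script above.

```perl
use strict;
use warnings;

my %origins;

# Assigning to a deep slot autovivifies the hash entry and its array,
# and extends the array far enough to hold index 3.
# 'contigA' and the row values are hypothetical, for illustration only.
$origins{contigA}[3] = ['row', 'for', 'pip', 4];

my $slots = $origins{contigA};

# Slots 0 .. 2 were never assigned, so they read back as undef.
my @state = map { defined $slots->[$_] ? 'present' : 'missing' } 0 .. $#$slots;

print scalar(@$slots), " slots: @state\n";
```

Only the slot that was assigned reports as present; the padded slots before it stay undef, and that is exactly what the else branch in the main loop relies on.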

Perl reduces RSI - it saves typing

In reply to Re^3: Complex Data Structure by GrandFather
in thread Complex Data Structure by sesemin
