in reply to Re^2: Complex Data Structure
in thread Complex Data Structure
If there are only two "interesting" columns, ignore the rest. Consider:
    use strict;
    use warnings;

    my $data = <<DATA;
    CLS_S3_Contig100_st,CLS_S3_Contig100,53,10,0.3717
    CLS_S3_Contig100_at,CLS_S3_Contig100,55,11,0.4321
    CLS_S3_Contig100_st,CLS_S3_Contig100,57,10,0.3223
    CLS_S3_Contig100_at,CLS_S3_Contig100,59,11,0.4055
    CLS_S3_Contig100_st,CLS_S3_Contig100,61,11,0.4511
    CLS_S3_Contig100_at,CLS_S3_Contig100,63,11,0.474
    CLS_S3_Contig10031_st,CLS_S3_Contig10031,53,12,0.5548
    CLS_S3_Contig10031_st,CLS_S3_Contig10031,57,10,0.4871
    CLS_S3_Contig10031_st,CLS_S3_Contig10031,61,12,0.547
    CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,62,11,0.5129
    CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,64,11,0.5789
    DATA

    my %origins;
    my $numColumns;

    open my $inFile, '<', \$data;

    while (<$inFile>) {
        chomp;
        next unless length;

        my @columns = split ',';

        $numColumns ||= @columns;    # Assume first row has correct column count
        $origins{$columns[1]}[$columns[2] - 1] = \@columns;
    }

    close $inFile;

    for my $oKey (sort keys %origins) {
        my $origin = $origins{$oKey};

        for my $pip (0 .. $#$origin) {
            my $row = $origin->[$pip];

            if (defined $row) {
                # pip exists in original file
                print join (",", @$row, '1'), "\n";
            } else {
                # pip doesn't exist in original file
                print ",$oKey,", $pip + 1, ',' x ($numColumns - 2), "0\n";
            }
        }
    }
Prints (with large middle portion skipped):
    ,CLS_S3_Contig100,1,,,0
    ,CLS_S3_Contig100,2,,,0
    ,CLS_S3_Contig100,3,,,0
    ,CLS_S3_Contig100,4,,,0
    ,CLS_S3_Contig100,5,,,0
    ...
    CLS_S3_Contig10031_st,CLS_S3_Contig10031,57,10,0.4871,1
    ,CLS_S3_Contig10031,58,,,0
    ,CLS_S3_Contig10031,59,,,0
    ,CLS_S3_Contig10031,60,,,0
    CLS_S3_Contig10031_st,CLS_S3_Contig10031,61,12,0.547,1
    CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,62,11,0.5129,1
    ,CLS_S3_Contig10031,63,,,0
    CLSS3627.b1_F19.ab1,CLS_S3_Contig10031,64,11,0.5789,1
which demonstrates what I understand you to want.
The key point is using a HoAoA (hash of arrays of arrays), where the hash is keyed by the origin and each inner array is indexed by PIP - 1. Note that when you assign to an element beyond the current end of an array, Perl generates the missing elements in between but leaves them as undef, so you can test with defined to see whether you encountered a given PIP in the original file.
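To see the undef-gap behaviour on its own, here is a minimal sketch. The helper describe_slots, the key 'demo' and the sample rows are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the code above): returns one status
# string per slot in the array, flagging slots that were never assigned.
sub describe_slots {
    my ($slots) = @_;
    return map {
        my $pip = $_ + 1;    # the array index is PIP - 1
        defined $slots->[$_] ? "$pip:present" : "$pip:missing";
    } 0 .. $#$slots;
}

my %origins;

# Assigning to index 3 autovivifies the inner array and extends it to
# four elements; the slots that are never assigned read back as undef.
$origins{demo}[4 - 1] = ['demo_probe', 'demo', 4];
$origins{demo}[2 - 1] = ['demo_probe', 'demo', 2];

print join(' ', describe_slots($origins{demo})), "\n";
# prints: 1:missing 2:present 3:missing 4:present
```

The array only grows as far as the highest PIP seen for that origin, so any PIPs missing after the last recorded one will not show up as gaps.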
Re^4: Complex Data Structure
by sesemin (Beadle) on Sep 15, 2008 at 05:10 UTC