table into complex data structure

ic23oluk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I am thinking about how to store the table below in a complex data structure, and which data structure to use. The input is a tab-separated .txt file exported from Excel. Note that some cells are empty (in this case the whole "RQ Max" column, and "Ct Mean" for the H2O rows). Here's the table:

 
Well    Sample Name    Target Name    RQ Max    Ct Mean
1    Sample 1    actin        20,514
2    Sample 1    claudin        30,544
3    Sample 1    occludin        31,183
25    Sample 1    actin        20,514
26    Sample 1    claudin        30,544
27    Sample 1    occludin        31,183
49    Sample 2    actin        20,416
50    Sample 2    claudin        25,611
51    Sample 2    occludin        27,831
73    Sample 2    actin        20,416
74    Sample 2    claudin        25,611
75    Sample 2    occludin        27,831
97    Sample 3    actin        24,213
98    Sample 3    claudin        32,065
99    Sample 3    occludin        34,556
121    Sample 3    actin        24,213
122    Sample 3    claudin        32,065
123    Sample 3    occludin        34,556
145    Sample 4    actin        20,498
146    Sample 4    claudin        25,365
147    Sample 4    occludin        27,869
169    Sample 4    actin        20,498
170    Sample 4    claudin        25,365
171    Sample 4    occludin        27,869
193    H2O    actin        
194    H2O    claudin        
195    H2O    occludin        
217    H2O    actin        
218    H2O    claudin        
219    H2O    occludin

and here is my code

#!/usr/bin/perl
use strict;
use warnings;


# CHECK FOR CORRECT USAGE
unless (@ARGV == 1){
    die "Usage: perl $0 \"file.txt\"\n";
}

my $input = $ARGV[0];

open (my $read, '<', $input) || die "Cannot open $input: $!\n";

my $i = 0;   # running row counter, used as the hash key
my %data;
while (my $line = <$read>){
    chomp $line;
    # keep only data rows, which start with the well number
    if ($line =~ m/^[0-9]/){
        $i++;
        $data{$i} = [ split /\t/, $line ];
    }
}
close $read;

As you can see, I am at the very beginning of my program, because I am not sure which structure to use. Actually I only need three columns of the entire table: "Sample Name", "Target Name" and "Ct Mean". As I later want to calculate something for each sample, it might be helpful to have the sample names as the keys. In a hash of hashes structure, I'd like to have the target names as the "second keys". Could somebody push me in the right direction? I'm currently struggling with storing the data, as I haven't used Perl for a longer period...

Thanks in advance!


Replies are listed 'Best First'.
Re: table into complex data structure by choroba (Cardinal) on Oct 31, 2017 at 13:32 UTC
The data structure type depends pretty much on what you want to do with the data. Without details, we can't help you much: there are a thousand ways to store the data, but only some of them are beneficial if you want to keep the original order, aggregate by subvalues in given columns, etc.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
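As a rough illustration of that point, here is a minimal sketch that stores the same rows in two ways: an array of hashes, which keeps the original row order, and a hash of arrays, which groups the Ct values by target. It assumes tab-separated rows as in the question; the variable names `@rows` and `%by_target` are made up for the example.

```perl
use strict;
use warnings;

# A few rows from the table above, tab-separated (the RQ Max cell is empty).
my @lines = (
    "1\tSample 1\tactin\t\t20,514",
    "2\tSample 1\tclaudin\t\t30,544",
    "49\tSample 2\tactin\t\t20,416",
);

my @rows;       # array of hashes: keeps the original row order
my %by_target;  # hash of arrays: groups Ct values by target name

for my $line (@lines) {
    # Fourth field (RQ Max) is discarded.
    my ($well, $sample, $target, undef, $ct) = split /\t/, $line;
    push @rows, { well => $well, sample => $sample, target => $target, ct => $ct };
    push @{ $by_target{$target} }, $ct;
}

print "$_->{well}: $_->{sample} / $_->{target}\n" for @rows;
print "$_: @{ $by_target{$_} }\n" for sort keys %by_target;
```

Which of the two is more useful depends on whether the later calculation walks the rows in file order or looks values up by name.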
Re^2: table into complex data structure by ic23oluk (Sexton) on Oct 31, 2017 at 13:54 UTC
%data = (
    'Sample 1' => {
        actin    => 20.514,
        claudin  => 30.544,
        occludin => 31.183,
    },
    'Sample 2' => {
        actin    => 20.416,
        claudin  => 25.611,
        occludin => 27.831,
    },
    ...
);

This is what I'd like to have at the end :)
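Note that the file holds decimal commas ("20,514"), while the structure above uses decimal points. A small sketch of that conversion follows, assuming every non-empty Ct cell has the form digits, comma, digits. The helper name `to_number` is hypothetical, and the subtraction at the end is only a placeholder for whatever per-sample calculation is actually needed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a decimal-comma string into a Perl number.
# Returns undef for empty or missing cells (such as the H2O rows).
sub to_number {
    my ($text) = @_;
    return undef unless defined $text && length $text;
    (my $number = $text) =~ tr/,/./;   # "20,514" -> "20.514"
    return $number + 0;                # force numeric value
}

my %data = (
    'Sample 1' => { actin => to_number('20,514'), claudin => to_number('30,544') },
    'H2O'      => { actin => to_number('') },
);

# Placeholder calculation: Ct of claudin minus Ct of actin for one sample.
my $delta = $data{'Sample 1'}{claudin} - $data{'Sample 1'}{actin};
printf "Sample 1 claudin - actin: %.3f\n", $delta;
```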
Re^3: table into complex data structure by holli (Abbot) on Oct 31, 2017 at 16:01 UTC
use strict;
use warnings;
use List::Util qw(reduce);
use Data::Dumper;

my $hash = reduce {
    $a->{ $b->[1] }->{ $b->[2] } = $b->[3];
    $a;
}
{},
grep { @$_ > 3 }
map  { [ split /\s{2,}/ ] }
map  { chomp; $_ }
<DATA>;

print Dumper( $hash );

__DATA__
1    Sample 1    actin        20,514
2    Sample 1    claudin        30,544
3    Sample 1    occludin        31,183
25    Sample 1    actin        20,514
26    Sample 1    claudin        30,544
27    Sample 1    occludin        31,183
49    Sample 2    actin        20,416
50    Sample 2    claudin        25,611
51    Sample 2    occludin        27,831
73    Sample 2    actin        20,416
74    Sample 2    claudin        25,611
75    Sample 2    occludin        27,831
97    Sample 3    actin        24,213
98    Sample 3    claudin        32,065
99    Sample 3    occludin        34,556
121    Sample 3    actin        24,213
122    Sample 3    claudin        32,065
123    Sample 3    occludin        34,556
145    Sample 4    actin        20,498
146    Sample 4    claudin        25,365
147    Sample 4    occludin        27,869
169    Sample 4    actin        20,498
170    Sample 4    claudin        25,365
171    Sample 4    occludin        27,869
193    H2O    actin
194    H2O    claudin
195    H2O    occludin
217    H2O    actin
218    H2O    claudin
219    H2O    occludin

holli

You can lead your users to water, but alas, you cannot drown them.
Re: table into complex data structure by thanos1983 (Parson) on Oct 31, 2017 at 14:50 UTC
Hello ic23oluk,

I think you are looking for something like this (sample code below):

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# CHECK FOR CORRECT USAGE
unless (@ARGV == 1){
    die "Usage: perl $0 \"file.txt\"\n";
}

my %hash;
while (<>) {
    chomp;
    next if /^\s*$/; # skip empty lines
    my @columns = split (/\t/, $_);
    next if $columns[0] =~ m/[^0-9.]/; # skip lines that do not start with a number
    $hash{$columns[1]}{$columns[2]} = $columns[4];
}
print Dumper \%hash;

__END__

$ perl test.pl file.txt
$VAR1 = {
          'Sample 1' => {
                          'actin' => '20,514',
                          'claudin' => '30,544',
                          'occludin' => '31,183'
                        },
          'H2O' => {
                     'actin' => undef,
                     'occludin' => undef,
                     'claudin' => undef
                   },
          'Sample 4' => {
                          'actin' => '20,498',
                          'occludin' => '27,869',
                          'claudin' => '25,365'
                        },
          'Sample 3' => {
                          'occludin' => '34,556',
                          'claudin' => '32,065',
                          'actin' => '24,213'
                        },
          'Sample 2' => {
                          'claudin' => '25,611',
                          'occludin' => '27,831',
                          'actin' => '20,416'
                        }
        };

I am wondering, though, whether you want to write on top of common keys (as this code does: a repeated Sample/Target pair overwrites the earlier value) or you want a more complex data structure. Read more about hashes of hashes under HASHES OF HASHES in perldsc.

Hope this helps, BR.

Seeking for Perl wisdom...on the process of learning...not there...yet!
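If the repeated Sample/Target rows (wells 1 and 25, for instance) should be kept rather than overwritten, one option is a hash of hashes of arrays. The sketch below is only an assumption about what "more complex" might mean here: it reads a few rows from an in-memory copy of the table, skips empty Ct cells, converts the decimal comma, and prints a mean per sample and target. The name `%replicates` is made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# In-memory stand-in for the tab-separated file from the question.
my $table = join "\n",
    "Well\tSample Name\tTarget Name\tRQ Max\tCt Mean",
    "1\tSample 1\tactin\t\t20,514",
    "25\tSample 1\tactin\t\t20,514",
    "193\tH2O\tactin\t\t",
    "";

open my $fh, '<', \$table or die "Cannot open in-memory table: $!";

my %replicates;   # $replicates{sample}{target} = [ ct, ct, ... ]
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /^\d/;               # skip the header line
    my (undef, $sample, $target, undef, $ct) = split /\t/, $line;
    next unless defined $ct && length $ct;    # skip empty Ct cells
    $ct =~ tr/,/./;                           # decimal comma -> point
    push @{ $replicates{$sample}{$target} }, $ct + 0;
}
close $fh;

for my $sample (sort keys %replicates) {
    for my $target (sort keys %{ $replicates{$sample} }) {
        my @values = @{ $replicates{$sample}{$target} };
        printf "%s %s: n=%d mean=%.3f\n",
            $sample, $target, scalar @values, sum(@values) / @values;
    }
}
```

With this layout the H2O rows simply do not appear, because they carry no Ct value; whether that is desirable depends on the later calculation.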
Re: table into complex data structure by 1nickt (Canon) on Oct 31, 2017 at 14:11 UTC
Hi, for working with delimited data, don't try to parse it yourself; use Text::CSV. See this example posted today by haukex showing how to read a delimited file. Nothing you've described so far shows any need for any other "data storage" than the TSV file you have. You can generate any reporting you want from the current data.

The way forward always starts with a minimal test.
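For reference, a minimal sketch of reading the tab-separated table with Text::CSV might look like the following. This is not the haukex example mentioned above. It assumes the Text::CSV module is installed (it is not part of core Perl) and uses an in-memory copy of a few rows instead of a real file.

```perl
use strict;
use warnings;
use Text::CSV;

# In-memory stand-in for the tab-separated file from the question.
my $table = join "\n",
    "Well\tSample Name\tTarget Name\tRQ Max\tCt Mean",
    "1\tSample 1\tactin\t\t20,514",
    "2\tSample 1\tclaudin\t\t30,544",
    "193\tH2O\tactin\t\t",
    "";

open my $fh, '<', \$table or die "Cannot open in-memory table: $!";

my $csv = Text::CSV->new({ binary => 1, sep_char => "\t", auto_diag => 1 });

# The first line holds the column names; use them as hash keys for each row.
$csv->column_names( @{ $csv->getline($fh) } );

my %data;
while (my $row = $csv->getline_hr($fh)) {
    $data{ $row->{'Sample Name'} }{ $row->{'Target Name'} } = $row->{'Ct Mean'};
}
close $fh;

for my $sample (sort keys %data) {
    for my $target (sort keys %{ $data{$sample} }) {
        print "$sample / $target: '$data{$sample}{$target}'\n";
    }
}
```

Empty cells come back as empty strings here, so the H2O rows are still present and can be filtered out explicitly if needed.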