ic23oluk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I am thinking about how to store the table below in a complex data structure, and which data structure to use. The input is a tab-separated .txt file exported from Excel. Note that some cells are empty (in this case "RQ Max"). Here's the table:

Well  Sample Name  Target Name  RQ Max  Ct Mean
1     Sample 1     actin                20,514
2     Sample 1     claudin              30,544
3     Sample 1     occludin             31,183
25    Sample 1     actin                20,514
26    Sample 1     claudin              30,544
27    Sample 1     occludin             31,183
49    Sample 2     actin                20,416
50    Sample 2     claudin              25,611
51    Sample 2     occludin             27,831
73    Sample 2     actin                20,416
74    Sample 2     claudin              25,611
75    Sample 2     occludin             27,831
97    Sample 3     actin                24,213
98    Sample 3     claudin              32,065
99    Sample 3     occludin             34,556
121   Sample 3     actin                24,213
122   Sample 3     claudin              32,065
123   Sample 3     occludin             34,556
145   Sample 4     actin                20,498
146   Sample 4     claudin              25,365
147   Sample 4     occludin             27,869
169   Sample 4     actin                20,498
170   Sample 4     claudin              25,365
171   Sample 4     occludin             27,869
193   H2O          actin
194   H2O          claudin
195   H2O          occludin
217   H2O          actin
218   H2O          claudin
219   H2O          occludin

and here is my code

#!/usr/bin/perl
use strict;
use warnings;

# CHECK FOR CORRECT USAGE
unless (@ARGV == 1){
    die "Usage: perl $0 \"file.txt\"\n";
}

my $input = "$ARGV[0]";
#chomp ($input);
open (READ, $input) || die "Cannot open $input: $!\n";

my $line = '';
my $i = 0;      # running row number, used as the hash key
my %data;
while ($line = <READ>){
    chomp $line;
    if ($line =~ m/^[0-9]/i){       # only lines that start with a well number
        $i++;
        $data{"$i"} = [ split /\t{1}/, $line ];
    }
}

As you can see, I am at the very beginning of my program, because I am not sure which structure to use. Actually, I only need three columns of the table: "Sample Name", "Target Name" and "Ct Mean". As I later want to calculate something for each sample, it might be helpful to have the sample names as the keys. In a hash of hashes, I'd like to have the target names as the "second keys". Could somebody push me in the right direction? I'm currently struggling with storing the data, as I haven't used Perl for quite a while...

Thanks in advance!

Replies are listed 'Best First'.
Re: table into complex data structure
by choroba (Cardinal) on Oct 31, 2017 at 13:32 UTC
    The data structure type depends pretty much on what you want to do with the data. Without details, we can't help you much: there are a thousand ways to store the data, but only some of them are beneficial if you want to keep the original order, aggregate by subvalues in given columns, etc.
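
To make that trade-off concrete, here is a minimal sketch with three hard-coded rows taken from the table. It is only an illustration, not code from the thread: the names @input, @rows and %by_sample and the record layout are assumptions. An array of hashes keeps the original row order, while a hash of hashes gives direct lookup by sample and target but loses the order.

```perl
use strict;
use warnings;

# Three rows from the table: well, sample name, target name, Ct mean.
my @input = (
    [ 1,  'Sample 1', 'actin',   '20,514' ],
    [ 2,  'Sample 1', 'claudin', '30,544' ],
    [ 49, 'Sample 2', 'actin',   '20,416' ],
);

# Option 1: array of hashes, keeps the original row order.
my @rows = map {
    +{ well => $_->[0], sample => $_->[1], target => $_->[2], ct => $_->[3] }
} @input;

# Option 2: hash of hashes, direct lookup by sample and target.
my %by_sample;
$by_sample{ $_->{sample} }{ $_->{target} } = $_->{ct} for @rows;

print "Second row is well $rows[1]{well}\n";
print "Sample 2 actin: $by_sample{'Sample 2'}{actin}\n";
```

Which of the two is more useful depends on whether the later calculation walks the rows in file order or looks values up per sample.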

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      %data = (
          'Sample 1' => {
              actin    => 20.514,
              claudin  => 30.544,
              occludin => 31.183,
          },
          'Sample 2' => {
              actin    => 20.416,
              claudin  => 25.611,
              occludin => 27.831,
          },
          ...
      );

      This is what I'd like to have at the end :)

        use strict;
        use warnings;
        use List::Util qw(reduce);
        use Data::Dumper;

        # Fold the rows into a hash of hashes: sample name => target name => Ct mean.
        my $hash = reduce {
            $a->{ $b->[1] }{ $b->[2] } = $b->[3];
            $a;
        } +{},
            grep { @$_ > 3 }                  # drop rows without a Ct mean (H2O)
            map  { [ split /\s{2,}/ ] }       # columns are separated by two or more spaces
            map  { chomp; s/^\s+//; $_ }      # strip the newline and any leading indentation
            <DATA>;

        print Dumper( $hash );

        __DATA__
        1    Sample 1  actin     20,514
        2    Sample 1  claudin   30,544
        3    Sample 1  occludin  31,183
        25   Sample 1  actin     20,514
        26   Sample 1  claudin   30,544
        27   Sample 1  occludin  31,183
        49   Sample 2  actin     20,416
        50   Sample 2  claudin   25,611
        51   Sample 2  occludin  27,831
        73   Sample 2  actin     20,416
        74   Sample 2  claudin   25,611
        75   Sample 2  occludin  27,831
        97   Sample 3  actin     24,213
        98   Sample 3  claudin   32,065
        99   Sample 3  occludin  34,556
        121  Sample 3  actin     24,213
        122  Sample 3  claudin   32,065
        123  Sample 3  occludin  34,556
        145  Sample 4  actin     20,498
        146  Sample 4  claudin   25,365
        147  Sample 4  occludin  27,869
        169  Sample 4  actin     20,498
        170  Sample 4  claudin   25,365
        171  Sample 4  occludin  27,869
        193  H2O  actin
        194  H2O  claudin
        195  H2O  occludin
        217  H2O  actin
        218  H2O  claudin
        219  H2O  occludin


        holli

        You can lead your users to water, but alas, you cannot drown them.
Re: table into complex data structure
by thanos1983 (Parson) on Oct 31, 2017 at 14:50 UTC

    Hello ic23oluk,

    I think you are looking for something like this (sample code below):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    # CHECK FOR CORRECT USAGE
    unless (@ARGV == 1){
        die "Usage: perl $0 \"file.txt\"\n";
    }

    my %hash;
    while (<>) {
        chomp;
        next if /^\s*$/; # skip empty lines
        my @columns = split (/\t/, $_);
        next if $columns[0] =~ m/[^0-9.]/; # skip lines that do not start with a number
        # columns: 0 = Well, 1 = Sample Name, 2 = Target Name, 3 = RQ Max, 4 = Ct Mean
        $hash{$columns[1]}{$columns[2]} = $columns[4];
    }

    print Dumper \%hash;

    __END__

    $ perl test.pl file.txt
    $VAR1 = {
              'Sample 1' => {
                              'actin' => '20,514',
                              'claudin' => '30,544',
                              'occludin' => '31,183'
                            },
              'H2O' => {
                         'actin' => undef,
                         'occludin' => undef,
                         'claudin' => undef
                       },
              'Sample 4' => {
                              'actin' => '20,498',
                              'occludin' => '27,869',
                              'claudin' => '25,365'
                            },
              'Sample 3' => {
                              'occludin' => '34,556',
                              'claudin' => '32,065',
                              'actin' => '24,213'
                            },
              'Sample 2' => {
                              'claudin' => '25,611',
                              'occludin' => '27,831',
                              'actin' => '20,416'
                            }
            };

    I am wondering, though, whether you want to overwrite the values of keys that occur more than once, or whether you want a more complex data structure such as a hash of hashes. You can read more about those under HASHES OF HASHES.
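
The table lists every sample/target pair twice (for example wells 1 and 25), so a plain assignment keeps only the last value seen. If the replicates should be kept instead, one possible sketch is a hash of hashes of arrays. The sub name parse_rows and the conversion of the decimal comma to a dot are assumptions for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: takes tab-separated lines and returns
# { sample => { target => [ ct, ct, ... ] } }.
sub parse_rows {
    my @lines = @_;
    my %data;
    for my $line (@lines) {
        chomp $line;
        my ( $well, $sample, $target, $rq_max, $ct ) = split /\t/, $line;
        next unless defined $well && $well =~ /^\d+$/;   # skip header and blank lines
        next unless defined $ct && length $ct;           # skip rows without a Ct mean
        ( my $value = $ct ) =~ tr/,/./;                  # "20,514" becomes "20.514"
        push @{ $data{$sample}{$target} }, $value;       # keep every replicate
    }
    return \%data;
}

my $data = parse_rows(
    "Well\tSample Name\tTarget Name\tRQ Max\tCt Mean\n",
    "1\tSample 1\tactin\t\t20,514\n",
    "25\tSample 1\tactin\t\t20,514\n",
    "193\tH2O\tactin\t\t\n",
);

printf "%s has %d actin values\n", 'Sample 1', scalar @{ $data->{'Sample 1'}{actin} };
```

Converting the comma to a dot matters as soon as the values are used in arithmetic, because Perl would otherwise treat "20,514" as the number 20 (with a warning).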

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: table into complex data structure
by 1nickt (Canon) on Oct 31, 2017 at 14:11 UTC

    Hi, when working with delimited data, don't try to parse it yourself; use Text::CSV. See this example posted today by haukex showing how to read a delimited file.

    Nothing you've described so far shows any need for any other "data storage" than the TSV file you have. You can generate any reporting you want from the current data.
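
As a rough sketch of that approach (not the example by haukex referred to above), the following reads tab-separated rows with Text::CSV and builds a small report straight from the rows. It assumes Text::CSV is installed from CPAN; the in-memory string $tsv stands in for the real file, and the @report array is a hypothetical name.

```perl
use strict;
use warnings;
use Text::CSV;

# Use a tab as the separator; auto_diag reports parse errors.
my $csv = Text::CSV->new( { sep_char => "\t", binary => 1, auto_diag => 1 } );

# In-memory stand-in for the real file, so the sketch is self-contained.
my $tsv = join '',
    "Well\tSample Name\tTarget Name\tRQ Max\tCt Mean\n",
    "1\tSample 1\tactin\t\t20,514\n",
    "2\tSample 1\tclaudin\t\t30,544\n";
open my $fh, '<', \$tsv or die "Cannot open in-memory file: $!";

my $header = $csv->getline($fh);    # the first line holds the column names
my @report;
while ( my $row = $csv->getline($fh) ) {
    # Keep only Sample Name, Target Name and Ct Mean (columns 1, 2 and 4).
    my ( $sample, $target, $ct ) = @{$row}[ 1, 2, 4 ];
    push @report, "$sample\t$target\t$ct";
}
close $fh;

print "$_\n" for @report;
```

For a real file, the in-memory handle would be replaced by an ordinary three-argument open on the file name.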


    The way forward always starts with a minimal test.