cristianro87 has asked for the wisdom of the Perl Monks concerning the following question:

Hello gurus, I have a question. I have a VCF file like this:

#CHROM POS ID REF ALT QUAL
chr1 1092344 . G T 79.54
chr1 1092367 . C T 148.50
chr1 1092400 . G C 90.54
chr1 1092424 . A G 93.14
chr1 1092461 . G A 105.30
chr1 1092470 . T G 103.06
chr1 1092482 . T C 104.33
chr1 1093235 . G A 16.08
chr1 1093245 . C T 244.75

I want to access the elements in this way:

[chr1][1092482][REF] = T

Can it be done?

Replies are listed 'Best First'.
Re: Load table with row/column names
by choroba (Cardinal) on May 30, 2014 at 10:48 UTC
    If you can tolerate the change of square brackets to curly braces, it's easy:
    #!/usr/bin/perl
    use warnings;
    use strict;

    <DATA>; # Skip the header line

    my %table;
    while (<DATA>) {
        my ($chrom, $pos, $id, $ref, $alt, $qual) = split;
        $table{$chrom}{$pos} = { REF  => $ref,
                                 ALT  => $alt,
                                 QUAL => $qual,
                               };
    }

    print $table{chr1}{1092482}{REF}, "\n"; # Prints T, yay!

    __DATA__
    #CHROM POS ID REF ALT QUAL
    chr1 1092344 . G T 79.54
    chr1 1092367 . C T 148.50
    chr1 1092400 . G C 90.54
    chr1 1092424 . A G 93.14
    chr1 1092461 . G A 105.30
    chr1 1092470 . T G 103.06
    chr1 1092482 . T C 104.33
    chr1 1093235 . G A 16.08
    chr1 1093245 . C T 244.75
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Load table with row/column names
by johngg (Canon) on May 30, 2014 at 12:47 UTC

    As choroba has demonstrated, a HoHoH structure is probably what you are looking for. The Data::Dumper module is useful for confirming that the data structures your code produces are what you were expecting. In the following code I use the column headers to determine the keys of the inner hash and I use a hash slice to populate it. I have added a chr2 and a chr3 to the data to demonstrate the outer hash more clearly.

    $ perl -Mstrict -Mwarnings -MData::Dumper -E '
    > open my $tableFH, q{<}, \ <<EOD or die $!;
    > #CHROM POS ID REF ALT QUAL
    > chr1 1092344 . G T 79.54
    > chr1 1092367 . C T 148.50
    > chr1 1092400 . G C 90.54
    > chr1 1092424 . A G 93.14
    > chr1 1092461 . G A 105.30
    > chr1 1092470 . T G 103.06
    > chr1 1092482 . T C 104.33
    > chr1 1093235 . G A 16.08
    > chr1 1093245 . C T 244.75
    > chr2 1347864 . T C 107.34
    > chr2 1456284 . A C 86.32
    > chr3 2031473 . G T 25.34
    > chr3 2256801 . C T 154.65
    > EOD
    >
    > my( undef, undef, @cols ) = map { split } scalar <$tableFH>;
    > my %chroms;
    >
    > while ( <$tableFH> )
    > {
    >     my( $chr, $pos, @vals ) = split;
    >     @{ $chroms{ $chr }->{ $pos } }{ @cols } = @vals;
    > }
    >
    > print Data::Dumper->Dumpxs( [ \ %chroms ], [ qw{ *chroms } ] );'

    The output.

    I hope this is helpful.

    Cheers,

    JohnGG
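The header-driven hash slice described above can also be wrapped in a small reusable sub. The following is only a sketch under stated assumptions: `load_table` is a hypothetical helper name (it does not appear in the thread), the input is assumed to be whitespace-separated with a single `#`-prefixed header line, and the sample rows are a subset of the data in the question, read through an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): load a whitespace-separated
# table into a hash of hashes of hashes keyed by the first two columns.
sub load_table {
    my ($fh) = @_;

    # The header supplies the inner keys. Drop the leading '#' and skip
    # the two columns (CHROM, POS) that become the outer keys.
    my $header = <$fh>;
    return {} unless defined $header;
    $header =~ s/^#//;
    my ( undef, undef, @cols ) = split ' ', $header;

    my %table;
    while ( my $line = <$fh> ) {
        next unless $line =~ /\S/;    # skip blank lines
        my ( $chrom, $pos, @vals ) = split ' ', $line;
        @{ $table{$chrom}{$pos} }{@cols} = @vals;    # hash slice
    }
    return \%table;
}

# Sample rows copied from the question.
my $sample = <<'EOD';
#CHROM POS ID REF ALT QUAL
chr1 1092470 . T G 103.06
chr1 1092482 . T C 104.33
chr1 1093235 . G A 16.08
EOD

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $table = load_table($fh);
close $fh;

print $table->{chr1}{1092482}{REF}, "\n";    # prints T
```

Because the inner keys come from the header line, extra columns in the file would be picked up without changing the code, assuming every row has the same number of fields as the header.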

Re: Load table with row/column names
by Laurent_R (Canon) on May 30, 2014 at 10:53 UTC
    Not exactly this way, but you could build a hash of hashes of hashes where you could access your data this way:
    my %hash;
    # ... populate the hash and then possibly something like this:
    my $value = $hash{chr1}{1092482}{REF};
    But your question is too vague; I can't really offer actual code without knowing what you want to store in your data structure.

    Edit: Not only was choroba (++) faster than me to type an answer, but he also made some assumptions about the data you need in order to offer a complete solution.
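One detail worth keeping in mind with a hash of hashes of hashes is that reading a deep key such as `$hash{chr9}{42}{REF}` autovivifies the intermediate levels when they are missing. The sketch below shows one way to guard a lookup with `exists`; the `lookup` sub is a hypothetical helper, and the single populated entry is taken from the data in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch one field without creating intermediate
# hashes for chromosomes or positions that are not in the table.
sub lookup {
    my ( $table, $chrom, $pos, $field ) = @_;
    return unless exists $table->{$chrom};
    return unless exists $table->{$chrom}{$pos};
    return $table->{$chrom}{$pos}{$field};
}

my %hash = (
    chr1 => {
        1092482 => { REF => 'T', ALT => 'C', QUAL => '104.33' },
    },
);

my $ref = lookup( \%hash, 'chr1', 1092482, 'REF' );
print defined($ref) ? $ref : 'not found', "\n";            # prints T

my $missing = lookup( \%hash, 'chr9', 42, 'REF' );
print defined($missing) ? $missing : 'not found', "\n";    # prints not found

# The failed lookup did not add a chr9 entry to the table.
print exists $hash{chr9} ? "chr9 was created\n" : "chr9 still absent\n";
```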

Re: Load table with row/column names
by ww (Archbishop) on May 30, 2014 at 13:27 UTC

    ALTERNATE INTERPRETATION OF OP's DESIRES; namely, that cristianro87 wants to summarize ALL the data elements:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use 5.016;

    # 1087978

    while (<DATA>) {
        my ($chrom, $pos, $id, $ref, $alt, $qual);
        if ($_ =~ /#CHROM.+/ ) {       # title "Load table with row/column names"
            next;                      # does NOT match "access the elements" example
        } elsif ( $_ !~ /\S/ ) {       # stop at the first blank line
            last;
        } else {
            ($chrom, $pos, $id, $ref, $alt, $qual) = split;
            # desired out: [chr1][1092482][REF] = T
            say "[$chrom] [$pos] [REF] = " . $ref;   # print REF, not ALT
        }
    }

    =head execution
    C:\>1087978.pl
    [chr1] [1092344] [REF] = G
    [chr1] [1092367] [REF] = C
    [chr1] [1092400] [REF] = G
    [chr1] [1092424] [REF] = A
    [chr1] [1092461] [REF] = G
    [chr1] [1092470] [REF] = T
    [chr1] [1092482] [REF] = T
    [chr1] [1093235] [REF] = G
    [chr1] [1093245] [REF] = C
    =cut

    __DATA__
    #CHROM POS ID REF ALT QUAL
    chr1 1092344 . G T 79.54
    chr1 1092367 . C T 148.50
    chr1 1092400 . G C 90.54
    chr1 1092424 . A G 93.14
    chr1 1092461 . G A 105.30
    chr1 1092470 . T G 103.06
    chr1 1092482 . T C 104.33
    chr1 1093235 . G A 16.08
    chr1 1093245 . C T 244.75

    Far simpler: no hashes, but possibly (caveat here ->) a simple-minded interpretation of an unclear SOPW question.


    If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.