perlbird has asked for the wisdom of the Perl Monks concerning the following question:

Almighty PerlMonks,

I have a question regarding casting and melting arrays. Is there a sweet alternative or solution that can cast or melt an array/table/dataframe, like what R's reshape2 package has to offer? (see http://www.statmethods.net/management/reshape.html)

I have been working with tab-delimited text files that look like this (the white space below is a tab):

Gene_name sample
gene_A sample_1
gene_B sample_1
gene_C sample_1
gene_B sample_2
gene_C sample_2
gene_A sample_3

and I want an output that looks something like this:

gene_A sample_1 sample_2
gene_B sample_1 sample_2
gene_C sample_1 sample_3

Is there a way I can do it with Perl?

Cheers! perlbird

Replies are listed 'Best First'.
Re: Melt and casting array!
by Don Coyote (Hermit) on Apr 11, 2013 at 17:53 UTC

    hello perlbird and welcome to the monastery!

    I do not know what reshape2 has to offer, but your immediate request would be a fairly simple data manipulation exercise. Here is one way, in addition to the examples in the previous reply.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my %genetable;
    while (<DATA>) {
        chomp;
        next if /^#/;    # skip comment lines
        # the data columns are tab-separated
        my ($genekey, $samplevalue) = split /\t/;
        push @{ $genetable{$genekey} }, $samplevalue;
    }
    print map { "$_ @{$genetable{$_}}\n" } sort keys %genetable;
    exit 0;

    __END__
    #omitted the header row
    gene_A	sample_1
    gene_B	sample_1
    gene_C	sample_1
    gene_B	sample_2
    gene_C	sample_2
    gene_A	sample_3

    --------
    gene_A sample_1 sample_3
    gene_B sample_1 sample_2
    gene_C sample_1 sample_2
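
Going the other way (what reshape2 would call a "melt", as far as the question describes it) is just as short. The following is only a minimal sketch, assuming the wide rows are whitespace- or tab-separated with the gene name first; `melt_row` is a hypothetical helper name, not something from this thread or from any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): turn one wide row
# such as "gene_A sample_1 sample_3" into one [gene, sample] pair per sample.
sub melt_row {
    my ($line) = @_;
    # split ' ' splits on any run of whitespace and ignores a trailing newline
    my ($gene, @samples) = split ' ', $line;
    return map { [ $gene, $_ ] } @samples;
}

# The wide rows produced by the script above
my @wide = (
    "gene_A\tsample_1\tsample_3",
    "gene_B\tsample_1\tsample_2",
    "gene_C\tsample_1\tsample_2",
);

# Print one tab-separated gene/sample pair per line
print join("\t", @$_), "\n" for map { melt_row($_) } @wide;
```

Each wide row yields as many long rows as it has samples, so the three rows here give back six gene/sample pairs, grouped by gene rather than in the original file order.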
Re: Melt and casting array! (csv pivot)
by Anonymous Monk on Apr 11, 2013 at 13:07 UTC
Re: Melt and casting array!
by ww (Archbishop) on Apr 11, 2013 at 19:14 UTC
    Others may find your posting understandable. I don't. Please clarify:

    By what sort of decision chain/data testing do we get to an output for gene_A that includes gene_A sample_2 when there is no gene_A sample_2 in the original data? I infer that it is not because the original data has two gene_A entities, because the gene_C data do NOT include 3 distinct data points to justify the gene_C sample_3 output.

    Minor update, mostly to offset data citations more clearly


    If you didn't program your executable by toggling in binary, it wasn't really programming!

Re: Melt and casting array!
by LanX (Saint) on Apr 11, 2013 at 13:06 UTC
    ( plz note: this post originally replied to a now reaped duplicate and was later reparented by the gods)

    please reformat your posting:

    • <p> between paragraphs
    • <c> ... </c> around code and data
    • [ ... ] around links
    EDIT: and please don't repost questions!!!
    use strict;
    use warnings;

    my %result;    # Hash of Arrays

    my $headline = <DATA>;    # ignore

    while (my $line = <DATA>) {
        my ($gene, $sample) = split /\s+/, $line;
        push @{ $result{$gene} }, $sample;
    }

    for my $gene (sort keys %result) {
        print join "\t", $gene, @{ $result{$gene} };
        print "\n";
    }

    __DATA__
    Gene_name	sample
    gene_A	sample_1
    gene_B	sample_1
    gene_C	sample_1
    gene_B	sample_2
    gene_C	sample_2
    gene_A	sample_3

    -->

    gene_A	sample_1	sample_3
    gene_B	sample_1	sample_2
    gene_C	sample_1	sample_2

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      wow. that's really neat and handy!

      Thanks Rolf!!
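
For a result closer to a table with one column per sample, the hash-of-arrays idea from the replies above can be swapped for a hash of hashes. This is only a sketch under assumptions: `cast_matrix` is a hypothetical name, the header row is assumed to have been dropped by the caller, and the 1/0 presence flags are one possible choice of cell value, not something the original poster asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): build a
# gene x sample presence table from long-format "gene<TAB>sample" lines.
sub cast_matrix {
    my (@lines) = @_;
    my (%seen, %samples);
    for my $line (@lines) {
        my ($gene, $sample) = split ' ', $line;
        next unless defined $sample;    # skip blank or one-column lines
        $seen{$gene}{$sample} = 1;
        $samples{$sample}     = 1;
    }
    # Plain string sort: sample_10 would sort before sample_2
    my @cols = sort keys %samples;
    my @rows = ( join "\t", 'Gene_name', @cols );
    for my $gene (sort keys %seen) {
        push @rows, join "\t", $gene,
            map { exists $seen{$gene}{$_} ? 1 : 0 } @cols;
    }
    return @rows;
}

# The data rows from the original question, header omitted
my @long = (
    "gene_A\tsample_1", "gene_B\tsample_1", "gene_C\tsample_1",
    "gene_B\tsample_2", "gene_C\tsample_2", "gene_A\tsample_3",
);

print "$_\n" for cast_matrix(@long);
```

With the question's data this gives a header row followed by one row per gene, where gene_A has 1 under sample_1 and sample_3 and 0 under sample_2. Every row has the same number of columns, which is easier to load into a spreadsheet or into R than the ragged rows.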