Debortoli has asked for the wisdom of the Perl Monks concerning the following question:

So, I have these files in a directory:

HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam
HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam
NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20101123.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.exome.20121211.bam_herc2_data.bam

I have a file.txt with only the sample names like this:
HG00119
HG00117
NA20828

I read file.txt into @samples and match each sample name against the files in the directory listed above. The files that match go into @files. These .bam files will serve as input to an external command, specifically the one below, which I have already written:
my $command = "java -jar $picard/MergeSamFiles.jar INPUT=$files_bam OUTPUT=$sample_name" . "-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";
system($command);
$command = "java -jar $picard/MarkDuplicates.jar INPUT=$sample_name-tmp-herc2.bam OUTPUT=$sample_name-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";
system($command);
unlink "$sample_name-tmp-herc2.bam";
unlink "$sample_name-tmp-herc2.bai";
unlink "tmp";

I want to create a hash of arrays that groups the values (the @files) under the corresponding key (the sample names in @samples). For example:
%files = (
    'HG00119' => [
        'HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
        'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
    ],
    'HG00117' => [
        'HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
        'HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
        'HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam',
    ],
);

UPDATED: So, now I have this:
my %hash;
$hash{$_} = [ glob("$_*.bam") ] for @lines;

This does the trick for the %files structure above. Now I have: sample => .bam files

How can I write the foreach loop that builds the command in the example below?

my $command = "java -jar $picard/MergeSamFiles.jar INPUT=HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam INPUT=HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam OUTPUT=HG00119" . "-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";
system($command);
$command = "java -jar $picard/MarkDuplicates.jar INPUT=HG00119-tmp-herc2.bam OUTPUT=HG00119-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";
system($command);
unlink "HG00119-tmp-herc2.bam";
unlink "HG00119-tmp-herc2.bai";
unlink "tmp";

As you can see, each .bam file gets its own "INPUT=" argument, and the arguments are separated by spaces.

I'm stuck here :(

Do you guys think I should maybe try a subroutine, or something similar?

Thanks a lot!!!!

Replies are listed 'Best First'.
Re: Hash of Arrays for a source command
by trippledubs (Deacon) on Mar 09, 2015 at 17:40 UTC
    glob can grab a list of files that begin with a certain string:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dump;

    # An array of files
    my @filesTxt = ('abc1','abc2');
    my %hash;
    $hash{$_} = [ glob("$_*") ] for @filesTxt;
    dd %hash;

    (
      "abc2",
      ["abc2", "abc20" .. "abc29"],
      "abc1",
      ["abc1", "abc10" .. "abc19"],
    )
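One caveat with the pattern above, visible in the output: glob("$_*") returns every name that begins with the string, so abc1 also collects abc10 through abc19. If one sample name can be a prefix of another, a stricter match may be safer. The following is a minimal sketch only. It assumes the sample name is everything before the first dot in the file name; group_by_sample and the last entry in @files are hypothetical, made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Group .bam file names under their sample name.
# Assumption: the sample name is the text before the first dot.
sub group_by_sample {
    my ($samples, $files) = @_;

    # Start every requested sample with an empty list.
    my %grouped;
    $grouped{$_} = [] for @$samples;

    for my $file (@$files) {
        next unless $file =~ /\.bam$/;                # only .bam files
        my ($prefix) = $file =~ /^([^.]+)\./ or next; # text before first dot
        push @{ $grouped{$prefix} }, $file if exists $grouped{$prefix};
    }
    return %grouped;
}

my @samples = qw(HG00119 HG00117);
my @files   = (
    'HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
    'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
    'HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
    'HG001190.hypothetical.bam',   # made-up name: shares a prefix, must not match HG00119
);

my %hash = group_by_sample(\@samples, \@files);
printf "%s: %d file(s)\n", $_, scalar @{ $hash{$_} } for sort keys %hash;
```

In a real script, @files could come from glob("*.bam") or readdir; the exact-prefix comparison is what keeps look-alike sample names apart.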
      Worked like a charm!! Many thanks!
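For the second part of the question (turning each sample's file list into the command), here is a minimal sketch, not a tested pipeline. It assumes %hash has the sample => [.bam files] shape built with glob, and that Picard takes one INPUT= argument per file, as in the question's own example. It only prints the commands as a dry run. The subs merge_command and markdup_command and the /path/to/picard value are hypothetical names for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build the MergeSamFiles argument list for one sample.
# Each .bam file becomes its own INPUT= argument.
sub merge_command {
    my ($picard, $sample, @bams) = @_;
    return (
        'java', '-jar', "$picard/MergeSamFiles.jar",
        (map { "INPUT=$_" } @bams),
        "OUTPUT=${sample}-tmp-herc2.bam",
        'MERGE_SEQUENCE_DICTIONARIES=TRUE',
        'CREATE_INDEX=TRUE',
    );
}

# Build the MarkDuplicates argument list for the merged temporary file.
sub markdup_command {
    my ($picard, $sample) = @_;
    return (
        'java', '-jar', "$picard/MarkDuplicates.jar",
        "INPUT=${sample}-tmp-herc2.bam",
        "OUTPUT=${sample}-herc2.bam",
        'METRICS_FILE=tmp',
        'REMOVE_DUPLICATES=TRUE',
        'CREATE_INDEX=TRUE',
    );
}

my $picard = '/path/to/picard';   # hypothetical location
my %hash   = (
    HG00119 => [
        'HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
        'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
    ],
);

for my $sample (sort keys %hash) {
    my @bams = @{ $hash{$sample} };
    next unless @bams;            # skip samples with no matching files

    my @merge = merge_command($picard, $sample, @bams);
    my @dedup = markdup_command($picard, $sample);

    # Dry run: print the commands instead of executing them.
    print "@merge\n";
    print "@dedup\n";

    # To execute for real, something like:
    #   system(@merge) == 0 or die "MergeSamFiles failed for $sample: $?";
    #   system(@dedup) == 0 or die "MarkDuplicates failed for $sample: $?";
    #   unlink "${sample}-tmp-herc2.bam", "${sample}-tmp-herc2.bai", 'tmp';
}
```

Passing a list to system skips the shell, so the file names need no quoting. A subroutine per command, as the question suggests, also keeps the loop body short.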