Hash of Arrays for a source command

Debortoli has asked for the wisdom of the Perl Monks concerning the following question:

I have these files in a directory:

HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam
HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam
HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam
NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam
NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20101123.bam_herc2_phase1.bam
NA20828.mapped.ILLUMINA.bwa.TSI.exome.20121211.bam_herc2_data.bam

I have a file.txt with only the sample names like this:

HG00119
HG00117
NA20828

I have read file.txt into @samples so I can match each sample name against the files in the directory listed above. After checking that a sample has files in the directory, I store the matching files in @files. These .bam files will serve as input to an external command, more precisely the commands below, which I have already written:

my $command = "java -jar $picard/MergeSamFiles.jar INPUT=$files_bam OUTPUT=$sample_name" . "-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";
system($command);

# Reuse $command; declaring it with "my" a second time would mask the first
$command = "java -jar $picard/MarkDuplicates.jar INPUT=$sample_name-tmp-herc2.bam OUTPUT=$sample_name-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";
system($command);

# Clean up the intermediate files
unlink "$sample_name-tmp-herc2.bam";
unlink "$sample_name-tmp-herc2.bai";
unlink "tmp";

I want to create a hash of arrays that groups the values (the @files) under the corresponding key (the entries of @samples). For example, like this:

%files = (
    'HG00119' => [
        'HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
        'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
    ],
    'HG00117' => [
        'HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
        'HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
        'HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam',
    ],
);
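
When the file names are already held in a list rather than read from disk, a structure like this could be built by pushing each file onto the array of the sample that prefixes it. This is only a minimal sketch: the helper name group_by_sample is hypothetical, and it assumes every file name starts with the sample name followed by a dot.

```perl
use strict;
use warnings;

# Hypothetical helper: group file names under the sample whose name
# (followed by a dot) starts the file name.
sub group_by_sample {
    my ($samples, $files) = @_;
    my %grouped = map { $_ => [] } @$samples;   # every sample gets an array, even if empty
    for my $file (@$files) {
        for my $sample (@$samples) {
            if (index($file, "$sample.") == 0) {
                push @{ $grouped{$sample} }, $file;
                last;                            # a file belongs to one sample only
            }
        }
    }
    return %grouped;
}

my @samples = qw(HG00119 HG00117);
my @files   = (
    'HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
    'HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
    'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
);

my %files = group_by_sample(\@samples, \@files);
print "$_: @{ $files{$_} }\n" for sort keys %files;
```

Samples with no matching file end up with an empty array, which makes it easy to skip them later.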

UPDATED: So, now I have this:

my %hash;

$hash{$_} = [ glob("$_*.bam") ] for @lines;

This does the trick for the %files structure above. Now I have: sample => .bam files

How can I write the foreach loop that produces commands like the example below?

my $command = "java -jar $picard/MergeSamFiles.jar INPUT=HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam INPUT=HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam OUTPUT=HG00119" . "-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";
system($command);

# Reuse $command; declaring it with "my" a second time would mask the first
$command = "java -jar $picard/MarkDuplicates.jar INPUT=HG00119-tmp-herc2.bam OUTPUT=HG00119-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";
system($command);

# Clean up the intermediate files
unlink "HG00119-tmp-herc2.bam";
unlink "HG00119-tmp-herc2.bai";
unlink "tmp";

As you can see, each .bam file gets its own "INPUT=" argument, and the arguments are separated by a space.

I'm stuck here :(

Do you guys think I should maybe try a subroutine, or something similar?

Thanks a lot!!!!

Re: Hash of Arrays for a source command by trippledubs (Deacon) on Mar 09, 2015 at 17:40 UTC
glob can grab a list of files that begin with a certain string:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dump;

# An array of files
my @filesTxt = ('abc1','abc2');
my %hash;

$hash{$_} = [ glob("$_*") ] for @filesTxt;
dd %hash;

Output:

(
  "abc2",
  ["abc2", "abc20" .. "abc29"],
  "abc1",
  ["abc1", "abc10" .. "abc19"],
)
Re^2: Hash of Arrays for a source command by Debortoli (Initiate) on Mar 09, 2015 at 20:08 UTC
Worked like a charm!! Many thanks!
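
For the foreach loop asked about in the original question, one possible shape is to turn each sample's array into a space-separated run of INPUT= arguments with map and join. The sketch below is not from the thread: the helper name merge_command, the placeholder Picard path, and the stand-in file names are all assumptions, and the system call is left commented out so the snippet only prints the command it would run.

```perl
use strict;
use warnings;

# Hypothetical helper: build the MergeSamFiles command line for one sample,
# with one INPUT= argument per .bam file, separated by spaces.
sub merge_command {
    my ($picard, $sample, @bams) = @_;
    my $inputs = join ' ', map { "INPUT=$_" } @bams;
    return "java -jar $picard/MergeSamFiles.jar $inputs "
         . "OUTPUT=${sample}-tmp-herc2.bam "
         . "MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";
}

my $picard = '/path/to/picard';            # assumption: placeholder location of the Picard jars
my %hash   = (
    HG00119 => [ 'a.bam', 'b.bam' ],       # stand-in file names, normally filled by glob
);

for my $sample (sort keys %hash) {
    my @bams = @{ $hash{$sample} };
    next unless @bams;                     # skip samples with no matching files

    my $command = merge_command($picard, $sample, @bams);
    print "$command\n";

    # Uncomment to actually run it and stop on failure:
    # system($command) == 0 or die "MergeSamFiles failed for $sample: $?";
    # The MarkDuplicates call and the unlink lines from the question would follow here.
}
```

Keeping the string building in a small subroutine makes it possible to inspect the command before handing it to system.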