So, I have these files in a directory:

    HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
    HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam
    HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam
    HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam
    HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam
    NA20828.mapped.illumina.mosaik.TSI.exome.20110411.bam_herc2_phase1.bam
    NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20130415.bam_herc2_data.bam
    NA20828.mapped.ILLUMINA.bwa.TSI.low_coverage.20101123.bam_herc2_phase1.bam
    NA20828.mapped.ILLUMINA.bwa.TSI.exome.20121211.bam_herc2_data.bam

I have a file.txt with only the sample names, like this:

    HG00119
    HG00117
    NA20828

I read this file.txt into @samples so I can match the names against the files in the directory shown above. After checking that each sample is present in the directory, I collected the matching files in @files. These .bam files will serve as input to a source command, more precisely the commands below, which I have already written:

    my $command = "java -jar $picard/MergeSamFiles.jar INPUT=$files_bam OUTPUT=$sample_name" . "-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";
    system($command);
    $command = "java -jar $picard/MarkDuplicates.jar INPUT=$sample_name-tmp-herc2.bam OUTPUT=$sample_name-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";
    system($command);
    unlink "$sample_name-tmp-herc2.bam";
    unlink "$sample_name-tmp-herc2.bai";
    unlink "tmp";

I want to create a hash of arrays that groups the values (the @files) under the corresponding key (the sample name). For example, like this:

    %files = (
        'HG00119' => ['HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
                      'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam'],
        'HG00117' => ['HG00117.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
                      'HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
                      'HG00117.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam_herc2_phase1.bam'],
    );

UPDATED: So, now I have this:

    my %hash;
    $hash{$_} = [ glob("$_*.bam") ] for @lines;

This does the trick for the structure above (the %files). Now I have: sample => .bam files.

How can I write the foreach that produces the source command in the example below?

    my $command = "java -jar $picard/MergeSamFiles.jar INPUT=HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam INPUT=HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam OUTPUT=HG00119" . "-tmp-herc2.bam MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";
    system($command);
    $command = "java -jar $picard/MarkDuplicates.jar INPUT=HG00119-tmp-herc2.bam OUTPUT=HG00119-herc2.bam METRICS_FILE=tmp REMOVE_DUPLICATES=TRUE CREATE_INDEX=TRUE";
    system($command);
    unlink "HG00119-tmp-herc2.bam";
    unlink "HG00119-tmp-herc2.bai";
    unlink "tmp";

As you can see, each .bam file gets its own "INPUT=" argument, and the arguments are separated by a space.
I'm stuck here :(
Do you guys think I should maybe try a subroutine, or something similar?
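For what it is worth, here is a rough sketch of the kind of subroutine I have in mind. The name merge_command and the /path/to/picard value are made up for illustration, the literal %hash stands in for the result of the glob() line, and the loop only prints each command (a dry run) instead of calling system:

```perl
use strict;
use warnings;

# Hypothetical helper: build the MergeSamFiles command line for one sample.
# $picard is the directory holding the Picard jars; @bams are that sample's files.
sub merge_command {
    my ($picard, $sample, @bams) = @_;

    # One "INPUT=file" token per .bam file, joined with single spaces.
    my $inputs = join ' ', map { "INPUT=$_" } @bams;

    return "java -jar $picard/MergeSamFiles.jar $inputs "
         . "OUTPUT=${sample}-tmp-herc2.bam "
         . "MERGE_SEQUENCE_DICTIONARIES=TRUE CREATE_INDEX=TRUE";
}

# Assumed values, for illustration only.
my $picard = '/path/to/picard';
my %hash = (
    HG00119 => [
        'HG00119.mapped.illumina.mosaik.GBR.exome.20110411.bam_herc2_phase1.bam',
        'HG00119.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam_herc2_data.bam',
    ],
);

for my $sample (sort keys %hash) {
    my @bams = @{ $hash{$sample} };   # dereference the array for this sample
    next unless @bams;                # skip samples with no matching files

    my $command = merge_command($picard, $sample, @bams);
    print "$command\n";               # dry run; swap in system($command) to execute
}
```

The MarkDuplicates and unlink steps would presumably go inside the same loop, since they only need $sample.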
Thanks a lot!!!!

In reply to Hash of Arrays for a source command by Debortoli