a217 has asked for the wisdom of the Perl Monks concerning the following question:
Hello Perlmonks,
I have a file (hashKey.txt) that I would like to use as a list of hash keys in order to separate data from input files (e.g. testReg.txt).
hashKey.txt
chr10 chr10_random chr11 chr11_gl000202_random chr11_random chr12 chr13 chr13_random chr14 chr15 chr15_random chr16 chr16_random chr17_ctg5_hap1 chr17 chr17_gl000203_random chr17_gl000204_random chr17_gl000205_random chr17_gl000206_random chr17_random chr18 chr18_gl000207_random chr18_random chr19 chr19_gl000208_random chr19_gl000209_random chr19_random chr1 chr1_gl000191_random chr1_gl000192_random chr1_random chr20 chr21 chr21_gl000210_random chr21_random chr22 chr22_h2_hap1 chr22_random chr2 chr2_random chr3 chr3_random chr4_ctg9_hap1 chr4 chr4_gl000193_random chr4_gl000194_random chr4_random chr5 chr5_h2_hap1 chr5_random chr6_apd_hap1 chr6_cox_hap1 chr6_cox_hap2 chr6_dbb_hap3 chr6 chr6_mann_hap4 chr6_mcf_hap5 chr6_qbl_hap2 chr6_qbl_hap6 chr6_random chr6_ssto_hap7 chr7 chr7_gl000195_random chr7_random chr8 chr8_gl000196_random chr8_gl000197_random chr8_random chr9 chr9_gl000198_random chr9_gl000199_random chr9_gl000200_random chr9_gl000201_random chr9_random chrM chrUn_gl000211 chrUn_gl000212 chrUn_gl000213 chrUn_gl000214 chrUn_gl000215 chrUn_gl000216 chrUn_gl000217 chrUn_gl000218 chrUn_gl000219 chrUn_gl000220 chrUn_gl000221 chrUn_gl000222 chrUn_gl000223 chrUn_gl000224 chrUn_gl000225 chrUn_gl000226 chrUn_gl000227 chrUn_gl000228 chrUn_gl000229 chrUn_gl000230 chrUn_gl000231 chrUn_gl000232 chrUn_gl000233 chrUn_gl000234 chrUn_gl000235 chrUn_gl000236 chrUn_gl000237 chrUn_gl000238 chrUn_gl000239 chrUn_gl000240 chrUn_gl000241 chrUn_gl000242 chrUn_gl000243 chrUn_gl000244 chrUn_gl000245 chrUn_gl000246 chrUn_gl000247 chrUn_gl000248 chrUn_gl000249 chrX chrX_random chrY
hashKey.txt gives a list of all the possible chromosome values that could appear in a given input file.
testReg.txt
chr1	100	159	0
chr1	200	260	0
chr1	500	750	0
chr3	450	700	0
chr4	100	300	0
chr7	350	600	0
chr9	100	125	0
chr11	679	687	0
chr22	100	200	0
chr22	300	400	0
testReg.txt is simply a test file I use to test the code. It includes various chromosome values along with 3 other columns of data.
My code so far:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my (%Chr, %R);
    my (@key_split, @reg_split);
    my ($reg_line);

    open(KEY, "<hashKey.txt") or die "error reading key list";
    open(REG, "<testReg.txt") or die "error reading file";

    while (<KEY>) {
        chomp;
        @key_split = split("\n");
        $Chr{"$key_split[0]"} = $key_split[0];
    }

    while (<REG>) {
        chomp;
        @reg_split = split("\t");
        #$R{"$reg_split[0]"} = ($reg_split[0], $reg_split[1], $reg_split[2], $reg_split[3]);
        $R{"$reg_split[0]"} = $reg_split[0];
    }

    foreach my $key (keys %Chr) {
        if(exists($R{$key})){
            print ("$R{$key}\n");
        }
    }

    close(KEY);
    close(REG);
So far, my code prints out all of the chr values in common between hashKey.txt and testReg.txt. What I would like it to do is to print each line to a separate file designated by each chromosome. For example:
chr1.out
chr1	100	159	0
chr1	200	260	0
chr1	500	750	0
chr3.out
chr3 450 700 0
chr4.out
chr4 100 300 0
chr7.out
chr7 350 600 0
chr9.out
chr9 100 125 0
chr11.out
chr11 679 687 0
chr22.out
chr22	100	200	0
chr22	300	400	0
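The per-chromosome output described above could be produced along the following lines. This is only a minimal sketch, not a definitive solution: the subroutine name `split_by_chr`, its arguments, and the script name in the usage note are hypothetical, and it assumes the keys in hashKey.txt and the columns in testReg.txt are separated by whitespace. It keeps one output filehandle per chromosome in a hash and opens each `<chr>.out` file the first time that chromosome is seen.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy each row of $reg_file into "$out_dir/<chr>.out",
# skipping rows whose first column is not listed in $key_file.
# Returns the sorted list of chromosome names that were written.
sub split_by_chr {
    my ($key_file, $reg_file, $out_dir) = @_;

    # Load the allowed chromosome names (assumed whitespace-separated).
    open(my $key_fh, '<', $key_file) or die "Cannot read $key_file: $!";
    my %is_chr;
    while (my $line = <$key_fh>) {
        $is_chr{$_} = 1 for split ' ', $line;
    }
    close($key_fh);

    # One output filehandle per chromosome, opened on first use.
    my %out_fh;
    open(my $reg_fh, '<', $reg_file) or die "Cannot read $reg_file: $!";
    while (my $line = <$reg_fh>) {
        chomp $line;
        my ($chr) = split ' ', $line;          # first column only
        next unless defined $chr && $is_chr{$chr};
        unless ($out_fh{$chr}) {
            open($out_fh{$chr}, '>', "$out_dir/$chr.out")
                or die "Cannot write $out_dir/$chr.out: $!";
        }
        print { $out_fh{$chr} } "$line\n";
    }
    close($reg_fh);

    my @written = sort keys %out_fh;
    close($_) for values %out_fh;
    return @written;
}

# Only runs when two file names are given on the command line.
split_by_chr(@ARGV, '.') if @ARGV == 2;
```

Saved under a name such as `split_chr.pl` (hypothetical), it would be run as `perl split_chr.pl hashKey.txt testReg.txt` and write the `.out` files to the current directory. Each distinct chromosome holds one open filehandle until the end, which should be fine for a key list of this size but may hit the operating system's open-file limit for much larger ones.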
From there I can use each separated file to sort what I need to. I suppose my main problem is trying to figure out how to have the hash variable point toward the unique line. Is what I am trying to accomplish even possible with a hash table, given that the same key could be used for multiple lines? My main goal is just to separate each chr from the input file (testReg.txt) into separate files. If you have any suggestions, please let me know.
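On the question of one key covering several lines: a Perl hash value can be a reference to an array, so each chromosome key can hold every row that belongs to it. The following is a minimal sketch of that idea under stated assumptions; the subroutine name `group_by_chr` is hypothetical, the sample rows are taken from testReg.txt above, and the first column is assumed to be separated from the rest by whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group rows by their first column.
# Returns a reference to a hash whose values are array references.
sub group_by_chr {
    my (@lines) = @_;
    my %rows_for;
    for my $line (@lines) {
        chomp $line;
        my ($chr) = split ' ', $line;        # first column only
        next unless defined $chr;            # skip blank lines
        push @{ $rows_for{$chr} }, $line;    # array is created on first push
    }
    return \%rows_for;
}

# Sample rows taken from testReg.txt in the post.
my $rows_for = group_by_chr(
    "chr1\t100\t159\t0",
    "chr1\t200\t260\t0",
    "chr3\t450\t700\t0",
    "chr22\t100\t200\t0",
    "chr22\t300\t400\t0",
);

# Report how many rows were collected under each key.
for my $chr (sort keys %$rows_for) {
    print "$chr has ", scalar @{ $rows_for->{$chr} }, " row(s)\n";
}
```

With the rows grouped this way, writing each array to its own file is a loop over the keys. The trade-off is that the whole input is held in memory before anything is written.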
Replies are listed 'Best First'.
Re: Using hash keys to separate data
by wfsp (Abbot) on Jun 29, 2011 at 06:04 UTC
Re: Using hash keys to separate data
by bart (Canon) on Jun 29, 2011 at 07:09 UTC
Re: Using hash keys to separate data
by Marshall (Canon) on Jun 29, 2011 at 09:07 UTC
by a217 (Novice) on Jun 29, 2011 at 15:51 UTC
by Marshall (Canon) on Jun 29, 2011 at 22:43 UTC