You are over-thinking this. You don't even need the file hashKey.txt. I think testReg.txt is the 15GB monster file. If this file is not already sorted, use the system command line sort to do that. The command line sort can handle files far bigger than memory.
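If you would rather drive the sort from Perl than type it at the shell, something like the following should work. It is only a sketch: the sub name sort_by_first_column and the output file name are made up for illustration, and it assumes a POSIX-style sort is on your PATH. The -k1,1 option restricts the key to the first column and -o names the output file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sort a whitespace-delimited file by its first column with the system sort.
# The sub name and the file names are illustrative, not from the original post.
sub sort_by_first_column {
    my ($in, $out) = @_;
    # -k1,1 limits the sort key to the first field; -o names the output file.
    # The list form of system() bypasses the shell, so no quoting is needed.
    system('sort', '-k1,1', '-o', $out, $in) == 0
        or die "sort failed with status $?\n";
    return $out;
}

# Usage: perl sortchrom.pl testReg.txt testReg.sorted.txt
sort_by_first_column(@ARGV) if @ARGV == 2;
```

The sort order is plain text order, so chr11 will come before chr3. That does not matter here, because all we need is for equal names to be adjacent.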
Now all of the lines that have the same chromosome will be grouped together in the file. We just read the file and every time we switch to a new chromosome, we start a new file.
A few notes:

If a file handle that is open on one file is opened again on another file, the first file is closed automatically (no need to close it explicitly).

For your data, you normally want to split on any run of whitespace characters. A bare split does that on $_: it behaves like split(' ', $_), which splits on /\s+/ and also ignores leading whitespace. That is what my ($chrom) = split; uses. Splitting on \t is probably not what you want, and splitting on \n certainly is not.

    #!/usr/bin/perl -w
    use strict;

    my $curr_chrom = "";
    while (<DATA>)
    {
        my ($chrom) = split;   # $chrom is the first column
                               # parens on the left side are needed
                               # for list context
        if ($chrom ne $curr_chrom)
        {
            # new chromosome: re-opening OUT closes the previous file
            $curr_chrom = $chrom;
            open (OUT, '>', "$curr_chrom.out")
                or die "unable to write $curr_chrom.out $!\n";
        }
        print OUT;             # prints $_ to the current chromosome's file
    }
    close OUT;

    __DATA__
    chr1 100 159 0
    chr1 200 260 0
    chr1 500 750 0
    chr3 450 700 0
    chr4 100 300 0
    chr7 350 600 0
    chr9 100 125 0
    chr11 679 687 0
    chr22 100 200 0
    chr22 300 400 0
Update: From the wording of the post, I don't think that you are interested in only a subset of the chromosomes in the input file, but if you are, here's how. Make a hash table whose keys are the chromosomes that you want. In the above program, when the chromosome changes (the if statement), test whether the chromosome is on the "approved" list (the name exists in the hash table). If it does exist, open OUT to that name like above. If it does not, open OUT to "/dev/null". /dev/null is a special device that discards everything written to it (it is the "bit bucket"). That way you always execute the print OUT; statement. Sometimes the output goes somewhere useful and sometimes into the black hole of bits.
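Here is roughly what that could look like. Treat it as a sketch under a few assumptions: the sub name split_approved and its arguments are invented for illustration, I am guessing that hashKey.txt is the file behind your KEY handle, the input is already sorted by chromosome, and you are on a system that has /dev/null. It uses lexical file handles instead of OUT, but the idea is the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split sorted input into one file per chromosome, keeping only approved names.
# $in is a readable handle, $wanted a hash ref of approved chromosome names,
# $dir the output directory (all three names are illustrative).
sub split_approved {
    my ($in, $wanted, $dir) = @_;
    my $curr_chrom = "";
    my $out;
    while (my $line = <$in>) {
        my ($chrom) = split ' ', $line;   # first whitespace-separated column
        next unless defined $chrom;       # skip blank lines
        if ($chrom ne $curr_chrom) {
            $curr_chrom = $chrom;
            # Approved names get a real file; everything else goes to /dev/null.
            my $target = exists $wanted->{$chrom} ? "$dir/$chrom.out" : "/dev/null";
            close $out if $out;
            open($out, '>', $target) or die "unable to write $target: $!\n";
        }
        print {$out} $line;               # always print; the target decides
    }
    close $out if $out;
    return;
}

# Usage (assumed file names): perl splitchrom.pl hashKey.txt < testReg.sorted.txt
if (@ARGV == 1) {
    open(my $key, '<', $ARGV[0]) or die "unable to read $ARGV[0]: $!\n";
    my %Chr;
    while (<$key>) {
        my ($chrom) = split;
        $Chr{$chrom} = 1 if defined $chrom;
    }
    close $key;
    split_approved(\*STDIN, \%Chr, ".");
}
```

Because each output file is opened with '>', a chromosome that shows up in two separate runs of lines would overwrite its earlier output, which is why the input has to be sorted first.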
To make the hash, your code:

    while (<KEY>)
    {
        chomp;
        @key_split = split("\n");
        $Chr{"$key_split[0]"} = $key_split[0];
    }

    ## better written as: ##

    while (<KEY>)
    {
        my ($chrom) = split;
        $Chr{$chrom} = 1;
    }
In reply to Re: Using hash keys to separate data by Marshall, in thread Using hash keys to separate data by a217