My main goal is to just separate each chr from the input file (testReg.txt) into separate files. If you have any suggestions please let me know.

You are over-thinking this. You don't even need the hashKey.txt file. I take testReg.txt to be the 15GB monster file. If it is not already sorted by chromosome, use the system's command line sort to do that. The command line sort can handle files far bigger than available memory.
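If you would rather drive the sort from Perl than type it at the shell, something like the following might do. It is only a sketch: it assumes a Unix-style sort is on the PATH, the sub name sort_by_chrom is made up for illustration, and testReg.sorted.txt is an assumed output name.

```perl
use strict;
use warnings;

# Sort $in on its first whitespace-delimited column and write the
# result to $out, using the external sort command.
sub sort_by_chrom {
    my ($in, $out) = @_;
    # -k1,1 restricts the sort key to column 1; -o names the output file.
    system('sort', '-k1,1', '-o', $out, $in) == 0
        or die "sort failed with status $?\n";
    return $out;
}

# Hypothetical usage:
# sort_by_chrom('testReg.txt', 'testReg.sorted.txt');
```

The order is plain string order, so chr11 lands before chr3. That does not matter here, because all we need is for equal chromosomes to end up next to each other.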

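Now all of the lines that have the same chromosome will be grouped together in the file. We just read the file, and every time we switch to a new chromosome, we start a new file. The grouping matters because each output file is opened with '>': if a chromosome reappeared later in unsorted input, its earlier output would be overwritten.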

#!/usr/bin/perl -w
use strict;

my $curr_chrom = "";

while (<DATA>)
{
   my ($chrom) = split;  # $chrom is the first column
                         # parens on the left side are needed
                         # for list context
   if ($chrom ne $curr_chrom)
   {
       $curr_chrom = $chrom;
       open (OUT, '>', "$curr_chrom.out") 
          or die "unable to write $curr_chrom.out $!\n";
   }
   print OUT;
}
close OUT;


__DATA__
chr1    100 159 0
chr1    200 260 0
chr1    500 750 0
chr3    450 700 0
chr4    100 300 0
chr7    350 600 0
chr9    100 125 0
chr11   679 687 0
chr22   100 200 0
chr22   300 400 0

A few notes: If a file handle is open to one file and is then opened to another file, the first file is closed automatically (no need to close it explicitly). For your data, you normally want to split on any run of whitespace characters. That is what the default split does: a bare split means split(' ', $_), which behaves like split(/\s+/, $_) except that leading whitespace is skipped. It is what my ($chrom) = split; uses. Splitting on \t is probably not what you want, and splitting on \n certainly is not.
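To make the points about list context and the default split pattern concrete, here is a small sketch. The sample line is taken from the data above; the variable names are just for illustration, and it assumes a reasonably recent Perl.

```perl
use strict;
use warnings;

my $line = "chr1    100 159 0\n";

# List context: the parentheses make split return its fields,
# and the first field lands in $chrom.
my ($chrom) = split ' ', $line;

# Scalar context: split returns the number of fields instead.
my $count = split ' ', $line;

print "$chrom $count\n";    # prints "chr1 4"

# The ' ' pattern skips leading whitespace; /\s+/ does not,
# so it yields an empty first field.
my @default = split ' ',   "  chr1 100";    # ('chr1', '100')
my @regex   = split /\s+/, "  chr1 100";    # ('', 'chr1', '100')
```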

Update: From the wording of the post, I don't think that you are interested in only a subset of the chromosomes in the input file, but if you were, here's how. Make a hash table whose keys are the chromosomes that you want. In the above program, when the chromosome changes (the if statement), test whether the chromosome is on the "approved" list (its name exists in the hash table). If it is, open OUT to that name as above; if it is not, open OUT to "/dev/null". On Unix-like systems /dev/null is a special device that discards everything written to it (it is the "bit bucket"). That way you always execute the print OUT; statement. Sometimes the output goes somewhere useful and sometimes into the black hole of bits.

To make the hash, your code:

while (<KEY>) {

    chomp;
    @key_split = split("\n");
    $Chr{"$key_split[0]"} = $key_split[0];
}
## better written as: ##
while (<KEY>) {
    my ($chrom) = split;
    $Chr{$chrom}=1;
}
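Putting the "approved list" idea together, a minimal sketch could look like this. It is not tested against your real data. The sub name split_approved and its $dir argument are made up for illustration, it uses a lexical file handle rather than OUT, and it asks File::Spec for the null device instead of hard-coding "/dev/null". It assumes the input is already grouped by chromosome.

```perl
use strict;
use warnings;
use File::Spec;

# Write each approved chromosome's lines to "$dir/<chrom>.out";
# lines for any other chromosome are sent to the null device.
sub split_approved {
    my ($in, $approved, $dir) = @_;
    my $curr_chrom = '';
    my $out;
    while (my $line = <$in>) {
        my ($chrom) = split ' ', $line;
        next unless defined $chrom;            # skip blank lines
        if ($chrom ne $curr_chrom) {
            $curr_chrom = $chrom;
            my $target = exists $approved->{$chrom}
                ? "$dir/$chrom.out"
                : File::Spec->devnull;         # "/dev/null" on Unix
            # Re-opening $out closes the previous file automatically.
            open($out, '>', $target)
                or die "unable to write $target: $!\n";
        }
        print {$out} $line;
    }
    close $out if $out;
}

# Hypothetical usage, with %Chr built as shown above:
# open(my $in, '<', 'testReg.sorted.txt') or die "cannot read: $!\n";
# split_approved($in, \%Chr, '.');
```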

In reply to Re: Using hash keys to separate data by Marshall
in thread Using hash keys to separate data by a217
