in reply to How split big file depending of number of iterations

Create a hash with 19200 entries. Each key is a value you want routed to its own file, and the corresponding hash value is that file's filehandle. Then iterate through your file, parsing each line: look up the filehandle associated with that line's value and print the line to it. Something like:
    use IO::File;

    # One filehandle per distinct value; keys are the values themselves.
    my %filehandle;
    while (my ($k, $v) = associate_filename_with_values()) {
        $filehandle{$v} = IO::File->new(">$k")
            || die "Cannot open '$k' for writing\n";
    }

    my $data_fh = IO::File->new($datafile)
        || die "Cannot open '$datafile' for reading\n";

    while (<$data_fh>) {
        chomp;
        next unless length $_;
        my @line  = split /\s+/, $_;
        my $value = $line[SOME_INDEX];   # index of the column holding the value
        $filehandle{$value}->print("$_\n");
    }

    $data_fh->close;
    $_->close for values %filehandle;
That code will need some work, especially in associating filenames with values, but it should give you a head start.

Of course, if you want to pay me to write it for you, email me at rkinyon@columbus.rr.com - I charge decent rates for bioinformatics work.

------
We are the carpenters and bricklayers of the Information Age.

The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Re: How split big file depending of number of iterations
by sgifford (Prior) on Aug 11, 2003 at 16:36 UTC

    I'm not aware of any UNIX that will allow you to have 19200 files open simultaneously; it's possible the situation is different under Windows, but I doubt it. In a quick test, I can open 1020 files before my script dies with a Too many open files error.
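
    For reference, here is a quick probe along the lines of that test — a sketch only, and the temp-file path is an assumption — that reports the limit on a given box:

        use strict;
        use warnings;

        # Open throwaway files until the OS refuses, then report the count.
        my @handles;
        my $n = 0;
        while (open my $fh, '>', "/tmp/fh_probe_$n") {
            push @handles, $fh;
            $n++;
        }
        print "Opened $n files before failing: $!\n";
        close $_ for @handles;
        unlink glob '/tmp/fh_probe_*';

    The ceiling is typically the shell's "ulimit -n" minus the standard streams and whatever else the interpreter already holds open.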

    Workarounds include:

    * Store the filename in the hash, open the file just before using it, and close it afterwards.
    * Do the same thing, but cache open filehandles: upon receiving a "Too many open files" error, close the least-recently-used filehandle and jot down somewhere that it needs to be re-opened (see the sketch below).
    * Append the data to the hash entry itself, then go through all hash entries at the end and print each one's contents to the appropriate file.
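
    Here is a minimal sketch of the filehandle-cache workaround. The $MAX_OPEN cap, the "out_$value.txt" naming scheme, and the assumption that the value sits in the first column are all placeholders to adapt:

        use strict;
        use warnings;
        use IO::File;

        my $MAX_OPEN = 200;   # assumed cap, kept well under the per-process limit
        my %fh;               # value => open filehandle
        my @lru;              # values, least-recently-used first

        # Return an open append-mode handle for $value, evicting the stalest
        # handle when the cache is full.  Append mode means a re-opened file
        # keeps the lines written before its handle was evicted.
        sub fh_for {
            my ($value) = @_;
            if (exists $fh{$value}) {
                @lru = grep { $_ ne $value } @lru;   # move to most-recent end
                push @lru, $value;
                return $fh{$value};
            }
            if (keys %fh >= $MAX_OPEN) {
                my $old = shift @lru;                # least recently used
                $fh{$old}->close;
                delete $fh{$old};
            }
            my $name = "out_$value.txt";             # hypothetical naming scheme
            my $h = IO::File->new(">> $name")
                || die "Cannot open '$name' for appending: $!\n";
            $fh{$value} = $h;
            push @lru, $value;
            return $h;
        }

        while (my $line = <>) {
            next unless $line =~ /\S/;
            my $value = (split ' ', $line)[0];       # assume value is in column 0
            fh_for($value)->print($line);
        }
        $_->close for values %fh;

    The core FileCache module implements essentially this idea ("keep more files open than the system permits"), so it is worth a look before rolling your own.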

Re: Re: How split big file depending of number of iterations
by ChrisS (Monk) on Aug 11, 2003 at 14:48 UTC
    The code above looks like it might accomplish what you asked, but I have a sneaking suspicion that you really want to do something else with that data, once you've created the large number of smaller files.

    If so, you'll probably find the current design to be quite expensive, due to a lot of I/O.

    If I'm right, would you consider describing the purpose of this routine? We might be able to come up with a better process -- or not. ;-)

    If I'm wrong, and you really just want to create a bunch of small files, I apologize for my errant hunch.
Re: Re: How split big file depending of number of iterations
by Anonymous Monk on Aug 11, 2003 at 14:49 UTC
    Thank you dragonchild for your fast reply; as you said, it is a good head start, and I will work on it. I am a student doing an internship in this lab, so I cannot afford to pay any amount, even a decent one ;).
    I asked for help because I am running out of time (deadline) and at this moment my mind is blocked!
    Thanks again