baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:
I need help, or a suggestion, on this topic. Let's say I have an ASCII file that contains some data, and I need to split it into several smaller files so that the integrity of the data is preserved. Example:

File to be divided:

    SS
    line a
    line a
    line a
    line a
    SS
    line b
    line b
    line b
    line b
    line b
    line b
    SS
    line c
    line c
    line c
    ...

The point is not to end up with part of the 'line a' record in one file and the rest in another, like this:

    file 1
    SS
    line a
    line a

    file 2
    line a
    line a

The number of parts is dynamic; it changes from time to time. So some obvious algorithms would be:

    open (FILE);
    my $data = 0;
    my $partnumber = 5; # changes constantly
    while(<FILE>){$data++ if ($_ =~ /^SS/)} # to get the number of objects in a file
    close FILE;
    my $chunk =($data + $partnumber)/$partnumber ; # to ensure there are no remainders
    my $index = 1;
    open(FILE$index);
    open (FILE);
    my $i = 0;
    while(<FILE>){
        if (m/^SS/){
            $i++;
            if ($i >= $chunk){
                $i = 0;
                close FILE$index;
                $index++;
                open (FILE$index);
            }
        }
        print FILE$index "$_";
    }
    close FILE & FILE$index;

This is fast and clean, but it distributes the data unevenly between files when the file holds only a small number of data objects.

The second one would be something like this:

    open (FILE);
    my $data = 0;
    my $partnumber = 5; # changes constantly
    while(<FILE>){$data++ if ($_ =~ /^SS/)} # to get the number of objects in a file
    close FILE;
    my $chunk = $data/$partnumber;
    my $remainder = $data%$partnumber ;
    my $index = 1;
    open(FILE$index);
    open (FILE);
    my $i = 0;
    my $remain = 1;
    while(<FILE>){
        if (m/^SS/){
            $i++;
            $chunk +=1 if ( $remain < $remainder);
            if ($i >= $chunk){
                $i = 0;
                $remain++;
                close FILE$index;
                $index++;
                open (FILE$index);
            }
        }
        print FILE$index "$_";
    }
    close FILE & FILE$index;

It solves the problem of the first one at the cost of an extra evaluation line (not a big deal, but ...). So my question is: is there a simpler way to do this, or is this the simplest one? And is there a module that does this kind of thing, so I can look at, or even copy, its procedure if it solves the problem faster and more simply? Thank you.
Update:
It was meant to be pseudocode, just to point out the dividing algorithms:
    chunk = (total + # of chunks) / # of chunks

and

    remainder = total % # of chunks
    chunk = total / # of chunks
    foreach (chunk #) {
        if (remainder < # of chunks) {
            add one to ensure that all data is divided between files
        }
    }
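To make the remainder idea concrete, here is a minimal Perl sketch of just the size calculation. The helper name `part_sizes` is made up for illustration and does not come from any module. If the first formula is read with integer division, 11 objects in 5 parts gives a chunk of 3 and file sizes of 3 3 3 2 0, whereas handing the remainder out one object at a time gives:

```perl
use strict;
use warnings;

# Hypothetical helper: how many objects each part receives when
# $total objects are spread over $parts files as evenly as possible.
sub part_sizes {
    my ($total, $parts) = @_;
    my $base  = int($total / $parts);   # every part gets at least this many
    my $extra = $total % $parts;        # the first $extra parts get one more
    return map { $base + ($_ < $extra ? 1 : 0) } 0 .. $parts - 1;
}

print join(' ', part_sizes(11, 5)), "\n";   # prints "3 2 2 2 2"
```

With this split no two files differ by more than one object, and the sizes always add up to the total.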
So, as you can see, the problem is how to divide the data between files elegantly, so that no object is corrupted (split across two files) and all of the data ends up in the output files.
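One possible way to get both properties, sketched under some assumptions: group the lines into whole objects first, then send object number k (counting from 0) to part int(k * parts / total). The function names `read_records` and `distribute` are hypothetical, the sample data is generated in memory instead of being read from a real file, and the sketch assumes the input is small enough to hold in memory.

```perl
use strict;
use warnings;

# Group lines into whole objects. An object starts at a line beginning
# with "SS" and runs until the next such line. Lines before the first
# "SS" are attached to the first object (an assumption of this sketch).
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        push @records, '' if $line =~ /^SS/ || !@records;
        $records[-1] .= $line;
    }
    return @records;
}

# Spread whole objects over $parts buckets: object $k goes to bucket
# int($k * $parts / $total), so bucket sizes differ by at most one.
sub distribute {
    my ($parts, @records) = @_;
    my $total   = @records;
    my @buckets = map { [] } 1 .. $parts;
    for my $k (0 .. $total - 1) {
        push @{ $buckets[ int($k * $parts / $total) ] }, $records[$k];
    }
    return @buckets;
}

# Demo with 11 generated objects read through an in-memory filehandle.
my $data = join '', map { "SS\nline $_\nline $_\n" } 'a' .. 'k';
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my @buckets = distribute(5, read_records($in));
close $in;

print join(' ', map { scalar @$_ } @buckets), "\n";   # prints "3 2 2 2 2"
```

Each bucket can then be written out with an ordinary open and print loop to whatever file names you use. Because only whole objects are moved, an object can never straddle two files. For input too large for memory, the same index formula should work with the two-pass approach from the question: count the objects first, then switch output files whenever the computed part number changes.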
thnx
Replies are listed 'Best First'.

Re: splitting files
by jwkrahn (Abbot) on Mar 14, 2009 at 22:10 UTC

Re: splitting files
by codeacrobat (Chaplain) on Mar 14, 2009 at 22:06 UTC

Re: splitting files
by ELISHEVA (Prior) on Mar 15, 2009 at 06:38 UTC

Re: splitting files
by graff (Chancellor) on Mar 15, 2009 at 22:40 UTC