Re: splitting files by number of words

This will get you started. The one difference from what you asked for is that I store the result in a hash of arrays instead of writing files, but you can easily change that to write each group to a file.


#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my (@total_words, %record_file_of);
my $counter = 0;
my $num_of_files = 3;

@total_words = split while (<DATA>);

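# Round up so the words fit into at most $num_of_files groups
# (rounding down would spill the remainder into an extra group).
my $words_per_file = int( ( @total_words + $num_of_files - 1 )
  / $num_of_files );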

for my $i (0 .. $#total_words) {
    $counter++ if ( $i % $words_per_file == 0 );
    push( @{ $record_file_of{$counter} }
      , $total_words[$i] );
}

print Dumper \%record_file_of;

__DATA__
This is a test of words. This should be divided into equal files.
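
Switching the output from a hash to files could look roughly like the sketch below. The helper name `write_word_files` and the `"$prefix.N"` file naming are assumptions made for illustration, not part of the script above, and it still expects all of the words to be in memory already.

```perl
use strict;
use warnings;

# Hypothetical helper: write @$words into at most $num_of_files files
# named "$prefix.1", "$prefix.2", ... and return the list of file names.
sub write_word_files {
    my ($words, $num_of_files, $prefix) = @_;
    return unless @$words;

    # Round up so that no more than $num_of_files files are created.
    my $per_file = int( (@$words + $num_of_files - 1) / $num_of_files );

    my @names;
    for (my $start = 0; $start < @$words; $start += $per_file) {
        # Clamp the end of the slice to the last word.
        my $end = $start + $per_file - 1;
        $end = $#$words if $end > $#$words;

        my $name = sprintf '%s.%d', $prefix, scalar(@names) + 1;
        open my $out, '>', $name or die "Cannot open $name: $!";
        print {$out} join(' ', @{$words}[$start .. $end]), "\n";
        close $out or die "Cannot close $name: $!";
        push @names, $name;
    }
    return @names;
}
```

Called as `write_word_files(\@total_words, $num_of_files, 'part')`, this would produce `part.1`, `part.2` and so on, one line of space-separated words per file.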

Re^2: splitting files by number of words by Fletch (Bishop) on Aug 06, 2009 at 13:09 UTC
Erm . . . `@total_words = split while (<DATA>);` only works because you've got a single line of test data. With more than one line you'd wind up with only the words from the last line processed (and so only that line's word count). The correct way to do what you're attempting would be along the lines of `push @total_words, split`; however, you'd then wind up keeping all of the words in memory, which, given the original constraint of "very large files", is probably not going to be viable.
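To make the difference concrete, here is a small sketch with two lines of made-up input (the array names are only for illustration). It uses explicit loops rather than statement modifiers so each step is easy to see.

```perl
use strict;
use warnings;

# Two lines of sample input, standing in for a multi-line DATA section.
my @lines = ("one two three\n", "four five\n");

# Assignment: each line replaces whatever the previous line stored.
my @overwritten;
for my $line (@lines) {
    @overwritten = split ' ', $line;
}

# push: each line's words are appended to what is already there.
my @accumulated;
for my $line (@lines) {
    push @accumulated, split ' ', $line;
}

print "overwritten: @overwritten\n";   # overwritten: four five
print "accumulated: @accumulated\n";   # accumulated: one two three four five
```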
Re^3: splitting files by number of words by si_lence (Deacon) on Aug 06, 2009 at 13:26 UTC
Or you could just use `$total_words += split while(<DATA>);` which keeps a running word count without storing the words. cheers, si_lence
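Building on the counting idea, one way to avoid holding the words in memory is to read the input twice: once to count, once to write. This is only a sketch under assumptions: the helper name `split_file_by_words`, the `"$prefix.N"` file names and the one-word-per-line output are all made up for illustration, and it counts via a temporary array per line rather than `split` in scalar context.

```perl
use strict;
use warnings;

# Hypothetical helper: split $input into at most $num_of_files files
# ("$prefix.1", "$prefix.2", ...) holding roughly equal numbers of words.
sub split_file_by_words {
    my ($input, $num_of_files, $prefix) = @_;

    # Pass 1: count the words, keeping only one line in memory at a time.
    my $total_words = 0;
    open my $in, '<', $input or die "Cannot open $input: $!";
    while (my $line = <$in>) {
        my @words = split ' ', $line;
        $total_words += @words;
    }
    close $in;
    return unless $total_words;

    # Round up so that no more than $num_of_files files are created.
    my $per_file = int( ($total_words + $num_of_files - 1) / $num_of_files );

    # Pass 2: copy words to the current output file, starting a new file
    # each time $per_file words have been written.
    my ($out, $written, @names) = (undef, 0);
    open $in, '<', $input or die "Cannot open $input: $!";
    while (my $line = <$in>) {
        for my $word (split ' ', $line) {
            if ($written % $per_file == 0) {
                close $out if $out;
                push @names, sprintf('%s.%d', $prefix, @names + 1);
                open $out, '>', $names[-1]
                    or die "Cannot open $names[-1]: $!";
            }
            print {$out} "$word\n";
            $written++;
        }
    }
    close $out if $out;
    close $in;
    return @names;
}
```

The trade-off is an extra read of the input in exchange for memory use that no longer grows with the number of words.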
Re^3: splitting files by number of words by bichonfrise74 (Vicar) on Aug 06, 2009 at 16:58 UTC
Thanks, I didn't notice that.