in reply to splitting files by number of words

This will get you started. The difference between this and your question is that I'm storing the result in a hash of arrays rather than writing it out, but you can simply change that to output to a file instead.
#!/usr/bin/perl
use strict;
use Data::Dumper;

my (@total_words, %record_file_of);
my $counter      = 0;
my $num_of_files = 3;

@total_words = split while (<DATA>);

my $words_per_file = int( scalar @total_words / $num_of_files );

for my $i (0 .. $#total_words) {
    $counter++ if ( $i % $words_per_file == 0 );
    push( @{ $record_file_of{$counter} }, $total_words[$i] );
}

print Dumper \%record_file_of;

__DATA__
This is a test of words. This should be divided into equal files.
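A minimal sketch of the "output to a file instead" variant, assuming a hypothetical helper write_word_chunks and a made-up part_N.txt naming scheme (written into a temporary directory only for the demo). It also rounds the chunk size up: with int() as in the code above, 13 words split 3 ways gives 4 words per file plus a fourth bucket holding only "files.", and fewer words than files makes $words_per_file zero, which dies with "Illegal modulus zero".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write @$words into at most $num_of_files files named
# "${prefix}1.txt", "${prefix}2.txt", ... (the naming scheme is an assumption).
sub write_word_chunks {
    my ($words, $num_of_files, $prefix) = @_;
    return () unless @$words && $num_of_files > 0;

    # Round up (ceiling division): no extra, nearly empty file, and the
    # chunk size is never zero when there are fewer words than files.
    my $words_per_file = int( (@$words + $num_of_files - 1) / $num_of_files );

    my @written;
    for (my $start = 0; $start < @$words; $start += $words_per_file) {
        my $end = $start + $words_per_file - 1;
        $end = $#$words if $end > $#$words;

        my $name = $prefix . (@written + 1) . '.txt';
        open my $out, '>', $name or die "Cannot open $name: $!";
        print {$out} join(' ', @{$words}[$start .. $end]), "\n";
        close $out or die "Cannot close $name: $!";
        push @written, $name;
    }
    return @written;
}

my @words = split ' ', 'This is a test of words. This should be divided into equal files.';
my $dir   = tempdir(CLEANUP => 1);
my @files = write_word_chunks(\@words, 3, "$dir/part_");
print "Wrote ", scalar(@files), " files\n";
```

This still holds every word in memory, so it only suits inputs that fit comfortably in RAM.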

Re^2: splitting files by number of words
by Fletch (Bishop) on Aug 06, 2009 at 13:09 UTC

    Erm . . .

    @total_words = split while (<DATA>);

    only works because you've got a single line of test data. With more than one line, each assignment overwrites @total_words, so you'd wind up with only the words from the last line processed. The correct way to do what you're attempting would be along the lines of push @total_words, split; however, you'd then wind up keeping all of the words in memory, which, given the original constraint of "very large files", is probably not going to be viable.
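To see the difference Fletch describes, here is a small sketch comparing the two loops on a three-line sample read through an in-memory filehandle. The sample text and the sub names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up three-line sample, read through an in-memory filehandle.
my $text = "one two three\nfour five\nsix\n";

# The loop from the original post: each pass assigns a fresh list,
# so after the loop only the words of the last line remain.
sub words_by_assignment {
    my ($src) = @_;
    open my $fh, '<', \$src or die "Cannot open in-memory file: $!";
    local $_;    # the while modifier below assigns to $_
    my @words;
    @words = split while (<$fh>);
    close $fh;
    return @words;
}

# The push version: each line's words are appended to the array.
sub words_by_push {
    my ($src) = @_;
    open my $fh, '<', \$src or die "Cannot open in-memory file: $!";
    local $_;
    my @words;
    push @words, split while (<$fh>);
    close $fh;
    return @words;
}

my @last_line_only = words_by_assignment($text);
my @all_words      = words_by_push($text);
print scalar(@last_line_only), " word(s) vs ", scalar(@all_words), " words\n";
```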

    The cake is a lie.

      Or, if you only need the count, you could just use
      $total_words += split while(<DATA>);
      cheers, si_lence
      Thanks, I didn't notice that.
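Putting the two replies together, one possible way to split a very large file without holding all of its words in memory is two passes: count the words first (si_lence's idea), then rewind and stream the words into the output files. This is a sketch under assumptions: the input is a regular, seekable file, the count_words/split_into_files names and the part_N.txt naming are hypothetical, and the output is one word per line, so the original line breaks are not preserved. It counts through an explicit array rather than $total_words += split, because older Perls also split into @_ when split is called in scalar context.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Pass 1: count the words without keeping them (only one line in memory).
sub count_words {
    my ($fh) = @_;
    my $total = 0;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        $total += @fields;
    }
    return $total;
}

# Pass 2: rewind and stream the words into at most $num_of_files files.
# The "${prefix}N.txt" naming and one-word-per-line output are assumptions.
sub split_into_files {
    my ($input, $num_of_files, $prefix) = @_;

    open my $in, '<', $input or die "Cannot open $input: $!";
    my $total = count_words($in);
    return () unless $total && $num_of_files > 0;

    seek $in, 0, 0 or die "Cannot rewind $input: $!";

    # Round up so that at most $num_of_files files are created.
    my $per_file = int( ($total + $num_of_files - 1) / $num_of_files );

    my $seen = 0;
    my ($out, @written);
    while (my $line = <$in>) {
        for my $word (split ' ', $line) {
            # Start a new output file every $per_file words.
            if ($seen % $per_file == 0) {
                if ($out) { close $out or die "Cannot close output: $!"; }
                my $name = $prefix . (@written + 1) . '.txt';
                open $out, '>', $name or die "Cannot open $name: $!";
                push @written, $name;
            }
            print {$out} "$word\n";
            $seen++;
        }
    }
    if ($out) { close $out or die "Cannot close output: $!"; }
    close $in;
    return @written;
}

# Usage: a small two-line input file in a temporary directory.
my $dir   = tempdir(CLEANUP => 1);
my $input = "$dir/input.txt";
open my $fh, '>', $input or die "Cannot write $input: $!";
print {$fh} "This is a test of words.\nThis should be divided into equal files.\n";
close $fh or die "Cannot close $input: $!";

my @parts = split_into_files($input, 3, "$dir/part_");
print "Wrote ", scalar(@parts), " files\n";
```

Reading the file twice costs extra I/O, but memory use stays at roughly one line at a time regardless of file size.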