I have the same problem as seen here. I need to split files according to size, but the splits can only happen on "\n" new lines. So the size can be a little more or less, but it can only be split on a newline.

I'd like to create a module to do this... maybe File::Split::More or File::Split::Qualifier or File::Split::ApproxSize?

What I was thinking was to have an object that is created with the output directory, the input file, and some options. The options would be the minimum or maximum file size and the value to split on. The value to split on marks the acceptable places to split a file; the actual split would happen at the acceptable place closest to the requested file size.

Any comments would be appreciated.

Here's an example:
use File::Split::More;

my $output_dir = '/tmp';
my $split_file = '/home/me/input.txt';

my $split_rx = File::Split::More->new(
    out_dir        => $output_dir,
    in_file        => $split_file,
    min_size       => 2147483648,    # new file size in bytes
    split_on_regex => qr{\n},
    auto_die       => 0,
);

my $split_string = File::Split::More->new(
    out_dir         => $output_dir,
    in_file         => $split_file,
    min_size        => 2147483648,   # new file size in bytes
    split_on_string => "\n",
);

# do the split
$split_rx->split or die "Can't split file: $!\n";

# get list of new split file names
my @split_files = $split_rx->split_files;

# get list of file handles
my @split_file_handles = $split_rx->split_file_handles;

# get list of tie::files
my @split_tie_files = $split_rx->split_tie_files;
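
To make the idea concrete, here is a rough sketch of what the core of such a split might do for the plain newline case. It is only an illustration under assumptions: the function name split_on_newline and the numbered output file names are made up for this example, and nothing here is the API of File::Split or any other existing module. It honours a minimum size only, so each chunk ends on the first newline at or past that size.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(basename);

# Hypothetical helper, not part of any existing module: copy $in_file into
# numbered chunks under $out_dir. A chunk is closed only after a complete
# line has brought it to at least $min_size bytes, so no line is ever cut.
sub split_on_newline {
    my ($in_file, $out_dir, $min_size) = @_;

    open my $in, '<:raw', $in_file
        or die "Can't read $in_file: $!\n";

    my (@chunks, $out, $written);
    while (my $line = <$in>) {
        if (!$out) {
            # Start the next chunk; the naming scheme is an assumption.
            my $name = File::Spec->catfile(
                $out_dir,
                sprintf('%s.%04d', basename($in_file), scalar @chunks),
            );
            open $out, '>:raw', $name
                or die "Can't write $name: $!\n";
            push @chunks, $name;
            $written = 0;
        }

        print {$out} $line
            or die "Can't write $chunks[-1]: $!\n";
        $written += length $line;    # bytes, because of the :raw layer

        if ($written >= $min_size) {
            close $out or die "Can't close $chunks[-1]: $!\n";
            undef $out;
        }
    }

    if ($out) {
        close $out or die "Can't close $chunks[-1]: $!\n";
    }
    close $in;

    return @chunks;    # names of the files that were written
}
```

Reading line by line keeps memory use flat even for multi-gigabyte inputs. Supporting a maximum size or an arbitrary regex delimiter would need some look-ahead and is left out of this sketch.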

Re: RFC: New Split File Module
by toolic (Bishop) on Jun 22, 2012 at 01:23 UTC
    Are you aware of the unix split utility? By default, it splits on \n, and its -C option will "put at most SIZE bytes of lines per output file".

    There is also the split Perl utility, which has a -l option.

    If neither of those wheels suits your purposes, you could also consider requesting to become a maintainer of the existing File::Split CPAN module. If you think there is enough overlap between your idea and that module's purpose, and the author is willing, it would solve your naming conundrum. File::Split really looks like it needs to be resuscitated. It has no passing tests on any platform/version according to the CPAN Testers reports. In fact, it only has one test, which checks next to nothing. A bug report was filed 6 years ago and has not been acknowledged. There is a chance that it has been abandoned, or maybe the author would at least appreciate it if someone revived it.

    $split_rx->split or die "Can't split file: $!\n";

    You probably want to create your own error variable or method instead of using $! directly:

      Or better yet, throw an exception, e.g. croak "Can't split file ($split_rx blah blah ) : $!", so the user doesn't have to add "or die".
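
A minimal sketch of how both suggestions could fit together, assuming an error accessor on the object and reusing the auto_die option from the original post as a switch between croaking and returning false. The package name My::Splitter, the error method, and the default of auto_die => 1 are all made up for illustration; none of this is an existing module's interface.

```perl
package My::Splitter;    # hypothetical name, for illustration only
use strict;
use warnings;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    # Assumption: throw by default unless the caller passes auto_die => 0.
    return bless { auto_die => 1, %args }, $class;
}

# The object's own error message, so callers need not inspect $! themselves.
sub error { $_[0]{error} }

sub split {
    my ($self) = @_;

    my $in;
    unless (open $in, '<', $self->{in_file}) {
        # Capture $! right away, before another system call can change it.
        $self->{error} = "Can't open $self->{in_file}: $!";
        croak $self->{error} if $self->{auto_die};
        return;
    }

    # ... the actual splitting would go here ...

    close $in;
    return 1;
}

package main;

# With auto_die off, failure is reported through the return value.
my $splitter = My::Splitter->new(in_file => '/no/such/file', auto_die => 0);
$splitter->split or warn $splitter->error, "\n";
```

With auto_die left on, the same failure raises an exception from the caller's point of view, so a plain $splitter->split; is enough and no "or die" is needed.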