If it were me, I'd probably use something like Tie::File to get the number of lines:
Counting lines in large files (which presumably these are, hence the need to split them) is a really terrible way to use Tie::File. To quote the author:
There is a large memory overhead for each record offset and for each cache entry: about 310 bytes per cached data record, and about 21 bytes per offset table entry.
The per-record overhead will limit the maximum number of records you can access per file. Note that accessing the length of the array via $x = scalar @tied_file accesses all records and stores their offsets. The same for foreach (@tied_file), even if you exit the loop early.
At ~21 bytes per offset entry, merely asking a 100-million-line file for its length costs roughly 2 GB before any data is cached. A simple:

    use Carp;   # for croak

    sub countLines {
        my $filename = shift;
        open my $fh, '<', $filename
            or croak( "failed to open '$filename' - $!" );
        my $count = 0;
        $count++ while <$fh>;
        return $count;
    }
Is far, far (and for very large files, far) more efficient than abusing Tie::File for this. And it is hardly more complex. For very large files, reading with a larger buffer saves a little more time:
    sub countLines {
        my $filename = shift;
        open my $fh, '<', $filename
            or croak( "failed to open '$filename' - $!" );
        my $count = 0;
        local $/ = \2**26;   ## read 64MB records; raise or lower to taste
        $count += tr[\n][\n] while <$fh>;
        return $count;
    }
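A quick usage sketch (the filename is a placeholder):

    my $lines = countLines( 'huge.log' );
    print "huge.log: $lines lines\n";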
And File::Split will blow memory if the input file (or combined output file in the case of merge_files()) is larger than memory.
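For comparison, a streaming split never holds more than one line in memory. A minimal sketch, not File::Split's API: splitFile, the ".N" output-suffix scheme, and the even part-size policy are mine, and it reuses countLines from above:

    use strict;
    use warnings;
    use Carp;   # for croak

    # Split $filename into $n roughly equal parts, streaming line by
    # line so memory use stays constant regardless of file size.
    sub splitFile {
        my( $filename, $n ) = @_;
        my $lines = countLines( $filename );
        my $per   = int( $lines / $n ) + ( $lines % $n ? 1 : 0 );  # ceiling

        open my $in, '<', $filename
            or croak( "failed to open '$filename' - $!" );

        for my $part ( 1 .. $n ) {
            open my $out, '>', "$filename.$part"
                or croak( "failed to open '$filename.$part' - $!" );
            my $written = 0;
            while( $written < $per and defined( my $line = <$in> ) ) {
                print {$out} $line;
                ++$written;
            }
            close $out;
        }
        close $in;
    }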
Blind CPANitis serves no one. Teaching it ...
In reply to Re^2: split file in N part
by BrowserUk
in thread split file in N part
by jeepj