For larger data files, where one would rather not deal with chunking manually, there is the parallel MCE module. This is what one might construct using the MCE::Flow module. We're running 4 workers, so chunking at 24 MB is plenty. Perl and CPAN are amazing for allowing this.
    use strict;
    use warnings;
    use autodie;
    use MCE::Flow;

    open Newfile, ">", "./Newfile.txt" or die "Cannot create Newfile.txt";
    Newfile->autoflush(1);   # important: enable autoflush

    my ($f1, $f2, @seq) = ('seq.txt', 'mytext.txt');

    open my $fh, '<', $f1;
    while (<$fh>) {
        chomp;
        s/^\s+|\s+$//g;       # trim leading/trailing whitespace
        push @seq, $_;
    }
    close $fh;

    # Sort @seq by length, longest first, so longer patterns are
    # substituted before any shorter patterns contained within them.
    @seq = sort bylen @seq;

    MCE::Flow::init {
        max_workers => 4,
        chunk_size  => '24m',
        init_relay  => 1,
        use_slurpio => 1,
    };

    # For best performance, provide MCE the path, e.g. $f2, versus a
    # file handle. Workers then communicate the next offset among
    # themselves without involving the manager process.
    mce_flow_f sub {
        my ($mce, $slurp_ref, $chunk_id) = @_;

        foreach my $r (@seq) {
            my $t = $r;
            $t =~ s/\h+/bbb/g;
            $$slurp_ref =~ s/$r/$t/g;
        }

        # Relay capability is useful for running something orderly.
        # Only one worker at a time enters the block, in chunk order.
        # Autoflush was enabled on the output handle above for this.
        MCE::relay sub { print Newfile $$slurp_ref };
    }, $f2;

    MCE::Flow::finish();
    close Newfile;
    exit 0;

    sub bylen { length($b) <=> length($a) }
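The sort-by-length step matters because a shorter pattern can be a substring of a longer one; if the shorter one is substituted first, it rewrites the interior of the longer pattern and the longer match is then missed. A minimal core-Perl sketch of the idea (the patterns 'a b' and 'a b c' are made-up illustrations, and \Q...\E is added here to treat them literally):

```perl
use strict;
use warnings;

# 'a b' is a substring of 'a b c'; longest-first ordering ensures
# the longer pattern is rewritten before the shorter one can
# clobber its interior.
my @seq = sort { length($b) <=> length($a) } ('a b', 'a b c');

my $text = 'x a b c y';

for my $r (@seq) {
    my $t = $r;
    $t =~ s/\h+/bbb/g;          # same transform as the MCE example
    $text =~ s/\Q$r\E/$t/g;     # \Q..\E: match the pattern literally
}

print "$text\n";   # prints: x abbbbbbbc y
```

Had 'a b' run first, $text would have become 'x abbbb c y' and the 'a b c' pattern would no longer match anywhere.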
In reply to Re^2: script optmization
by Anonymous Monk
in thread script optmization
by shoura