Update: Shortened the code.
Hello again,
Slurping requires two regular expressions: one to break the chunk into actual lines and another for the query. Below, workers instead receive an array reference containing some number of lines. This runs slightly faster, possibly because only one regex is needed.
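use strict;
use warnings;

use MCE::Flow;
use MCE::Shared;

# Stream the archive through unzip (assumes an unzip binary on PATH).
# Opening "10-million-combos.zip |" directly would try to run the zip file.
open( my $fh, '-|', 'unzip', '-p', '10-million-combos.zip' )
    or die "Cannot run unzip: $!";

# Shared counters, updated once per chunk by each worker.
my $counter1 = MCE::Shared->scalar( 0 );
my $counter2 = MCE::Shared->scalar( 0 );

mce_flow {
    chunk_size => '1m', max_workers => 8,
},
sub {
    # $chunk_ref is an array reference holding the lines of this chunk.
    my ( $mce, $chunk_ref, $chunk_id ) = @_;
    my $numlines    = @{ $chunk_ref };
    my $occurrences = 0;

    for ( @{ $chunk_ref } ) {
        $occurrences++ if /123456\r/;
    }

    $counter1->incrby( $numlines );
    $counter2->incrby( $occurrences );

}, $fh;

close $fh;

print "Num lines  : ", $counter1->get(), "\n";
print "Occurrences: ", $counter2->get(), "\n";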
And finally, the version that reads the plain text file directly.
use strict;
use warnings;

use MCE::Flow;
use MCE::Shared;

# Shared counters, updated once per chunk by each worker.
my $counter1 = MCE::Shared->scalar( 0 );
my $counter2 = MCE::Shared->scalar( 0 );

mce_flow_f {
    chunk_size => '1m', max_workers => 8,
},
sub {
    # $chunk_ref is an array reference holding the lines of this chunk.
    my ( $mce, $chunk_ref, $chunk_id ) = @_;
    my $numlines    = @{ $chunk_ref };
    my $occurrences = 0;

    for ( @{ $chunk_ref } ) {
        $occurrences++ if /123456\r/;
    }

    $counter1->incrby( $numlines );
    $counter2->incrby( $occurrences );

}, "10-million-combos.txt";

print "Num lines  : ", $counter1->get(), "\n";
print "Occurrences: ", $counter2->get(), "\n";
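For contrast, here is a rough sketch of why slurping needs two regular expressions while the array form needs one. It runs without MCE. The sample string and the helper names `count_slurped` and `count_lines` are made up for illustration and are not part of MCE or of the earlier post.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for one chunk of the input file.
# Lines end in CRLF, which is why the query pattern ends in \r.
my $chunk = "abc:123456\r\nfoo:bar\r\nxyz:123456\r\n";

# Slurped form: the chunk is one big string, so one regex walks the
# lines and a second regex tests each line.
sub count_slurped {
    my ( $chunk_ref ) = @_;
    my ( $numlines, $occurrences ) = ( 0, 0 );

    while ( $$chunk_ref =~ /([^\n]*\n)/g ) {
        my $line = $1;    # copy before the next match overwrites $1
        $numlines++;
        $occurrences++ if $line =~ /123456\r/;
    }

    return ( $numlines, $occurrences );
}

# Array form: the lines are already separated, so only the query regex runs.
sub count_lines {
    my ( $lines_ref ) = @_;
    my $numlines    = @{ $lines_ref };
    my $occurrences = grep { /123456\r/ } @{ $lines_ref };

    return ( $numlines, $occurrences );
}

my @slurped = count_slurped( \$chunk );
my @array   = count_lines( [ split /^/, $chunk ] );

print "slurped: @slurped\n";
print "array  : @array\n";
```

Both calls report 3 lines and 2 matches for this sample. The only difference is the extra line-splitting regex in the slurped version, which may account for the small speed gap noted above.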
In reply to Re^2: How to optimize a regex on a large file read line by line ?
by marioroy
in thread How to optimize a regex on a large file read line by line ?
by John FENDER