in reply to Re: How to optimize a regex on a large file read line by line ?
in thread How to optimize a regex on a large file read line by line ?
Update: Shortened the code.
Hello again,
Slurping a chunk requires two regular expressions: one to break the text into individual lines and another for the query itself. Below, each worker instead receives an array reference holding a batch of lines. This runs slightly faster, possibly because only the query regex is needed.
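    use strict;
    use warnings;

    use MCE::Flow;
    use MCE::Shared;

    # Stream the archive contents through a pipe. "unzip -p" is an
    # assumption here (unzip must be on the PATH); a bare file name
    # followed by "|" would be executed as a command and fail.
    open my $fh, "unzip -p 10-million-combos.zip |" or die "$!";

    # Totals shared across all workers.
    my $counter1 = MCE::Shared->scalar( 0 );
    my $counter2 = MCE::Shared->scalar( 0 );

    mce_flow {
        chunk_size => '1m', max_workers => 8,
    },
    sub {
        my ( $mce, $chunk_ref, $chunk_id ) = @_;
        my $numlines   = @{ $chunk_ref };
        my $occurances = 0;

        # Lines keep their CRLF ending, so \r anchors the match at end of line.
        for ( @{ $chunk_ref } ) {
            $occurances++ if /123456\r/;
        }

        # Update the shared totals once per chunk, not once per line.
        $counter1->incrby( $numlines );
        $counter2->incrby( $occurances );

    }, $fh;

    close $fh;

    print "Num lines : ", $counter1->get(), "\n";
    print "Occurances: ", $counter2->get(), "\n";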
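To make the contrast between the two styles concrete, here is a minimal sketch without MCE. The sample data and the helper names count_slurped and count_lines are made up for illustration; they are not part of MCE or of the original test. It only shows that the slurped form needs one regex pass to find the lines and a second for the query, while the array form needs the query alone.

```perl
use strict;
use warnings;

# Hypothetical sample with CRLF line endings, standing in for one chunk.
my $chunk = join '', map { "$_\r\n" } qw( abc123456 123456789 xyz123456 000000 );

# Slurp style: the chunk is one string, so one regex pass counts the
# lines and a second pass runs the query.
sub count_slurped {
    my ( $text ) = @_;
    my $numlines    = () = $text =~ /\n/g;
    my $occurrences = () = $text =~ /123456\r/g;
    return ( $numlines, $occurrences );
}

# Array style: the lines arrive already split, so only the query runs.
sub count_lines {
    my ( $lines ) = @_;
    my $numlines    = @{ $lines };
    my $occurrences = grep { /123456\r/ } @{ $lines };
    return ( $numlines, $occurrences );
}

# Split after each newline, keeping the line endings as a file read would.
my @lines = split /(?<=\n)/, $chunk;

printf "slurped: %d lines, %d matches\n", count_slurped( $chunk );
printf "array  : %d lines, %d matches\n", count_lines( \@lines );
```

Both calls report the same totals for the sample. Any speed difference on the real file would have to be measured there.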
And finally, the construction for reading the plain text file directly.
    use strict;
    use warnings;

    use MCE::Flow;
    use MCE::Shared;

    my $counter1 = MCE::Shared->scalar( 0 );
    my $counter2 = MCE::Shared->scalar( 0 );

    mce_flow_f {
        chunk_size => '1m', max_workers => 8,
    },
    sub {
        my ( $mce, $chunk_ref, $chunk_id ) = @_;
        my $numlines   = @{ $chunk_ref };
        my $occurances = 0;

        for ( @{ $chunk_ref } ) {
            $occurances++ if /123456\r/;
        }

        $counter1->incrby( $numlines );
        $counter2->incrby( $occurances );

    }, "10-million-combos.txt";

    print "Num lines : ", $counter1->get(), "\n";
    print "Occurances: ", $counter2->get(), "\n";
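When checking the parallel totals, a serial reference count can be handy. The following is a rough sketch only: count_file is a hypothetical helper, and the temporary file with four made-up lines stands in for 10-million-combos.txt so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Serial reference: one pass over the file, one regex per line.
sub count_file {
    my ( $path ) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # leave "\r\n" untouched on every platform

    my ( $numlines, $matches ) = ( 0, 0 );
    while ( my $line = <$fh> ) {
        $numlines++;
        $matches++ if $line =~ /123456\r/;
    }
    close $fh;
    return ( $numlines, $matches );
}

# Hypothetical stand-in for the real data file, written with CRLF endings.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} "$_\r\n" for qw( 111123456 123456000 222123456 333333 );
close $tmp_fh;

my ( $numlines, $matches ) = count_file( $tmp_path );
print "lines  : $numlines\n";
print "matches: $matches\n";
```

Pointing count_file at the real text file should give the same two numbers as the MCE version, just more slowly.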