PerlMonks  

Re^2: Help me beat NodeJS

by marioroy (Prior)
on Feb 13, 2016 at 18:23 UTC


in reply to Re: Help me beat NodeJS
in thread Help me beat NodeJS

Update: added pattern matching to the demonstration

If parallel reads are desired, the following demonstration does the same thing. Specifying chunk_size => 1 is all that is needed to make MCE run like Parallel::ForkManager: each worker then receives one file at a time.

#!/usr/local/bin/perl

use strict;
use warnings;

use MCE::Loop;

my $dir     = 'logs/*.log.gz';
my @files   = sort(glob "$dir");
my $pattern = "some_string";

MCE::Loop::init {
    max_workers => 24, chunk_size => 1,
};

mce_loop {
    my ( $mce, $chunk_ref, $chunk_id ) = @_;

    # With chunk_size => 1, each chunk holds exactly one file name.
    open( my $fh, "-|", "zcat", $chunk_ref->[0] ) or die "open error: $!\n";

    while ( my $line = <$fh> ) {
        if ( $line =~ /$pattern/ ) {
            my @matches = $line =~ /".*?"|\S+/g;
            print "$matches[0],$matches[1],$matches[3],$matches[4]\n";
        }
    }

    close $fh;

} @files;
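
To see why $chunk_ref->[0] is the file name in the demonstration above, here is a rough sketch of what chunking a list looks like. It uses core Perl only; make_chunks is a hypothetical helper written for illustration and the file names are made up. It is not how MCE is implemented internally, just a picture of the idea.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MCE): split a list into chunks of
# at most $size items, roughly how a chunking scheduler hands out work.
sub make_chunks {
    my ( $size, @items ) = @_;
    my @chunks;
    while (@items) {
        push @chunks, [ splice( @items, 0, $size ) ];
    }
    return @chunks;
}

my @files = map { "logs/app$_.log.gz" } 1 .. 5;    # made-up file names

for my $size ( 1, 2 ) {
    my @chunks = make_chunks( $size, @files );
    printf "chunk_size => %d gives %d chunks\n", $size, scalar @chunks;
    print "  [@$_]\n" for @chunks;
}
```

With a size of 1, every chunk is a one-element array, so element 0 is the single file that worker should process.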

Regards, Mario

Re^3: Help me beat NodeJS
by rickyw59 (Novice) on Feb 13, 2016 at 20:59 UTC

    Wow thanks, I'll give this a shot. I will have to read some more on MCE, it looks very useful. I should have clarified, in the "parser" function in nodejs, I'm applying the same regex as perl to be fair. I've done the tests looking for a specific string (/some_string/) and I've done the regex in the above code (/".*?"|\S+/g), which captures everything in an array, since the lines are in this format: ' 1970-01-01 00:00:00 1.1.1.1 "A multi-word field" 2.2.2.2 '
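
For readers who want to check what that regex captures, here is a small sketch that applies it to the sample line quoted above. split_fields is a hypothetical wrapper name used only for this illustration.

```perl
use strict;
use warnings;

# Split a log line into fields: a double-quoted run counts as one
# field (quotes kept), anything else is split on whitespace.
sub split_fields {
    my ($line) = @_;
    return $line =~ /".*?"|\S+/g;
}

my $line   = ' 1970-01-01 00:00:00 1.1.1.1 "A multi-word field" 2.2.2.2 ';
my @fields = split_fields($line);

print scalar(@fields), " fields\n";
print "$_: $fields[$_]\n" for 0 .. $#fields;
```

The quoted text lands in index 3 as a single element, which is why the demonstrations print indexes 0, 1, 3 and 4.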

      Got it. I went ahead and updated both MCE demonstrations to account for pattern matching. The more expensive regex (/".*?"|\S+/g) is applied only if the given line matches the initial string pattern. That will likely run faster.

      Likewise, for Parallel::ForkManager.

      #!/usr/local/bin/perl

      use strict;
      use warnings;

      use Parallel::ForkManager;

      my $pm = Parallel::ForkManager->new(24);

      my $dir     = '/data/logs/*.log.gz';
      my @files   = sort(glob "$dir");
      my $pattern = "some_string";

      $pm->set_waitpid_blocking_sleep(0);

      for my $file ( @files ) {
          # The parent moves on to the next file; the child falls through.
          $pm->start and next;

          open( my $fh, "-|", "/bin/zcat", $file ) or die "open error: $!\n";

          while ( my $line = <$fh> ) {
              if ( $line =~ /$pattern/ ) {
                  my @matches = $line =~ /".*?"|\S+/g;
                  print "$matches[0],$matches[1],$matches[3],$matches[4]\n";
              }
          }

          $pm->finish;
      }

      $pm->wait_all_children;
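
The two-stage filtering used in these demonstrations (cheap pattern test first, field-splitting regex second) can be tried on its own without log files or extra modules. This is only a sketch: extract is a hypothetical helper, the sample data is made up, and the check for at least five fields is an added assumption that the original loops do not make.

```perl
use strict;
use warnings;

my $pattern = "some_string";

# Return the selected fields as a comma-joined string, or nothing when
# the line does not contain $pattern (so the costlier regex never runs).
sub extract {
    my ( $line, $pattern ) = @_;
    return unless $line =~ /$pattern/;
    my @matches = $line =~ /".*?"|\S+/g;
    return unless @matches >= 5;    # assumption: skip short lines
    return join ',', @matches[ 0, 1, 3, 4 ];
}

# Made-up sample data standing in for a decompressed log.
my $data = <<'END';
1970-01-01 00:00:00 1.1.1.1 "some_string here" 2.2.2.2
1970-01-01 00:00:01 1.1.1.1 "nothing to see" 2.2.2.2
1970-01-01 00:00:02 3.3.3.3 "again some_string" 4.4.4.4
END

# Read from an in-memory filehandle instead of a zcat pipe.
open( my $fh, '<', \$data ) or die "open error: $!\n";
while ( my $line = <$fh> ) {
    my $out = extract( $line, $pattern );
    print "$out\n" if defined $out;
}
close $fh;
```

Only the first and third sample lines contain the pattern, so only those two are split and printed.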

      Regards, Mario

        If you are just looking for a plain string, not a regex, then it should be quicker to use index.

        while ( my $line = <$fh> ) {
            if ( index( $line, $pattern ) != -1 ) {
                ...
            }
        }
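
One practical difference between the two tests, beyond speed, is that index treats the pattern as literal text. The sketch below shows this; contains is a hypothetical helper name and the sample line is made up.

```perl
use strict;
use warnings;

# True when $needle occurs anywhere in $line, as a literal substring.
sub contains {
    my ( $line, $needle ) = @_;
    return index( $line, $needle ) != -1;
}

my $line = 'GET /index.html?id=1 "some_string" 200';

print contains( $line, 'some_string' ) ? "found\n" : "not found\n";

# Regex metacharacters are plain text to index(); in a regex they would
# need \Q...\E (quotemeta) to be matched literally.
my $needle = 'html?id';
print "index: ", ( contains( $line, $needle ) ? "match" : "no match" ), "\n";
print "regex: ", ( $line =~ /$needle/ ? "match" : "no match" ), "\n";
print "quoted regex: ", ( $line =~ /\Q$needle\E/ ? "match" : "no match" ), "\n";
```

Here the unquoted regex reads the ? as "optional l" and fails to match, while index and the \Q...\E form both find the literal text. For a simple pattern such as "some_string" the two approaches select the same lines.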
