in reply to Re^5: Multi-thread combining the results together
in thread Multi-thread combining the results together
I tried the idea of using a multi-line, global match upon a string of \n separated tokens instead of running a regex on each token individually. This didn't work; it is significantly slower than the current code.
I can't explain it better, but if you really need anchoring within tokens, care should be taken to let the regex engine fail as soon as possible and move ahead. In the example below, don't let it aimlessly chew through "/.+/" when it's clear it won't find "123" before the next separator. It's a really contrived example (and a no-op), and not about threads anymore; maybe it's time for another SOPW question with a real dataset and an SSCCE (and a better explanation).
use strict;
use warnings;
use feature 'say';
use Data::Dump 'dd';
use Time::HiRes 'time';

my $N = 1000;
srand 123;
my @tokens = map { int rand 1e9 } 1 .. $N;

# Both builders ignore their argument on purpose -- contrived stand-ins.
sub build_regex  { qr/^.+123/m }
sub build_regex2 { qr/^.+\K123/m }

{   # case 1: run the regex against each token individually
    my $t     = time;
    my $count = 0;
    for my $token ( @tokens ) {
        my $regex = build_regex( $token );
        /$regex/ && $count++ for @tokens;
    }
    say $count;
    say time - $t;
}

{   # case 2: one global, multi-line match over the \n-joined tokens
    my $t      = time;
    my $count  = 0;
    my $concat = join "\n", @tokens;
    for my $token ( @tokens ) {
        my $regex = build_regex( $token );
        $count++ while $concat =~ /$regex/g;
    }
    say time - $t;
}

{   # case 3: same as case 2, but with \K after the greedy .+
    my $t      = time;
    my $count  = 0;
    my $concat = join "\n", @tokens;
    for my $token ( @tokens ) {
        my $regex = build_regex2( $token );
        $count++ while $concat =~ /$regex/g;
    }
    say time - $t;
}

__END__
5000
0.384264945983887
1.7059121131897
0.13309907913208
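A footnote on \K, in case it's unfamiliar: per perlre, it keeps everything matched to its left out of the reported match ($&). I won't claim that alone fully explains the timing gap above, but here is a minimal sketch of what it does (the string and the prints are made up for illustration, not part of the benchmark):

use strict;
use warnings;
use feature 'say';

my $line = 'foo123bar';

# Without \K the greedy .+ prefix stays inside the reported match.
say "plain: $&" if $line =~ /^.+123/;      # prints "plain: foo123"

# With \K only what follows it is reported as $&.
say "keep:  $&" if $line =~ /^.+\K123/;    # prints "keep:  123"

For the record, the numbers after __END__ are case 1's count followed by the three wallclock times for cases 1, 2 and 3, in that order.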