in reply to Multi-thread combining the results together
It may be obvious, and you may have already considered this (if so, I'm sorry, and skip what follows), but you are starting the regex engine 6+ billion times. If %result ends up relatively sparsely populated, and if the tokens can be joined using a clearly "alien" separator symbol (or sequence) to prevent matches across token boundaries, then matching against the concatenated string (so the regex engine starts just N times) can help. In the code below, if line A is un-commented, block B is executed N*N = 1e6 times, as expected, and each token "matches" all other tokens: very uninteresting, and no faster. Otherwise, with pickier criteria for one token to be related to another, your goal of "at least 3x faster" is easily achieved even before parallelization (0.65 s vs. 0.15 s in the second run below).
    use strict;
    use warnings;
    use feature 'say';
    use Data::Dump 'dd';
    use Time::HiRes 'time';

    my $N = 1000;
    srand 123;
    my @tokens = map { int rand 1_000_000 } 1 .. $N;

    sub build_regex {
        # return qr/\d+/;  # line A
        my $s = shift;
        my $d = substr $s, 0, 1;
        qr/[0-9]$d\d{0,9}?$d/
    }

    { # case 1
        my $t = time;
        my %result;
        foreach my $token (@tokens) {
            my $regex = build_regex($token);
            my @line_results = grep {$_ ne $token and /$regex/ }@tokens;
            $result{$token} = [@line_results];
        }
        say time - $t;
    }

    { # case 2
        my $t = time;
        my $count = 0;
        my $sep = '~';
        my $sep_len = length $sep;
        my @idx;
        for ( 0 .. $#tokens ) {
            my $L = length $tokens[ $_ ];
            @idx[ map{ $sep_len + @idx + $_ } 0 .. $L - 1 ] = ( $_ ) x $L
        }
        my $concat = join $sep, '', @tokens, '';
        my %result;
        for my $i ( 0 .. $#tokens ) {
            my $token = $tokens[ $i ];
            my $regex = build_regex( $token );
            $result{ $token } = [];
            my $prev = -1;
            while ( $concat =~ /$regex/g ) { # block B
                my $j = $idx[ $-[ 0 ]];
                push @{ $result{ $token }}, $tokens[ $j ] if $j != $i and $j != $prev;
                $prev = $j;
                $count ++;
            }
        }
        say time - $t;
        say $count;
    }

    __END__

    # Output with "A" line un-commented
    0.978141069412231
    1.23276996612549
    1000000

    # Output with "A" line commented-out
    0.648768901824951
    0.150562047958374
    78176
Edit: (1) replaced the separator character "|" with the more neutral "~", so it doesn't look like regex alternation; (2) added "comments" to the output section, to make it clearer that they are different runs.
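For anyone who wants to see the offset bookkeeping of case 2 on its own, here is a minimal sketch on five hand-picked tokens. The helper name related_tokens, the callback argument, and the simplified regex /$d\d*$d/ are made up for illustration and are not part of the benchmark above. It assumes that neither the tokens nor anything the regex can match contains the separator.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): for each token, collect the
# other tokens its regex matches, scanning one joined string per token.
sub related_tokens {
    my ( $make_regex, $sep, @tokens ) = @_;

    # Build "~t0~t1~...~" and, alongside it, a map from every character
    # offset that lies inside a token to that token's index.
    my $concat = $sep;
    my @owner;
    for my $i ( 0 .. $#tokens ) {
        my $start = length $concat;
        $concat .= $tokens[$i] . $sep;
        $owner[ $start + $_ ] = $i for 0 .. length( $tokens[$i] ) - 1;
    }

    my @related;
    for my $i ( 0 .. $#tokens ) {
        my $regex = $make_regex->( $tokens[$i] );
        $related[$i] = [];
        my $prev = -1;
        while ( $concat =~ /$regex/g ) {
            my $j = $owner[ $-[0] ];    # $-[0] is the offset where the match starts
            next unless defined $j;     # match began on a separator: ignore it
            # Skip the token itself and repeated hits inside the same token.
            push @{ $related[$i] }, $tokens[$j] if $j != $i and $j != $prev;
            $prev = $j;
        }
    }
    return \@related;
}

# Simplified criterion (an assumption): a token is "related" if it
# contains the first digit of the current token at least twice.
my $make_regex = sub { my $d = substr shift, 0, 1; qr/$d\d*$d/ };
my @tokens     = qw( 121 313 45 2112 1331 );
my $related    = related_tokens( $make_regex, '~', @tokens );

print "$tokens[$_] => [@{ $related->[$_] }]\n" for 0 .. $#tokens;
# 121 => [2112 1331]
# 313 => [1331]
# 45 => []
# 2112 => []
# 1331 => [121 2112]
```

Because \d can never match "~", a match cannot run from one token into the next, which is the whole point of the "alien" separator. Remembering only the previous owner index is enough to suppress duplicates, since the /g scan moves left to right and hits inside one token are always consecutive.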