Thank you for providing sample input, but unfortunately, when I run the code on this sample input, the output is empty. Could you provide sample input that produces some output, and to play it safe also provide that output, each inside <code> tags? See also Short, Self-Contained, Correct Example.
In general, as has already been suggested, a hash table gives you much faster lookups than a linear scan with nested loops: each hash lookup takes roughly constant time, instead of one string comparison per needle.
    use warnings;
    use strict;
    use List::Util qw/ shuffle /;
    use Time::HiRes qw/ gettimeofday tv_interval /;
    use Test::More tests=>2;

    my @looking_for = qw/ foo bar quz baz /;
    my @looking_in = shuffle
        qw/ foo bar quz baz / x 100_000,
        qw/ some other stuff we're not looking for / x 2_000_000;

    { # nested loops: compare every haystack entry against every needle
        my $t0 = [gettimeofday];
        my $found_count;
        for my $haystack (@looking_in) {
            for my $needle (@looking_for) {
                if ( $needle eq $haystack ) {
                    $found_count++;
                }
            }
        }
        is $found_count, 400_000, 'linear scan';
        diag sprintf "that took %.3fs", tv_interval($t0);
    }

    { # hash: build the lookup table once, then one lookup per haystack entry
        my $t0 = [gettimeofday];
        my $found_count;
        my %needles_hash = map { ($_=>1) } @looking_for;
        diag 'needles_hash: ', explain \%needles_hash;
        for my $haystack (@looking_in) {
            if ( $needles_hash{$haystack} ) {
                $found_count++;
            }
        }
        is $found_count, 400_000, 'hash lookup';
        diag sprintf "that took %.3fs", tv_interval($t0);
    }

    __END__

    1..2
    ok 1 - linear scan
    # that took 2.274s
    # needles_hash: {
    #   'bar' => 1,
    #   'baz' => 1,
    #   'foo' => 1,
    #   'quz' => 1
    # }
    ok 2 - hash lookup
    # that took 0.599s
When you're asking the question "do the strings in the haystack contain any of the needles", or in general when what you're looking for is not a fixed string but can be expressed as a regex, an alternative is to build a regex.
    use warnings;
    use strict;
    use List::Util qw/ shuffle /;
    use Time::HiRes qw/ gettimeofday tv_interval /;
    use Test::More tests=>2;

    my @looking_for = qw/ foo bar quz baz /;
    my @looking_in = shuffle
        qw/ xyfooz abcbarx 123quzy abazz / x 10_000,
        qw/ some other stuff we're not looking for / x 100_000;

    { # nested loops: one regex match per haystack entry per needle
        my $t0 = [gettimeofday];
        my $found_count;
        for my $haystack (@looking_in) {
            for my $needle (@looking_for) {
                if ( $haystack =~ /\Q$needle\E/ ) {
                    $found_count++;
                }
            }
        }
        is $found_count, 40_000, 'linear scan';
        diag sprintf "that took %.3fs", tv_interval($t0);
    }

    { # single regex: escape the needles and join them into one alternation
        my $t0 = [gettimeofday];
        my $found_count;
        my ($needles_regex) = map {qr/$_/} join '|', map {quotemeta}
            sort { length $b <=> length $a or $a cmp $b } @looking_for;
        diag "needles_regex: ", explain $needles_regex;
        for my $haystack (@looking_in) {
            if ( $haystack =~ $needles_regex ) {
                $found_count++;
            }
        }
        is $found_count, 40_000, 'regex';
        diag sprintf "that took %.3fs", tv_interval($t0);
    }

    __END__

    1..2
    ok 1 - linear scan
    # that took 2.716s
    # needles_regex: qr/bar|baz|foo|quz/
    ok 2 - regex
    # that took 0.212s
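The regex construction above sorts the needles longest first and runs each one through quotemeta. With four plain three-letter needles neither step changes anything, so here is a small sketch of why they are there. The needle list and the helper name build_needles_regex are made up for illustration: one needle contains a regex metacharacter, and one is a prefix of another.

```perl
use warnings;
use strict;

# Hypothetical helper (not from the code above): escape every needle,
# put longer needles first, and join them into a single alternation.
sub build_needles_regex {
    my @needles = @_;
    my $alternation = join '|',
        map  { quotemeta }
        sort { length($b) <=> length($a) or $a cmp $b } @needles;
    return qr/$alternation/;
}

# Made-up needles: 'foo' is a prefix of 'foobar', and 'a.c' contains a dot.
my $regex = build_needles_regex( 'foo', 'foobar', 'a.c' );

for my $haystack ( 'xfoobarx', 'abc', 'a.c' ) {
    if ( $haystack =~ /($regex)/ ) {
        print "'$haystack' matched '$1'\n";
    }
    else {
        print "'$haystack' did not match\n";
    }
}

__END__
'xfoobarx' matched 'foobar'
'abc' did not match
'a.c' matched 'a.c'
```

Without quotemeta the dot in 'a.c' would match any character, so 'abc' would count as a hit. Without the longest-first sort, 'foo' could come before 'foobar' in the alternation and win, which matters as soon as you want to know which needle matched and not just whether one did.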
In reply to Re: how to avoid full scan in file.
by haukex
in thread how to avoid full scan in file.
by EBK