Thank you for providing sample input, but unfortunately, when I run the code on this sample input, the output is empty. Could you provide sample input that produces some output, and to play it safe also provide that output, each inside <code> tags? See also Short, Self-Contained, Correct Example.

In general, as has already been suggested, a hash table provides much faster lookups than a linear scan with nested loops.

use warnings;
use strict;
use List::Util qw/ shuffle /;
use Time::HiRes qw/ gettimeofday tv_interval /;
use Test::More tests=>2;

my @looking_for = qw/ foo bar quz baz /;
my @looking_in = shuffle
    qw/ foo bar quz baz / x 100_000,
    qw/ some other stuff we're not looking for / x 2_000_000;

{
    my $t0 = [gettimeofday];
    my $found_count;
    for my $haystack (@looking_in) {
        for my $needle (@looking_for) {
            if ( $needle eq $haystack ) {
                $found_count++;
            }
        }
    }
    is $found_count, 400_000, 'linear scan';
    diag sprintf "that took %.3fs", tv_interval($t0);
}

{
    my $t0 = [gettimeofday];
    my $found_count;
    my %needles_hash = map { ($_=>1) } @looking_for;
    diag 'needles_hash: ', explain \%needles_hash;
    for my $haystack (@looking_in) {
        if ( $needles_hash{$haystack} ) {
            $found_count++;
        }
    }
    is $found_count, 400_000, 'hash lookup';
    diag sprintf "that took %.3fs", tv_interval($t0);
}

__END__

1..2
ok 1 - linear scan
# that took 2.274s
# needles_hash: {
#   'bar' => 1,
#   'baz' => 1,
#   'foo' => 1,
#   'quz' => 1
# }
ok 2 - hash lookup
# that took 0.599s
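
Applied to a file, the same idea means reading the input once and doing one hash lookup per line. Here is a minimal sketch, assuming one key per line. The in-memory filehandle and its contents are made up for illustration and stand in for your real file:

```perl
use warnings;
use strict;

my @looking_for = qw/ foo bar quz baz /;
# Build the lookup table once; only the keys matter, the values are ignored.
my %needles_hash = map { ($_ => 1) } @looking_for;

# Hypothetical input: an in-memory filehandle standing in for the real file.
my $data = "foo\nsome\nbaz\nother\nfoo\n";
open my $fh, '<', \$data or die "open: $!";

my @found;
while ( my $line = <$fh> ) {
    chomp $line;
    # One hash lookup per line instead of one comparison per needle.
    push @found, $line if exists $needles_hash{$line};
}
close $fh;

print "@found\n";    # prints "foo baz foo"
```

If the keys are only part of each line, you would first have to extract the field you want to look up, for example with split or a regex capture.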

When you're asking the question "do the strings in the haystack contain any of the needles", or in general when what you're looking for is not a fixed string but can be expressed as a regex, an alternative is to build a single regex from all of the needles and match each haystack string against it once.

use warnings;
use strict;
use List::Util qw/ shuffle /;
use Time::HiRes qw/ gettimeofday tv_interval /;
use Test::More tests=>2;

my @looking_for = qw/ foo bar quz baz /;
my @looking_in = shuffle
    qw/ xyfooz abcbarx 123quzy abazz / x 10_000,
    qw/ some other stuff we're not looking for / x 100_000;

{
    my $t0 = [gettimeofday];
    my $found_count;
    for my $haystack (@looking_in) {
        for my $needle (@looking_for) {
            if ( $haystack =~ /\Q$needle\E/ ) {
                $found_count++;
            }
        }
    }
    is $found_count, 40_000, 'linear scan';
    diag sprintf "that took %.3fs", tv_interval($t0);
}

{
    my $t0 = [gettimeofday];
    my $found_count;
    my ($needles_regex) = map {qr/$_/} join '|', map {quotemeta}
        sort { length $b <=> length $a or $a cmp $b } @looking_for;
    diag "needles_regex: ", explain $needles_regex;
    for my $haystack (@looking_in) {
        if ( $haystack =~ $needles_regex ) {
            $found_count++;
        }
    }
    is $found_count, 40_000, 'regex';
    diag sprintf "that took %.3fs", tv_interval($t0);
}

__END__

1..2
ok 1 - linear scan
# that took 2.716s
# needles_regex: qr/bar|baz|foo|quz/
ok 2 - regex
# that took 0.212s
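
Sorting the needles by descending length makes no difference for the four needles above, but it can matter when one needle is a prefix of another, because Perl's alternation takes the first alternative that matches at a given position. A small sketch to show the effect; the needles foo and foobar are made up for this example:

```perl
use warnings;
use strict;

my @needles = qw/ foo foobar /;    # hypothetical needles sharing a prefix

# Alternation in the order given: foo|foobar
my $unsorted = join '|', map {quotemeta} @needles;
# Alternation with the longest needle first: foobar|foo
my $sorted = join '|', map {quotemeta}
    sort { length $b <=> length $a or $a cmp $b } @needles;

my ($short) = 'xfoobarx' =~ /($unsorted)/;
my ($long)  = 'xfoobarx' =~ /($sorted)/;

print "unsorted matched: $short\n";    # foo
print "sorted matched:   $long\n";     # foobar
```

For a plain yes/no test like the one in the benchmark, the order does not change the result; it matters once you capture or replace the matched text.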

In reply to Re: how to avoid full scan in file. by haukex
in thread how to avoid full scan in file. by EBK
