G'day wackattack,

"I'm using arrays because these are big files (gigabytes) and I need to do thousands of searches without having to load the file from hard drive each time."

I would neither load the file into an array nor read it in its entirety from disk (any number of times). Instead, reading a file of this size line by line, in a single pass that checks every search term, would probably be a better option. Here's how I might tackle this task.

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my $file_to_search = 'file_to_search';
    my $file_of_search_terms = 'file_of_search_terms';
    my $file_of_search_counts = 'file_of_search_counts';

    my %count;

    {
        open my $search_terms_fh, '<', $file_of_search_terms;
        %count = map { chomp; $_ => 0 } <$search_terms_fh>;
    }

    my @search_terms = keys %count;

    {
        open my $in_fh, '<', $file_to_search;
        while (<$in_fh>) {
            chomp;
            next if -1 == index $_, 'Z';
            for my $search_term (@search_terms) {
                next if -1 == index $_, $search_term;
                ++$count{$search_term};
                last;
            }
        }
    }

    {
        open my $out_fh, '>', $file_of_search_counts;
        print $out_fh "$_ : $count{$_}\n" for sort @search_terms;
    }

I used this dummy data for testing:

    $ cat file_to_search
    100008020Z
    Z100008020
    100008020
    100008030Z
    Z100008030
    100008030
    100008040Z
    Z100008040
    100008040

    $ cat file_of_search_terms
    100008010
    100008020
    100008030
    100008040
    100008050

Here's the output:

    $ cat file_of_search_counts
    100008010 : 0
    100008020 : 2
    100008030 : 2
    100008040 : 2
    100008050 : 0
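If you want to check the matching rule without creating any files, the inner loop of the script above can be pulled out into a small subroutine and run against in-memory data. This is only a sketch: the name count_matches and the shortened sample lists are made up for illustration and are not part of the original script. It keeps the same two rules, namely that lines without a 'Z' are skipped and that a line is credited to the first search term it contains.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not in the original script): for each search term,
# count the lines that contain both a 'Z' and that term.
# A line is credited to at most one term, as in the script above.
sub count_matches {
    my ($lines, $terms) = @_;
    my %count = map { $_ => 0 } @$terms;

    for my $line (@$lines) {
        next if -1 == index $line, 'Z';          # skip lines without a 'Z'
        for my $term (@$terms) {
            next if -1 == index $line, $term;    # term not in this line
            ++$count{$term};
            last;                                # first matching term wins
        }
    }

    return \%count;
}

# Shortened, made-up sample of the dummy data.
my @lines = qw{100008020Z Z100008020 100008020 100008030Z};
my @terms = qw{100008010 100008020 100008030};

my $count = count_matches(\@lines, \@terms);
print "$_ : $count->{$_}\n" for sort keys %$count;
```

Because of the "last", a line that happens to contain two search terms is only counted for whichever term is tested first; drop that statement if every matching term should be counted.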

— Ken


In reply to Re: Nested greps w/ Perl by kcott
in thread Nested greps w/ Perl by wackattack
