in reply to Nested greps w/ Perl
G'day wackattack,
"I'm using arrays because these are big files (gigabytes) and I need to do thousands of searches without having to load the file from hard drive each time."
I would neither load the file into an array nor read it, in its entirety, from disk (any number of times). Instead, reading a file of this size line by line would probably be a better option. Here's how I might tackle this task.
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my $file_to_search        = 'file_to_search';
my $file_of_search_terms  = 'file_of_search_terms';
my $file_of_search_counts = 'file_of_search_counts';

# Initialise every search term's count to zero.
my %count;
{
    open my $search_terms_fh, '<', $file_of_search_terms;
    %count = map { chomp; $_ => 0 } <$search_terms_fh>;
}

my @search_terms = keys %count;

# Read the big file one line at a time; never hold it all in memory.
{
    open my $in_fh, '<', $file_to_search;
    while (<$in_fh>) {
        chomp;
        next if -1 == index $_, 'Z';    # skip lines with no 'Z' at all
        for my $search_term (@search_terms) {
            next if -1 == index $_, $search_term;
            ++$count{$search_term};
            last;                       # count a line against one term only
        }
    }
}

{
    open my $out_fh, '>', $file_of_search_counts;
    print $out_fh "$_ : $count{$_}\n" for sort @search_terms;
}
I used this dummy data for testing:
$ cat file_to_search
100008020Z
Z100008020
100008020
100008030Z
Z100008030
100008030
100008040Z
Z100008040
100008040
$ cat file_of_search_terms
100008010
100008020
100008030
100008040
100008050
Here's the output:
$ cat file_of_search_counts
100008010 : 0
100008020 : 2
100008030 : 2
100008040 : 2
100008050 : 0
— Ken
Replies are listed 'Best First'.

Re^2: Nested greps w/ Perl
by wackattack (Sexton) on Dec 20, 2016 at 19:01 UTC
by kcott (Archbishop) on Dec 20, 2016 at 22:23 UTC
by hippo (Archbishop) on Dec 21, 2016 at 09:28 UTC