Let me try to explain this a bit better. My file contains 21 million URLs that our search engine has indexed on an intranet. I have been tasked with checking each URL against a "blacklist" file to see whether the URL matches or not. I need to output a report that shows each host, # of URLs, # not blacklisted, % not blacklisted, # blacklisted, and % blacklisted.
I have an array of blacklisted regular expressions and a couple of hashes for the not-blacklisted and blacklisted counts.
    open(my $infile, '<', $URLLIST) or die "Cannot open $URLLIST: $!";
    while ( my $url = <$infile> ) {
        chomp($url);
        next if $url eq "";                  # skip blank lines
        my $host        = GetHost($url);
        my $blacklisted = isBlackListed($url);
        if ($blacklisted) {
            $BLACKLISTED{$host}++;           # per-host blacklisted tally
        } else {
            $NOTBLACKLISTED{$host}++;        # per-host clean tally
        }
    }
    close($infile);
    printReport();
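For what it's worth, here is a rough sketch of what the helpers might look like. The hash names match the loop above, but the blacklist filename, the host extraction, and the report layout are my assumptions, not your actual code:

    use strict;
    use warnings;

    our (%BLACKLISTED, %NOTBLACKLISTED);

    # Assumed: patterns precompiled once with qr// so the 21M-row loop
    # doesn't recompile them on every match. 'blacklist.txt' is a
    # placeholder name for your blacklist file.
    my @BLACKLIST_PATTERNS = map { chomp; qr/$_/ } do {
        open(my $bl, '<', 'blacklist.txt') or die "Cannot open blacklist: $!";
        <$bl>;
    };

    sub GetHost {
        my ($url) = @_;
        # crude scheme://host extraction; URI->host would be more robust
        my ($host) = $url =~ m{^\w+://([^/:?#]+)}i;
        return defined $host ? lc $host : 'unknown';
    }

    sub isBlackListed {
        my ($url) = @_;
        for my $re (@BLACKLIST_PATTERNS) {
            return 1 if $url =~ $re;
        }
        return 0;
    }

    sub printReport {
        my %hosts = (%BLACKLISTED, %NOTBLACKLISTED);   # union of host keys
        print "host\t#URLs\t#NotBL\t%NotBL\t#BL\t%BL\n";
        for my $host (sort keys %hosts) {
            my $bad   = $BLACKLISTED{$host}    || 0;
            my $good  = $NOTBLACKLISTED{$host} || 0;
            my $total = $bad + $good;
            printf "%s\t%d\t%d\t%.1f%%\t%d\t%.1f%%\n",
                $host, $total, $good, 100 * $good / $total,
                $bad,  100 * $bad / $total;
        }
    }

Precompiling with qr// matters here: with 21 million URLs, even a small per-match cost multiplies quickly.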
I was hoping that using threads and sharing the arrays and hashes would speed up the processing of this file (see the sketch below). I may also want to break the file up into several files, maybe one host per file, process each file independently, and build the complete report from the combined output at the end.
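As a sketch of one way the threaded version could be laid out (untested; the worker count, queue usage, and helper names are assumptions): each worker keeps private hashes and the counts are merged after join, so the big hashes never need to be shared:

    use threads;
    use Thread::Queue;

    my $NWORKERS = 4;                        # assumed worker count; tune it
    my $queue    = Thread::Queue->new;

    sub worker {
        my (%bad, %good);                    # private per-thread tallies
        while (defined(my $url = $queue->dequeue)) {
            my $host = GetHost($url);
            isBlackListed($url) ? $bad{$host}++ : $good{$host}++;
        }
        return (\%bad, \%good);
    }

    my @workers = map { threads->create(\&worker) } 1 .. $NWORKERS;

    open(my $infile, '<', $URLLIST) or die "Cannot open $URLLIST: $!";
    while (my $url = <$infile>) {
        chomp $url;
        $queue->enqueue($url) if $url ne "";
    }
    close($infile);
    $queue->enqueue(undef) for 1 .. $NWORKERS;   # one end-marker per worker

    for my $t (@workers) {
        my ($bad, $good) = $t->join;             # fold into the report hashes
        $BLACKLISTED{$_}    += $bad->{$_}  for keys %$bad;
        $NOTBLACKLISTED{$_} += $good->{$_} for keys %$good;
    }
    printReport();

Keeping the tallies private and merging once at the end avoids locking shared hashes on every URL. The same merge step would work for the split-the-file idea, with separate processes writing partial counts instead of threads returning hashes.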