Loading words from one file, searching another for the frequencies of these words and outputting the wordcounts to another file

TJCooper has asked for the wisdom of the Perl Monks concerning the following question:

Re: Please delete this question. by ww (Archbishop) on Jan 15, 2014 at 12:26 UTC
Deleting content: bad practice. Please don't do it.

If you've concluded that the original question was foolish, easily answered by rechecking code or docs, or by other self-education, please edit the node by adding a clearly marked "update" saying so.

Original question, "Loading words from one file, searching another for the frequencies of these words and outputting the wordcounts to another file" (created 2014-01-15 06:23:04; paragraph formatting added):

Hi,

I'm attempting to take a .txt file of the format:

```
>Title1
Word1
>Title2
Word2
>Title3
Word3
```

and use the titles (defined as the word on any line beginning with `>`, while any other words such as Word1 are ignored) as a wordlist to search another .txt file, ultimately printing the frequency at which each title occurs in this second .txt file.

To clarify with an example, if the titles are:

```
>Apple
>Banana
>Grape
```

and the file I'm searching within is:

```
Apple Banana Avocado Orange Grape Apple Apple Banana Banana
```

the desired output would be:

```
Apple occurs: 3
Banana occurs: 3
Grape occurs: 1
```

I've seen various snippets of code around which take wordlists and then search files for them, printing the frequency, such as:

```perl
sub by_count {
    $count{$b} <=> $count{$a};
}

open(INPUT, "<Input.txt");
open(OUTPUT, ">WordFreqs.txt");
$bucket = 'red|blue|green';
while (<INPUT>) {
    @words = split(/\s+/);
    foreach $word (@words) {
        if ($word =~ /($bucket)/io) {
            $count{$1}++;
        }
    }
}
foreach $word (sort by_count keys %count) {
    print OUTPUT "$word occurs $count{$word} times\n";
}
close INPUT;
close OUTPUT;
```

But I am unsure how I can populate the wordlist with the titles or contents of another file rather than specifying them within the code. If anybody could provide some useful suggestions or snippets of code which I could work on modifying, that would be great. Thank you!

- TJC

Come, let us reason together: Spirit of the Monastery
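The one piece the question asks about, filling the wordlist from the titles file instead of hard-coding `$bucket`, might look something like the following. This is a minimal sketch and not code from the thread: `load_titles` is a hypothetical helper name, an in-memory filehandle stands in for the real titles file, and it assumes a title is the first whitespace-delimited word after the leading `>`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect the titles from a filehandle. A title is the
# first run of non-space characters after a leading '>'; every other line
# (e.g. "Word1") is ignored.
sub load_titles {
    my ($fh) = @_;
    my @titles;
    while (my $line = <$fh>) {
        chomp $line;
        push @titles, $1 if $line =~ /^>(\S+)/;
    }
    return @titles;
}

# Demo only: open a filehandle on a string instead of the real titles file.
my $data = ">Apple\nWord1\n>Banana\nWord2\n>Grape\nWord3\n";
open my $titles_fh, '<', \$data or die "Cannot open in-memory file: $!";
my @titles = load_titles($titles_fh);
close $titles_fh;

# Build the alternation the snippet hard-codes as 'red|blue|green';
# quotemeta makes any regex metacharacters in a title match literally.
my $bucket = join '|', map { quotemeta($_) } @titles;
print "$bucket\n";    # prints Apple|Banana|Grape
```

The resulting `$bucket` could then replace the hard-coded string in the snippet's `/($bucket)/io` match; for real data, open the titles file with `open my $titles_fh, '<', $path` instead of the string.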
Re: Loading words from one file, searching another for the frequencies of these words and outputting the wordcounts to another file by kcott (Archbishop) on Jan 19, 2014 at 13:05 UTC
G'day TJCooper,

I got to this thread rather late. I'm unclear as to what happened with deletes and updates in your OP. Given what's there at the moment, this technique should do what you want:

```perl
#!/usr/bin/env perl -l

use strict;
use warnings;

use Inline::Files;

my %found;

/^>(.*)\Z/ and $found{$1} = 0 while <REFERENCE_FILE>;

my $re = '\b(' . join('|' => keys %found) . ')\b';

while (<SEARCH_FILE>) {
    ++$found{$1} while /$re/g;
}

print "$_ occurs: $found{$_}" for sort keys %found;

__REFERENCE_FILE__
>Apple
ignored blah blah blah
>Banana
ignored blah blah blah
>Grape
ignored blah blah blah
__SEARCH_FILE__
Apple Banana Avocado Orange Grape Apple Apple Banana Banana
```

Output:

```
Apple occurs: 3
Banana occurs: 3
Grape occurs: 1
```

I've used Inline::Files for demonstration purposes; you'll need to open your real files. I recommend you read the documentation for `open` and get into the habit of using the 3-argument form with lexical filehandles. Furthermore, if you're not going to write code to check your I/O, you should consider using the autodie pragma.

-- Ken
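Opening the real files is left to the reader in the reply above. One possible shape for that step is sketched below: the same regex technique, but with 3-argument `open`, lexical filehandles and autodie. This is an illustration under stated assumptions, not Ken's code: `count_titles` is a hypothetical helper, the three file names are hypothetical command-line arguments, and the `quotemeta` call (so that titles containing regex metacharacters match literally) is an addition not present in the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;    # open/close die with a descriptive message on failure

# Hypothetical helper: count how often each '>'-title in $ref_path
# occurs as a whole word in $search_path.
sub count_titles {
    my ($ref_path, $search_path) = @_;

    my %found;
    open my $ref_fh, '<', $ref_path;
    while (my $line = <$ref_fh>) {
        $line =~ s/\s+\z//;    # strip newline, any CR, trailing blanks
        $found{$1} = 0 if $line =~ /^>(.+)\z/;
    }
    close $ref_fh;
    return () unless %found;    # no titles: nothing to search for

    # Escape each title, then join them into one alternation.
    my $alternation = join '|', map { quotemeta($_) } keys %found;
    my $re = qr/\b($alternation)\b/;

    open my $search_fh, '<', $search_path;
    while (my $line = <$search_fh>) {
        ++$found{$1} while $line =~ /$re/g;
    }
    close $search_fh;

    return %found;
}

# Usage (file names hypothetical):
#   perl count_titles.pl titles.txt search.txt WordFreqs.txt
if (@ARGV == 3) {
    my ($ref_path, $search_path, $out_path) = @ARGV;
    my %found = count_titles($ref_path, $search_path);
    open my $out_fh, '>', $out_path;
    print {$out_fh} "$_ occurs: $found{$_}\n" for sort keys %found;
    close $out_fh;
}
```

With the example data from the question as the two input files, the output file should contain the same three lines shown in the output above. Because autodie is in effect, a missing or unreadable file stops the script with an error naming the file, instead of the silent failure the original snippet's unchecked `open` calls would give.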