Loading words from one file, searching another for the frequencies of these words and outputting the wordcounts to another file (posted 2014-01-15 06:23:04)
Hi,
I'm attempting to take a .txt file of the format:
>Title1
Word1
>Title2
Word2
>Title3
Word3
And use the titles (the word on each line beginning with >; all other lines, such as Word1, are ignored) as a word list to search a second .txt file, ultimately printing how often each title occurs in that second file. To clarify with an example:
If the titles are:
>Apple
>Banana
>Grape
And the file I'm searching within is:
Apple Banana Avocado Orange
Grape Apple
Apple Banana Banana
The output desired would be:
Apple occurs: 3
Banana occurs: 3
Grape occurs: 1
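A minimal sketch of the whole task follows. It assumes each title is a single word on a line starting with >, that words in the second file are separated by whitespace with no attached punctuation, and that matching is case-sensitive. Instead of a regex, it loads the titles into a hash and looks up every word, which counts whole words only and prints the results in title-file order, as in the desired output above. The script name, argument order and the sub names read_titles and count_titles are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the titles (the word after '>' on title lines) in file order,
# skipping duplicates; all other lines are ignored.
sub read_titles {
    my ($fh) = @_;
    my (@titles, %seen);
    while (my $line = <$fh>) {
        next unless $line =~ /^>\s*(\S+)/;
        push @titles, $1 unless $seen{$1}++;
    }
    return @titles;
}

# Count whole-word, case-sensitive occurrences of each title in $fh.
# Every title starts at 0 so titles that never occur are still reported.
sub count_titles {
    my ($fh, @titles) = @_;
    my %count;
    $count{$_} = 0 for @titles;
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            $count{$word}++ if exists $count{$word};
        }
    }
    return %count;
}

# Usage: perl count_titles.pl titles.txt search.txt WordFreqs.txt
if (@ARGV == 3) {
    my ($title_file, $search_file, $out_file) = @ARGV;

    open my $tfh, '<', $title_file or die "Cannot open $title_file: $!";
    my @titles = read_titles($tfh);
    close $tfh;

    open my $sfh, '<', $search_file or die "Cannot open $search_file: $!";
    my %count = count_titles($sfh, @titles);
    close $sfh;

    open my $ofh, '>', $out_file or die "Cannot open $out_file: $!";
    print {$ofh} "$_ occurs: $count{$_}\n" for @titles;
    close $ofh;
}
```

For the example files above, WordFreqs.txt would contain the three lines of the desired output. A hash lookup is used because it only ever matches complete words and avoids building a large regex when the title list is long.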
I've seen various code snippets that take a word list, search a file for those words and print their frequencies, such as:
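use strict;
use warnings;

my %count;

# Sort helper: highest count first ($a and $b are set by sort)
sub by_count {
    $count{$b} <=> $count{$a};
}

open(my $input,  '<', 'Input.txt')     or die "Cannot open Input.txt: $!";
open(my $output, '>', 'WordFreqs.txt') or die "Cannot open WordFreqs.txt: $!";

my $bucket = 'red|blue|green';

while (<$input>) {
    my @words = split(/\s+/);
    foreach my $word (@words) {
        # ^...$ anchors: count whole words only, not "redder" as "red"
        if ($word =~ /^($bucket)$/i) {
            $count{lc $1}++;    # fold case so "Red" and "red" share one count
        }
    }
}

foreach my $word (sort by_count keys %count) {
    print $output "$word occurs $count{$word} times\n";
}

close $input;
close $output;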
But I am unsure how to populate the word list with the titles (or other contents) of another file, rather than hard-coding them in the script.
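One way to keep the structure of the snippet above while taking the words from the title file is to read every line that starts with > and join the titles into the `$bucket` alternation. This is a minimal sketch, assuming one single-word title per > line, whitespace-separated words and case-sensitive matching; the sub names bucket_from_titles and count_with_bucket and the command-line arguments are hypothetical. `quotemeta` escapes any regex metacharacters in a title so it is matched literally, and the `^...$` anchors stop `Apple` from also matching inside `Pineapple`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a title file (">Title" lines; all other lines ignored) and build an
# anchored alternation such as ^(Apple|Banana|Grape)$ from the titles.
sub bucket_from_titles {
    my ($fh) = @_;
    my @titles;
    while (my $line = <$fh>) {
        push @titles, $1 if $line =~ /^>\s*(\S+)/;
    }
    # quotemeta escapes regex metacharacters, so "C++" is matched literally
    my $alternation = join '|', map { quotemeta($_) } @titles;
    return qr/^($alternation)$/;
}

# Count whitespace-separated words from $fh that match the bucket exactly.
sub count_with_bucket {
    my ($fh, $bucket) = @_;
    my %count;
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            $count{$1}++ if $word =~ $bucket;
        }
    }
    return %count;
}

# Usage: perl wordfreq.pl titles.txt Input.txt WordFreqs.txt
if (@ARGV == 3) {
    my ($title_file, $input_file, $output_file) = @ARGV;

    open my $titles, '<', $title_file or die "Cannot open $title_file: $!";
    my $bucket = bucket_from_titles($titles);
    close $titles;

    open my $input, '<', $input_file or die "Cannot open $input_file: $!";
    my %count = count_with_bucket($input, $bucket);
    close $input;

    open my $output, '>', $output_file or die "Cannot open $output_file: $!";
    # Most frequent first, as in the by_count snippet
    for my $word (sort { $count{$b} <=> $count{$a} } keys %count) {
        print {$output} "$word occurs $count{$word} times\n";
    }
    close $output;
}
```

As in the original snippet, titles that never occur are simply absent from the output; add a `/i` flag and fold case with `lc` if case-insensitive matching is wanted.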
If anybody could provide some useful suggestions or code snippets that I could adapt, that would be great - thank you!
- TJC