
Hi, I'm attempting to take a .txt file of the format:

>Title1
Word1
>Title2
Word2
>Title3
Word3

And use the titles (the word on any line beginning with >; all other lines, e.g. Word1, are ignored) as a wordlist for searching a second .txt file, and ultimately print how often each title occurs in that second file. To clarify with an example: If the titles are:

>Apple
>Banana
>Grape

And the file I'm searching is:

Apple Banana Avocado Orange
Grape Apple
Apple Banana Banana

The output desired would be:

Apple occurs: 3
Banana occurs: 3
Grape occurs: 1

I've seen various snippets of code that take a wordlist, search a file for those words and print their frequencies, such as:

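use strict;
use warnings;

my %count;

# Sort comparator: words with the highest count come first
sub by_count {
   $count{$b} <=> $count{$a};
}

open(my $input,  '<', 'Input.txt')     or die "Cannot open Input.txt: $!";
open(my $output, '>', 'WordFreqs.txt') or die "Cannot open WordFreqs.txt: $!";
my $bucket = 'red|blue|green';    # hard-coded alternation of target words

while (my $line = <$input>) {
   my @words = split(/\s+/, $line);
   foreach my $word (@words) {
      # Case-insensitive substring match; the matched text becomes the key
      if ($word =~ /($bucket)/io) {
         $count{$1}++;
      }
   }
}
foreach my $word (sort by_count keys %count) {
   print $output "$word occurs $count{$word} times\n";
}

close $input;
close $output;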
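One way to get the titles into $bucket instead of hard-coding 'red|blue|green' is to read the lines that start with > and join the captured words with |. This is a minimal sketch, not a tested drop-in: the helper name bucket_from_titles and the in-memory demo file are illustrative assumptions, and in a real script the handle would come from opening the titles file (e.g. a hypothetical Titles.txt). quotemeta is applied so a title containing regex metacharacters, such as a dot, is matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the '>' title lines from a filehandle and
# return a pattern string to use in place of the hard-coded 'red|blue|green'.
sub bucket_from_titles {
    my ($fh) = @_;
    my @titles;
    while (my $line = <$fh>) {
        # Keep the first run of non-space characters after '>'
        push @titles, $1 if $line =~ /^>\s*(\S+)/;
    }
    # quotemeta escapes regex metacharacters (e.g. '.') so titles match literally
    return join '|', map { quotemeta($_) } @titles;
}

# Demo with an in-memory "file" so the sketch runs without a titles file.
my $titles_txt = ">Apple\nWord1\n>Banana\nWord2\n>Grape\nWord3\n";
open(my $fh, '<', \$titles_txt) or die "Cannot open in-memory file: $!";
my $bucket = bucket_from_titles($fh);
close $fh;
print "$bucket\n";    # prints Apple|Banana|Grape
```

A $bucket built this way works with the snippet's existing /($bucket)/io test, but that test also counts a title embedded in a longer word (apple inside Pineapple, given the /i flag). Changing it to /^($bucket)$/io limits the count to whole words.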

But I am unsure how to populate the wordlist with the titles (or the contents) of another file rather than specifying them within the code. If anybody could provide some useful suggestions or snippets of code which I could work on modifying, that would be great - thank you! - TJC
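Below is a minimal sketch of a complete script for the Apple/Banana/Grape example. It assumes a title counts only when a whitespace-separated word matches it exactly, case included, so "Apple," with a trailing comma or "apple" in lower case would not be counted. Rather than a regex alternation, it keeps the titles as hash keys and checks each word with exists, which avoids the substring matches a /($bucket)/io test can produce. The sub names read_titles, count_titles and main, and the three command-line arguments, are illustrative choices, not part of any existing code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the title words, in file order, from lines that start with '>'.
# Assumption: a title is the first run of non-space characters after '>'.
# A title that appears more than once is returned only the first time.
sub read_titles {
    my ($fh) = @_;
    my (@titles, %seen);
    while (my $line = <$fh>) {
        if ($line =~ /^>\s*(\S+)/ && !$seen{$1}++) {
            push @titles, $1;
        }
    }
    return @titles;
}

# Count whitespace-separated words that exactly equal one of the titles.
# Every title starts at 0, so titles that never occur are still reported.
sub count_titles {
    my ($fh, @titles) = @_;
    my %count = map { ($_ => 0) } @titles;
    while (my $line = <$fh>) {
        foreach my $word (split ' ', $line) {
            $count{$word}++ if exists $count{$word};
        }
    }
    return %count;
}

# Hypothetical entry point: titles file, file to search, output file.
sub main {
    my ($titles_file, $search_file, $output_file) = @_;

    open(my $tfh, '<', $titles_file) or die "Cannot open $titles_file: $!";
    my @titles = read_titles($tfh);
    close $tfh;

    open(my $sfh, '<', $search_file) or die "Cannot open $search_file: $!";
    my %count = count_titles($sfh, @titles);
    close $sfh;

    # One line per title, in the order the titles were read
    open(my $out, '>', $output_file) or die "Cannot open $output_file: $!";
    print $out "$_ occurs: $count{$_}\n" for @titles;
    close $out;
}

if (@ARGV == 3) {
    main(@ARGV);
}
else {
    warn "Usage: $0 TITLES_FILE SEARCH_FILE OUTPUT_FILE\n";
}
```

Run it as perl count_titles.pl Titles.txt Input.txt WordFreqs.txt (all four names hypothetical). With the example titles and search text above, WordFreqs.txt should contain exactly the three desired lines, in title order. A title that never occurs is listed with a count of 0.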

In reply to Loading words from one file, searching another for the frequencies of these words and outputting the wordcounts to another file by TJCooper
