moviesigh has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I am trying to count how often specific words occur near a main word. For example, with "Flannery" as the main word, I would like to count the occurrences of "banking" and "institutions" within 10 words of "Flannery" in the document. A sample document is below.

ex) Writing various empirical banking papers concerning about financial institutions, Flannery needs to contact representatives of the institutions

Here is my question. The "institutions" that appears before "Flannery" is not counted if I put "institution" instead of "institutions" in my command. On the other hand, the "institutions" that appears after "Flannery" is counted even if I use "institution" in my command.

Could you please give me some advice on how to solve this problem? I hope you have a great day. Thank you very much!

Sean

Replies are listed 'Best First'.
Re: Wildcard question
by AnomalousMonk (Archbishop) on Jan 16, 2014 at 08:08 UTC

      "You shall know a word by the company it keeps."

      This is a common task in linguistic analysis. There is no mention of regular expressions anywhere in the question.
      (Indeed, I would approach this problem by tokenizing the text into an array of words.)
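
A minimal sketch of the tokenizing approach suggested above, offered as one possible reading and not as the poster's own code. The helper name count_near and its parameters are hypothetical. It assumes that words are runs of ASCII letters and apostrophes, that matching is case-insensitive, and that a search term should match any word that begins with it, so "institution" also matches "institutions" on either side of the main word.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count words that begin with
# any of the given stems and lie within $window words of $anchor.
sub count_near {
    my ($text, $anchor, $window, @stems) = @_;

    # Tokenize: lowercase the text, then extract runs of letters/apostrophes.
    my @words = lc($text) =~ /[a-z']+/g;

    my %count = map { $_ => 0 } @stems;
    my %seen;    # word positions already counted, per stem

    # Visit every position where the anchor word occurs.
    for my $i (grep { $words[$_] eq lc $anchor } 0 .. $#words) {
        # Clamp the window to the bounds of the word array.
        my $lo = $i - $window < 0       ? 0       : $i - $window;
        my $hi = $i + $window > $#words ? $#words : $i + $window;

        for my $j ($lo .. $hi) {
            next if $j == $i;
            for my $stem (@stems) {
                # Prefix match: the word must start with the stem.
                next unless index($words[$j], lc $stem) == 0;
                # Count each word position once, even if windows overlap.
                $count{$stem}++ unless $seen{$stem}{$j}++;
            }
        }
    }
    return \%count;
}

my $text = 'Writing various empirical banking papers concerning about '
         . 'financial institutions, Flannery needs to contact '
         . 'representatives of the institutions';

my $count = count_near($text, 'Flannery', 10, 'banking', 'institution');
printf "%s: %d\n", $_, $count->{$_} for sort keys %$count;
```

On the sample sentence from the question this reports 1 for "banking" and 2 for "institution", because both occurrences of "institutions" fall within 10 words of "Flannery". Working on an array of words makes the "within 10 words" test simple index arithmetic that behaves the same before and after the main word.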

Re: Wildcard question
by LanX (Saint) on Jan 16, 2014 at 11:29 UTC
    Your post is very hard to read and understand!

    Please use <c> and <p> tags to better distinguish the example and questions.

    And please explain: What's the difference from your first question? (which was also badly formatted)

    Counting the keywords in the text file

    Cheers Rolf

    (addicted to the Perl Programming Language)