in reply to Most frequent words in Gone with the Wind, help!

The question is, what format do you have the words in already?

Personally, I would simply go through the text and manually delete all non-fitting paragraphs. Maybe I would write a small script that goes through each paragraph and removes it if it doesn't contain at least the word "rhett" or "scarlett", as conversations might be prefixed by the person speaking.
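A minimal sketch of such a filter, assuming paragraphs are separated by blank lines and that a case-insensitive whole-word match on the name is good enough. The sub name `keep_paragraphs` and the inline sample are only for illustration; a real run would read the book from a file instead.

```perl
use strict;
use warnings;

# Keep only the paragraphs that mention at least one of the given names.
# Assumption: paragraphs are separated by one or more blank lines.
sub keep_paragraphs {
    my ($text, @names) = @_;
    my $pattern    = join '|', map { quotemeta($_) } @names;
    my @paragraphs = split /\n\s*\n/, $text;
    return grep { /\b(?:$pattern)\b/i } @paragraphs;
}

# Two paragraphs from the novel, used here as sample input.
my $sample = join "\n\n",
    q{Scarlett turned away from Mammy with studied nonchalance.},
    q{"No, I want to sit here and watch the sunset. It's so pretty."};

my @kept = keep_paragraphs($sample, 'rhett', 'scarlett');
print "$_\n\n" for @kept;
```

Note that the spoken paragraph is dropped because it names neither character, so a filter like this is coarse and only a starting point.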

You can help us by posting two representative excerpts of the text, one you want to keep and one you want to discard, together with the word counting code you already have.

Re^2: Most frequent words in Gone with the Wind, help!
by margred (Initiate) on Mar 14, 2016 at 20:14 UTC
    Thank you for such a quick reply! To be honest, because I am extremely new to Perl, I have no idea how to even begin. I just started classes and I'm trying to practice. I can do the basic things but nothing else. I've been reading tutorials all evening and watching YouTube videos, but I think my brain is pretty fried at this stage. Then I came upon this forum to see if I could ask for help directly. I'm so sorry that I'm no help at all. :/
      Actually Corion asked a few things, like identifying the book source, for example: http://gutenberg.net.au/plusfifty-a-m.html#mitchellm

      And extracting two parts, for example:

      Scarlett turned away from Mammy with studied nonchalance, thankful that her face had been unnoticed in Mammy's preoccupation with the matter of the shawl.

      "No, I want to sit here and watch the sunset. It's so pretty. You run get my shawl. Please, Mammy, and I'll sit here till Pa comes home."

      As you can see, spoken text starts and ends with normal double quotes (although you will sometimes find other quotation marks as well).
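      To illustrate just that observation (not the word counting itself), here is a rough sketch that pulls the spoken parts out of a paragraph. It assumes straight double quotes only and that a quote never spans more than one paragraph; the sub name `spoken_parts` is made up for this example.

```perl
use strict;
use warnings;

# Return every stretch of text enclosed in straight double quotes.
# Assumption: quotes are balanced and do not nest.
sub spoken_parts {
    my ($paragraph) = @_;
    # In list context, a /g match returns all captured groups.
    return $paragraph =~ /"([^"]*)"/g;
}

my $paragraph = q{Scarlett turned away from Mammy with studied nonchalance. }
              . q{"No, I want to sit here and watch the sunset. It's so pretty."};

my @spoken = spoken_parts($paragraph);
print "$_\n" for @spoken;
```

      Typographic (curly) quotation marks would need their own character class, so check which marks your source text actually uses.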

      Now take your course material, put these two paragraphs into a text file, and write a simple loop over them.

      Once you have a loop, post it here, and we can give some ideas on how to improve the loop. Good luck!

      EDIT (addendum)

      I will be honest here: unless you find a source that is laid out like a screenplay (A says: "..." B says: "..."), it will be extremely hard to identify WHO is saying something, because the rules for that are loose. The last person who performs an action is usually the one talking, unless the "quoted text" is followed by the word "said", which then names the speaker. However, this next example falls totally outside that:

      Mammy waddled back into the hall and Scarlett heard her call softly up the stairwell to the upstairs maid.

      Guess who is the next one to talk? Right, Mammy... So in order to get it right, you will need lexical analysis (there are Perl modules for that, but I think they are too complex for beginners).
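      To show how quickly the simple rules break down, here is a naive sketch of them. It is a deliberate simplification: "the last person who performs an action" is approximated by "the last name mentioned before the quote", the name list is hypothetical, and `guess_speaker` is a made-up helper, not something from a module.

```perl
use strict;
use warnings;

# Hypothetical list of character names to look for.
my @names   = qw(Scarlett Mammy Rhett);
my $name_re = join '|', map { quotemeta($_) } @names;

# Guess who speaks a quote, given the narration before it and the text after it.
sub guess_speaker {
    my ($before, $after) = @_;

    # "said" rule: the quote is directly followed by "said NAME" or "NAME said".
    if ($after =~ /^\s*(?:said\s+($name_re)|($name_re)\s+said)\b/) {
        return defined $1 ? $1 : $2;
    }

    # Fallback: the last name mentioned in the narration before the quote.
    my @seen = $before =~ /\b($name_re)\b/g;
    return @seen ? $seen[-1] : undef;
}

my $narration = 'Mammy waddled back into the hall and Scarlett heard her '
              . 'call softly up the stairwell to the upstairs maid.';

# The fallback picks Scarlett here, although Mammy is the one who speaks next.
print guess_speaker($narration, ''), "\n";
```

      The sketch picks Scarlett for the example above, which is exactly the kind of wrong answer the simple rules produce.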

      screenplay... hint hint...

      This is not a code writing service.

      Personally, I would start now by reviewing the course material, because the instructor has most likely covered everything that is necessary for writing a program that counts the words in a text. If you are new to programming altogether, I think Learning Perl is a good book. Modern Perl is aimed at somebody who already knows how to program in another language.

        I understand, thank you for your help!