margred has asked for the wisdom of the Perl Monks concerning the following question:

I'm extremely new to Perl, so please bear with me. I want to write a program that finds the most frequent words in the novel 'Gone with the Wind'. The thing is, I specifically want to home in on the conversations between Scarlett and Rhett, but I have no idea how to isolate only their conversations. Thank you in advance!

Replies are listed 'Best First'.
Re: Most frequent words in Gone with the Wind, help!
by Corion (Patriarch) on Mar 14, 2016 at 20:05 UTC

    The question is, what format do you have the words in already?

    Personally, I would simply go through the text and manually delete all non-fitting paragraphs. Maybe I would write a small script that goes through each paragraph and removes it if it doesn't contain at least the word "rhett" or "scarlett", as conversations might be prefixed by the person speaking.
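The paragraph filter suggested above could look roughly like the following. This is only a minimal sketch, not code from the thread: the sub name `keep_named_paragraphs` is hypothetical, the sample strings are shortened stand-ins rather than the real text, and it assumes paragraphs are separated by blank lines.

```perl
use strict;
use warnings;

# Keep only the paragraphs that mention Rhett or Scarlett (case-insensitive).
# Assumption: paragraphs are separated by one or more blank lines.
sub keep_named_paragraphs {
    my ($text) = @_;
    my @paragraphs = split /\n\s*\n/, $text;
    return grep { /\b(?:rhett|scarlett)\b/i } @paragraphs;
}

# Illustrative stand-in text, not the real novel.
my $sample = join "\n\n",
    'Scarlett turned away from Mammy with studied nonchalance.',
    'Mammy waddled back into the hall.',
    'Rhett said nothing.';

print "$_\n\n" for keep_named_paragraphs($sample);
```

Note that this keeps any paragraph that merely mentions one of the two names, so it narrows the text down but does not isolate their conversations.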

    You can help us by posting two representative excerpts of the text, one you want to keep and one you want to discard, together with the word counting code you already have.

      Thank you for such a quick reply! To be honest, since I am extremely new to Perl, I have no idea how to even begin. I just started classes and I'm trying to practice. I can do the basic things but nothing else. I've been reading tutorials and watching YouTube videos all evening, but I think my brain is pretty fried at this stage. Then I came upon this forum to see if I could ask for help directly. I'm so sorry that I'm no help at all. :/
        Actually, Corion asked for a few things, like identifying the book source, for example: http://gutenberg.net.au/plusfifty-a-m.html#mitchellm

        And extracting two excerpts, for example:

        Scarlett turned away from Mammy with studied nonchalance, thankful that her face had been unnoticed in Mammy's preoccupation with the matter of the shawl.

        "No, I want to sit here and watch the sunset. It's so pretty. You run get my shawl. Please, Mammy, and I'll sit here till Pa comes home."

        As you can see, spoken text starts and ends with normal double quotes (although other quotation marks sometimes appear as well).
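Pulling out the spoken parts on that basis can be sketched with a regular expression. This is a rough illustration only: `spoken_parts` is a hypothetical name, and the pattern assumes quotes are balanced and never nested. The curly-quote alternatives only match if the text has been decoded into characters (for example by opening the file with an `:encoding(UTF-8)` layer).

```perl
use strict;
use warnings;

# Return every stretch of text enclosed in straight or curly double quotes.
# Assumption: quotes are balanced within the string and never nested.
sub spoken_parts {
    my ($text) = @_;
    my @parts;
    while ($text =~ /["\x{201C}]([^"\x{201C}\x{201D}]+)["\x{201D}]/g) {
        push @parts, $1;
    }
    return @parts;
}

# Shortened from the excerpt quoted in this thread.
my $sample = 'Scarlett turned away from Mammy with studied nonchalance. '
           . q{"No, I want to sit here and watch the sunset. It's so pretty."};

print "$_\n" for spoken_parts($sample);
```

A speech that runs over several paragraphs often has an opening quote on each paragraph but only one closing quote, which would break the balanced-quote assumption.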

        Now take your course material and write a simple loop over these two paragraphs (which you put into a text file).

        Once you have a loop, post it here, and we can give some ideas on how to improve the loop. Good luck!

        EDIT (addendum)

        I will be honest here: unless you find a source that is laid out like a screenplay (a says: "..." b says: "..."), it will be extremely hard to identify WHO is saying something, because attribution follows a few informal rules. The last person to perform an action is usually the one talking, unless the "quoted text" is followed by the word "said". However, this next example falls totally outside that:

        Mammy waddled back into the hall and Scarlett heard her call softly up the stairwell to the upstairs maid.

        Guess who is the next one to talk? Right, Mammy... So in order to get it right, you will need lexical analysis (there are Perl modules for that, but I think they are too complex for beginners?)
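To see why the simple rule breaks down, it can be written out as code. The sketch below is an illustration under assumptions, not a recommended method: `guess_speakers` is a hypothetical helper, the list of names is supplied by hand, "said" is assumed to be followed directly by a name, and the quoted speech is replaced by a `"..."` placeholder because the thread does not show it.

```perl
use strict;
use warnings;

# Naive rule: an explicit 'said NAME' right after the quote wins; otherwise
# the last character named in the narration before the quote is the guess.
sub guess_speakers {
    my ($text, @names) = @_;
    my $name_re = join '|', map { quotemeta($_) } @names;
    my @guesses;
    my $last_named;
    # Split into alternating narration and "quoted" chunks.
    my @chunks = split /("[^"]*")/, $text;
    for my $i (0 .. $#chunks) {
        my $chunk = $chunks[$i];
        if ($chunk =~ /^"/) {
            my $after = $i < $#chunks ? $chunks[ $i + 1 ] : '';
            my $speaker = $after =~ /^\s*said\s+($name_re)\b/ ? $1 : $last_named;
            push @guesses, [ $speaker // 'unknown', $chunk ];
        }
        else {
            # Remember the last name mentioned in this stretch of narration.
            $last_named = $1 while $chunk =~ /\b($name_re)\b/g;
        }
    }
    return @guesses;
}

# Narration from the thread; the speech itself is a placeholder.
my $sample = 'Mammy waddled back into the hall and Scarlett heard her call '
           . 'softly up the stairwell to the upstairs maid. "..."';

printf "%s: %s\n", @$_ for guess_speakers($sample, qw(Scarlett Mammy Rhett));
```

For this excerpt the rule guesses Scarlett, because she is the last character named before the quote, while the actual speaker is Mammy.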

        screenplay... hint hint...

        This is not a code writing service.

        Personally, I would start now with reviewing the course material, because most likely the instructor has covered all the information necessary for writing a program that counts words in a text. If you are new to programming altogether, I think Learning Perl is a good book. Modern Perl is aimed at somebody who already knows how to program in another language.

Re: Most frequent words in Gone with the Wind, help!
by Not_a_Number (Prior) on Mar 14, 2016 at 20:48 UTC

    Hmm... Are you talking about the book or the film?

    If the latter, your task would be considerably simplified by the fact that the script can be found online in this sort of format:

    RHETT: I'm going to Charleston. Back where I belong.
    SCARLETT: Please, please take me with you.
    RHETT: No. I'm through with everything here. I want peace. I want to see if somewhere there is something left in life with charm and grace. Do you know what I'm talking about?
    SCARLETT: No. I only know that I love you.
    RHETT: That's your misfortune.
    SCARLETT: Rhett! If you go, where shall I go? What shall I do?
    RHETT: Frankly my dear, I don't give a damn.

    (But then you would still have to determine whether a given line by e.g. SCARLETT is part of a conversation with RHETT, or with somebody else, or a monologue, or whatever...)
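With a script in that "SPEAKER: text" shape, splitting the words by speaker is fairly mechanical. The following is a minimal sketch under assumptions: each speech sits on its own line, speaker names are in capitals before a colon, and the sub names `parse_script` and `words_by_speaker` are made up for illustration.

```perl
use strict;
use warnings;

# Parse "SPEAKER: text" lines into [speaker, text] pairs.
# Assumption: one speech per line, speaker name in capitals before a colon.
sub parse_script {
    my ($script) = @_;
    my @speeches;
    for my $line (split /\n/, $script) {
        next unless $line =~ /^\s*([A-Z][A-Z ]*):\s*(.+)$/;
        push @speeches, [ $1, $2 ];
    }
    return @speeches;
}

# Count lower-cased words per speaker; returns a hash of hashes.
sub words_by_speaker {
    my (@speeches) = @_;
    my %count;
    for my $speech (@speeches) {
        my ($who, $said) = @$speech;
        $count{$who}{ lc $_ }++ for $said =~ /[A-Za-z']+/g;
    }
    return \%count;
}

# Three lines taken from the script excerpt quoted in this thread.
my $script = join "\n",
    "RHETT: I'm going to Charleston. Back where I belong.",
    "SCARLETT: Please, please take me with you.",
    "RHETT: Frankly my dear, I don't give a damn.";

my $count = words_by_speaker(parse_script($script));
printf "SCARLETT says 'please' %d times\n", $count->{SCARLETT}{please};
```

This does nothing about the caveat raised above: it cannot tell whether a SCARLETT line is addressed to RHETT or to somebody else.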

    And if the former, frankly, my dear, you have a big problem: parsing the text of a book for conversations between given characters is not a task for a beginner (in Perl or any other language).

Re: Most frequent words in Gone with the Wind, help!
by LanX (Saint) on Mar 14, 2016 at 21:26 UTC

      Not soooooooooooo far off. :P

      1. the (19632)
      2. and (16292)
      3. to (10213)
      4. of (8779)
      5. her (8552)
      6. she (8378)
      7. a (7864)
      8. was (6119)
      9. in (6103)
      10. you (4789)
      11. he (4709)
      12. had (4580)
      13. that (4520)
      14. it (4067)
      15. i (4023)
      16. with (3400)
      17. for (3365)
      18. his (3178)
      19. but (3139)
      20. as (3030)
      21. scarlett (2538)
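
A ranking like the one above can be produced with a hash of counts and a sort. LanX did not post code in this excerpt, so the following is only a sketch of one common approach: `top_words` is a hypothetical name, a "word" is assumed to be a run of ASCII letters and apostrophes, and the sample string is invented.

```perl
use strict;
use warnings;

# Count lower-cased words and return [word, count] pairs, most frequent first.
# Ties are broken alphabetically so the order is deterministic.
sub top_words {
    my ($text, $limit) = @_;
    my %count;
    $count{ lc $_ }++ for $text =~ /[A-Za-z']+/g;
    my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    splice @ranked, $limit if @ranked > $limit;
    return map { [ $_, $count{$_} ] } @ranked;
}

# Invented sample; for the novel, read the whole file into one string first.
my $sample = "The wind and the rain and the sun";
my $rank = 0;
printf "%d. %s (%d)\n", ++$rank, @$_ for top_words($sample, 3);
```

For the sample string this prints "1. the (3)", "2. and (2)" and "3. rain (1)". Exact counts for the novel depend on how words are split, for example whether "don't" counts as one word.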