chinamox has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks,
I am working on a pretty nifty assignment that uses the Markov chain algorithm to generate random text from an imported text file. This is done by taking a fixed number of words (in this case two) that occur before a third word. The program records the number of times $word3 occurs after $word1 and $word2. It then checks which other words occur after $word1 and $word2 and records them too.
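A minimal sketch of the counting step follows. It assumes the text is split on whitespace, so punctuation stays attached to words (`bar.`, `ham!`), and that each pair is stored as a single hash key made by joining `$word1` and `$word2` with a space. The hash of hashes `%next` (pair => { word3 => count }) is one possible layout, not the required one:

```perl
use strict;
use warnings;

# Sample text from the post; the real program would read it from a file.
my $text = 'I will not eat them in a bar. I will not eat them in a car. '
         . 'I will not eat them Sam I am. I will not eat green eggs and ham!';

# split ' ' splits on runs of whitespace; punctuation stays on the word.
my @words = split ' ', $text;

# %next maps the prefix "word1 word2" to a hash of { word3 => count }.
my %next;
for my $i (0 .. $#words - 2) {
    my ($w1, $w2, $w3) = @words[$i .. $i + 2];
    $next{"$w1 $w2"}{$w3}++;
}

# Print each prefix with its followers and how often each was seen.
for my $prefix (sort keys %next) {
    my $followers = $next{$prefix};
    print "$prefix -> ",
        join(', ', map { "$_ ($followers->{$_})" } sort keys %$followers),
        "\n";
}
```

With this layout, looking up everything that can follow a pair is a single hash access, e.g. `keys %{ $next{'not eat'} }`.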
In the second half, the program then uses these frequencies to generate the pseudo-random text.
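The generation half could look roughly like the following sketch. It rebuilds the same kind of `%next` table (same assumptions: whitespace splitting, space-joined pair keys), then repeatedly picks a follower for the current pair with probability proportional to its count. The helper `pick_weighted` and the `$max_words` cap are hypothetical additions; the cap matters because the chain can cycle (`am. I will` leads back into `I will not`).

```perl
use strict;
use warnings;

my $text = 'I will not eat them in a bar. I will not eat them in a car. '
         . 'I will not eat them Sam I am. I will not eat green eggs and ham!';
my @words = split ' ', $text;

# Build "word1 word2" => { word3 => count }.
my %next;
$next{"$words[$_] $words[$_ + 1]"}{$words[$_ + 2]}++ for 0 .. $#words - 2;

# Hypothetical helper: pick a follower at random, weighted by its count.
sub pick_weighted {
    my ($followers) = @_;            # hashref: word => count
    my $total = 0;
    $total += $_ for values %$followers;
    my $r = rand $total;             # 0 <= $r < $total
    for my $word (sort keys %$followers) {
        $r -= $followers->{$word};
        return $word if $r < 0;
    }
    return;                          # not reached when $total > 0
}

# Walk the chain from the first pair until a pair has no followers.
my ($w1, $w2) = @words[0, 1];
my @output = ($w1, $w2);
my $max_words = 50;                  # hypothetical cap against endless cycles
while (@output < $max_words) {
    my $followers = $next{"$w1 $w2"};
    last unless $followers;          # pair never seen with a follower: stop
    my $w3 = pick_weighted($followers);
    push @output, $w3;
    ($w1, $w2) = ($w2, $w3);
}
print "@output\n";
```

The loop also stops on its own when it reaches a pair with no recorded follower (here `and ham!`), which is the "lack of new words" termination described in the post.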
Dr. Seuss seems best for illustrating this:
__DATA__
I will not eat them in a bar.
I will not eat them in a car.
I will not eat them Sam I am.
I will not eat green eggs and ham!
__END__
What I need help with is selecting a data structure that will let me save the matching word pairs ('I will' is the first pair) and the third word ('not'), and record the number of times these pairings occur and the percentage of times $word_three occurs after $word1_word2.
The resulting data might then look something like this:
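| $word1 $word2 | $word3 | Count | Percentage |
|---|---|---|---|
| I will | not | 4 | 100% |
| will not | eat | 4 | 100% |
| not eat | them | 3 | 75% |
| | green | 1 | 25% |
| eat them | in | 2 | 66.7% |
| | Sam | 1 | 33.3% |
| them in | a | 2 | 100% |
| in a | bar. | 1 | 50% |
| | car. | 1 | 50% |
| a bar. | I | 1 | 100% |
| bar. I | will | 1 | 100% |
| a car. | I | 1 | 100% |
| car. I | will | 1 | 100% |
| them Sam | I | 1 | 100% |
| Sam I | am. | 1 | 100% |
| I am. | I | 1 | 100% |
| am. I | will | 1 | 100% |
| eat green | eggs | 1 | 100% |
| green eggs | and | 1 | 100% |
| eggs and | ham! | 1 | 100% |

(The read loop/subroutine terminates due to lack of new words.)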
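One way to produce a table like this, sketched under the same assumptions (whitespace splitting, space-joined pair keys): keep only the raw counts while reading, and derive the percentages afterwards, since every new occurrence of a pair changes the share of each of its followers. The `%percent` hash is a hypothetical name for illustration:

```perl
use strict;
use warnings;

my $text = 'I will not eat them in a bar. I will not eat them in a car. '
         . 'I will not eat them Sam I am. I will not eat green eggs and ham!';
my @words = split ' ', $text;

# Raw counts: "word1 word2" => { word3 => count }.
my %next;
$next{"$words[$_] $words[$_ + 1]"}{$words[$_ + 2]}++ for 0 .. $#words - 2;

# Derive percentages per pair once all the counts are known.
my %percent;
for my $prefix (keys %next) {
    my $followers = $next{$prefix};
    my $total = 0;
    $total += $_ for values %$followers;
    for my $word (keys %$followers) {
        $percent{$prefix}{$word} = 100 * $followers->{$word} / $total;
    }
}

# Print most frequent followers first, ties broken alphabetically.
for my $prefix (sort keys %next) {
    my $f = $next{$prefix};
    for my $word (sort { $f->{$b} <=> $f->{$a} || $a cmp $b } keys %$f) {
        printf "%-12s %-6s (%d, %.1f%%)\n",
            $prefix, $word, $f->{$word}, $percent{$prefix}{$word};
    }
}
```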
Since I expect to look these pairings up by their $word1-$word2 pair when generating the output, I am inclined to use a hash of arrays.
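A hash of arrays, as proposed here, can work well. In this sketch (the names `%followers` and `pick_follower` are hypothetical), every occurrence of `$word3` is pushed onto the array for its pair, duplicates included, so picking a uniformly random element already reproduces the observed frequencies. Counts and percentages can still be recovered by tallying the array:

```perl
use strict;
use warnings;

my $text = 'I will not eat them in a bar. I will not eat them in a car. '
         . 'I will not eat them Sam I am. I will not eat green eggs and ham!';
my @words = split ' ', $text;

# Hash of arrays: "word1 word2" => [ every word3 seen after that pair ].
# Repeats are kept, so 'not eat' maps to [them, them, them, green].
my %followers;
for my $i (0 .. $#words - 2) {
    push @{ $followers{"$words[$i] $words[$i + 1]"} }, $words[$i + 2];
}

# A uniform random element reproduces the observed frequencies:
# 'them' comes up about 3 times in 4, 'green' about 1 time in 4.
# (A fractional array index is truncated to an integer.)
sub pick_follower {
    my ($list) = @_;
    return $list->[ rand @$list ];
}

# Counts and percentages can still be derived from the array.
my @list = @{ $followers{'not eat'} };
my %count;
$count{$_}++ for @list;
for my $word (sort keys %count) {
    printf "not eat -> %s (%d, %.0f%%)\n",
        $word, $count{$word}, 100 * $count{$word} / @list;
}
print 'random pick after "not eat": ', pick_follower($followers{'not eat'}), "\n";
```

The trade-off: each array grows with every occurrence of its pair, while a hash of hashes stores each distinct follower once with a count. The latter is more compact for large inputs but needs a weighted (cumulative-count) pick when generating.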
Does this sound sane or am I running in the wrong direction?
This is my first attempt at using an advanced data structure in Perl, and while I am confident of my ability to pass the data into the program and parse it, I am still unsure about the best way to store the parsed data.
As always, thank you for your guidance.
-mox
Re: Trying to select the best data structure
by bobf (Monsignor) on Oct 29, 2006 at 03:37 UTC
by chinamox (Scribe) on Oct 29, 2006 at 13:53 UTC

Re: Trying to select the best data structure
by GrandFather (Saint) on Oct 29, 2006 at 05:25 UTC
by chinamox (Scribe) on Oct 29, 2006 at 15:03 UTC

Re: Trying to select the best data structure
by Khen1950fx (Canon) on Oct 29, 2006 at 03:17 UTC
by chinamox (Scribe) on Oct 29, 2006 at 13:43 UTC

Re: Trying to select the best data structure
by Not_a_Number (Prior) on Oct 29, 2006 at 08:56 UTC
by chinamox (Scribe) on Oct 29, 2006 at 13:25 UTC