Hello monks,

I am working on a pretty nifty assignment that uses the Markov chain algorithm to generate random text from an imported text file. This is done by looking at a fixed number of words (in this case two) that occur before a third word. The program records the number of times $word3 occurs after $word1 and $word2. It then checks what other words occur after $word1 and $word2, and records them too.

In the second half of the program, it will then use these frequencies to generate the pseudo-text.

Dr. Seuss seems to be best for showing this…

__DATA__
I will not eat them in a bar.
I will not eat them in a car.
I will not eat them Sam I am.
I will not eat green eggs and ham!
__END__

What I need help with is selecting a data structure that will let me save each word pair (‘I will’ is the first pair) together with the third word (‘not’), and record the number of times each pairing occurs and the percentage of times $word3 occurs after $word1 and $word2.

Thus the resulting data in the structure might look something like this:

I will        not (4, 100%)
Will not      eat (4, 100%)
Not eat       them (3, 75%)
              green (1, 25%)
Eat them      in (2, 66.6%)
              Sam (1, 33.3%)
Them in       a (2, 100%)
In a          bar (1, 50%)
              car (1, 50%)
A bar         I (1, 100%)
Bar I         will (1, 100%)
A car         I (1, 100%)
Them Sam      I (1, 100%)
Sam I         am. (1, 100%)
I am.         I (1, 100%)
Am. I         will (1, 100%)
Green eggs    and (1, 100%)
Eggs and      ham! (1, 100%)

(The read loop/subroutine terminates due to lack of new words.)
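One way a table like this could be built is with a hash of hashes keyed on the word pair. What follows is only a minimal sketch, not a finished solution: the helper name `build_table` is made up for illustration, and the code assumes words are split on whitespace with case and punctuation left untouched (so "bar." and "bar" would count as different words).

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each third word follows each word pair.
# Assumption: words are whitespace-separated; case and punctuation are kept.
sub build_table {
    my ($text) = @_;
    my @words = split ' ', $text;
    my %table;    # "$word1 $word2" => { $word3 => count }
    for my $i (0 .. $#words - 2) {
        my $pair = "$words[$i] $words[$i + 1]";
        $table{$pair}{ $words[ $i + 2 ] }++;
    }
    return \%table;
}

my $text = join ' ',
    'I will not eat them in a bar.',
    'I will not eat them in a car.',
    'I will not eat them Sam I am.',
    'I will not eat green eggs and ham!';

my $table = build_table($text);

# Print each pair, each word seen after it, the count, and the percentage.
for my $pair (sort keys %$table) {
    my $followers = $table->{$pair};
    my $total     = 0;
    $total += $_ for values %$followers;
    for my $word (sort keys %$followers) {
        printf "%-12s %-6s (%d, %.1f%%)\n",
            $pair, $word, $followers->{$word},
            100 * $followers->{$word} / $total;
    }
}
```

The percentages are derived from the counts at print time, so only the counts need to be stored.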

Since I expect to look up these entries by the $word1-$word2 pair when generating the output, I am inclined to use a hash of arrays.
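The hash-of-arrays idea might look roughly like the sketch below. It rests on the same assumptions as before: whitespace splitting, case and punctuation kept, and helper names (`build_chain`, `generate`) that are purely illustrative. Each pair maps to a list of every word seen after it, repeats included, so picking a random element of the list is already weighted by frequency without storing counts or percentages.

```perl
use strict;
use warnings;

# Hypothetical helper: map each word pair to the list of words seen after it.
sub build_chain {
    my (@words) = @_;
    my %chain;    # "$word1 $word2" => [ $word3, $word3, ... ]
    for my $i (0 .. $#words - 2) {
        push @{ $chain{"$words[$i] $words[$i + 1]"} }, $words[ $i + 2 ];
    }
    return \%chain;
}

# Hypothetical helper: walk the chain from a starting pair, up to $max_words.
sub generate {
    my ($chain, $w1, $w2, $max_words) = @_;
    my @out = ($w1, $w2);
    while (@out < $max_words) {
        # Stop when the current pair was never followed by anything.
        my $followers = $chain->{"$w1 $w2"} or last;
        # A random element; repeated words are proportionally more likely.
        my $next = $followers->[ rand @$followers ];
        push @out, $next;
        ($w1, $w2) = ($w2, $next);
    }
    return join ' ', @out;
}

my @words = split ' ', join ' ',
    'I will not eat them in a bar.',
    'I will not eat them in a car.',
    'I will not eat them Sam I am.',
    'I will not eat green eggs and ham!';

my $chain = build_chain(@words);
print generate($chain, 'I', 'will', 20), "\n";
```

The trade-off is memory: a hash of arrays stores one element per occurrence, while a hash of hashes with counts stores one entry per distinct following word.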

Does this sound sane or am I running in the wrong direction?

This is my first attempt at using an advanced data structure in Perl, and while I am confident of my ability to pass the data into the program and parse it, I am still unsure about the best way to store the parsed data.

As always, thank you for your guidance.

-mox

In reply to Trying to select the best data structure by chinamox
