I am trying to write a program that implements the Markov algorithm ... (two words) must appear in the sentence at some random fixed point.

Depending on what you mean by "random fixed point", you may have moved out-of-bounds with respect to the Markov algorithm.

For the benefit of those following along, the traditional Markov random-text generator samples text to (conceptually) build a big sparse matrix. Each cell in the matrix represents the probability of transitioning from one state to the next, where "state" is a tuple of the N adjacent tokens. For example, by sampling

the quick red fox jumped over the lazy perl hacker
with N=1, the matrix will show that the probability of transitioning from (the) to (quick) is 50%. By sampling some larger text with N=2, the matrix might show that the probability of transitioning from (Perl is) to (is neat) to be 90%, and to (is hard) to be 10%.

The transition matrix is easier to implement using a tree, but conceptually it's just a sparse matrix.

To generate random text, we just roll the dice and choose a legal transition.

Your problem complicates this considerably. By saying that you want two words A and B to appear in the sentence at random points, you require an algorithm that finds a legal set of transitions that first gets you to A (or B), and then eventually gets you to B (or A), and then eventually terminates with the end of a sentence. You could do this with a tree search, but there's no guarantee that it will terminate unless a valid sequence of paths existed in your source text. And depending on how you record sentence start and sentence end, there might not be.

You might try this. Start with A, run the algorithm backwards until you find a valid sentence start. To do this (assuming you're using a tree), you'll need to build a second tree to do backwards transitions. Then run the algorithm forward building a tree breadth-first (so that you can terminate at some arbitrary depth) looking for B. If you find it, run the algorithm forward until you find the end of then sentence (again using a breadth-first search).

And I recommend that you don't try to hold your breath while the algorithm runs. :)


In reply to Re: help with a new type of Markov by dws
in thread help with a new type of Markov by thealienz1

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.