Sorry I did not mention it before, but the file with my rules (patterns) should be associated with another file containing one or more possible answers. Thus, when a Formula-Rule matches (is true), I have to retrieve the sentences (questions) and the answer(s) associated with that rule.

Now, only the question is described by the rule, right? Are the question and answer to be stored in the same file or in different ones? For the moment I'll assume a total of two files, one for rules and one for question/answer pairs.

Okay, let's break the problem into two parts: data storage and data manipulation.

For storage your options are rather open, with one restriction: you need a reliable way to map each rule in one file to the sentences that rule describes in the other. Thus, for any data set such as <s>/SYM Who/WP is/VBZ the/DT author/NN of/IN the/DT book/NN... ?/. </s>/SYM, you'll have two files, each containing a different subset of the data, namely tags and sentences. This will work fine as long as your files don't get tampered with, because you'll be depending on the order in which the data appears to know which question belongs to which rule. If you're concerned about this, you could supply an index for each entry so that rule 0 corresponds to question/answer pair 0 in your other file. This is still far from unbreakable, but it's a little better. (As an aside, consider looking at something like DB_File if your data collection is going to get large.)
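To make the index idea concrete, here is a minimal sketch. It assumes a tab-separated layout of my own invention ("index, TAB, data"), uses in-memory strings in place of your two real files, and treats a rule as a bare sequence of tags; the sub name read_indexed and all the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical layout: one record per line, "index<TAB>field[<TAB>field...]".
# These strings stand in for the contents of the two files.
my $rules_text = join '',
    "0\tWP VBZ DT NN IN DT NN\n",
    "1\tWRB VBZ DT NN\n";
my $qa_text = join '',
    "0\tWho is the author of the book?\tanswer A\n",
    "0\tWho is the director of the film?\tanswer B\n",
    "1\tWhere is the library?\tanswer C\n";

# Read indexed records into a hash: index => list of field lists.
sub read_indexed {
    my ($text) = @_;
    my %by_index;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($index, @fields) = split /\t/, $line;
        push @{ $by_index{$index} }, \@fields;
    }
    close $fh;
    return \%by_index;
}

my $rules = read_indexed($rules_text);
my $pairs = read_indexed($qa_text);

# Report any rule whose index has no question/answer entry.
my @orphans = grep { !exists $pairs->{$_} } sort { $a <=> $b } keys %$rules;
warn "No question/answer pair for rule index $_\n" for @orphans;

# Walk the rules and print every pair that shares the rule's index.
for my $index (sort { $a <=> $b } keys %$rules) {
    my $rule = $rules->{$index}[0][0];
    for my $pair (@{ $pairs->{$index} || [] }) {
        my ($question, $answer) = @$pair;
        print "$index: $rule => $question / $answer\n";
    }
}
```

Because the index travels with each record, reordering lines in either file no longer breaks the pairing, and a missing index shows up as a warning rather than a silent mismatch.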

Now as to the data structure for doing your actual lookups: yes, I still think a hash of arrays is a good place to start. You'll need the arrays to handle cases of multiple questions/answers per rule, since a hash can hold only one value per key. Of course, in your text files you can have as many duplicate entries as you want because they're just text files! Probably you'll end up slurping both files into arrays and then combining them into a hash using some code along the lines I provided in my first post. Then, as you run through a list of rules for which you wish to find question/answer pairs, you just have to do the hash lookup.
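As a rough illustration of that hash of arrays (a fresh sketch, not the code from the earlier post), suppose the two files have already been slurped into parallel arrays where position N in one matches position N in the other. The sub name build_lookup, the tab separator between question and answer, and the sample data are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical data, as if slurped line by line from the two files.
my @rules = (
    'WP VBZ DT NN IN DT NN',
    'WRB VBZ DT NN',
    'WP VBZ DT NN IN DT NN',    # same rule again, different question
);
my @pairs = (
    "Who is the author of the book?\tanswer A",
    "Where is the library?\tanswer C",
    "Who is the director of the film?\tanswer B",
);

# Combine the parallel arrays into rule => [ { question, answer }, ... ].
sub build_lookup {
    my ($rules, $pairs) = @_;
    die "Rule and pair counts differ\n" unless @$rules == @$pairs;
    my %qa_for;
    for my $i (0 .. $#{$rules}) {
        my ($question, $answer) = split /\t/, $pairs->[$i], 2;
        # push onto the array so duplicate rules accumulate entries
        push @{ $qa_for{ $rules->[$i] } },
            { question => $question, answer => $answer };
    }
    return \%qa_for;
}

my $qa_for = build_lookup(\@rules, \@pairs);

# Look up each rule of interest; a missing key means no stored pairs.
for my $rule ('WP VBZ DT NN IN DT NN', 'NN VBZ') {
    my $matches = $qa_for->{$rule};
    unless ($matches) {
        print "No entry for rule '$rule'\n";
        next;
    }
    print "$rule => $_->{question} / $_->{answer}\n" for @$matches;
}
```

The push is what keeps duplicates: assigning directly to the hash key would overwrite the earlier question/answer pair for a repeated rule.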

Good luck!


In reply to Re: Re: Re: How will I retrieve values from a POS-tagged question by djantzen
in thread How will I retrieve values from a POS-tagged question by Anonymous Monk
