in reply to Hash entries starting with a given string.

Thinking about the problem in the light of day (rather than stupid-o'clock in the morning), I'm guessing I may need to work on a very different implementation.

The actual purpose of the script is of linguistic interest. One of the ways of generating text that seems like English, without being it, is to generate strings of characters where the next character is selected based on the probability of it appearing in text given the previous n characters. The code I'm using to store probabilities effectively reads through the text and increments the hash entry for each substring of n characters.

So, looking at it in the harsh light of morning, and thinking about what I am actually saying (2^40, which is indeed referring to a 5 character string, is a very large number), I'm probably going to have to find a new way to deal with this...

  • Comment on Re: Hash entries starting with a given string.

Replies are listed 'Best First'.
Re^2: Hash entries starting with a given string.
by locked_user sundialsvc4 (Abbot) on Jul 28, 2008 at 16:33 UTC

    If, as it would seem, the definition of this problem looks-at the string as a concatenation of interesting-values ... such as prefixes and suffixes of an English word ... then perhaps what you really need here is “a hash of hashes.”

    In other words, if your problem looks at the string “prefoobar” as “pre+foobar,” then the first-level hash (containing “pre”) refers-to a subsequent hash that contains “foobar.”)

    If, on the other hand, it looks at the string as a string of n characters where the-important-thing is “what is the probability of character x occurring right-here,” perhaps what you need is a nested set of hashes per-character.

    At each position in the tree, you would store (for each character):

    1. The probability of “this character”
    2. A reference to a hash of possible “next characters” each one similarly (recursively...) described.

    In an application like this, your choice of “appropriate data-structure” will be paramount. Make very sure that your chosen data-structure works hard for you. Your data structure should not merely be “a passive-something that other parts of the program can conveniently (and repetitively...) search.” It must be a tool that the program can leverage to efficiently produce the solution.

    “Strange” or “awkward” as such-a-structure may feel as you are writing it ... it's the runtime-performance that really matters here. Given the right data-representation (and Perl Is No Slouch!), performance should be “impressive.” (If it isn't, look harder.) Perl is an industrial-strength power tool that eats requirements like this one for lunch...