in reply to creating crystal ball
Sorry ... I couldn't wait for you to post the next chunk. It's an interesting question, and while reading it, I couldn't help thinking about LZW compression. I'm not sure if you're familiar with it or not, but I think you could adapt it to your problem, especially considering:
What is the goal? Having text of n characters I would like to type only k letters, where k aims to 0. Without any dictionary at the beginning....snip...
1. Word autocompletion. Application stores every typed word in some struct and lists matching words while I start typing another one....snip...
2. Context search. There are regexps with negative, positive assertions. Hey, why don't do contextual search and try to guess next word without typing any char?
Using an LZW-like table would allow you to construct your dictionary as you use it. So as you work on a project, it would continue to improve its autocompletion guesses. Since you're not worrying about compression/decompression, you can easily edit the table. If you add a timestamp to the nodes, you could even prefer the most-recently-used words in preference to the most common. (Thus, the variable '$food' in your current project could be preferred to '$foobar' that you used in your example programs yesterday.)
As far as regexps are concerned, if you use the LZW-table idea, you wouldn't use regexes at all. You'd instead build a state machine where the last "n" keystrokes would be encoded into the current state, and you could examine the edge transitions from the current state and propose words/phrases based on usage.
As far as how much to autocomplete: I'm thinking that with the LZW table, you'd default to autocomplete all the "common" edges until you hit a steep drop in probability. Then you could use gestures to alter the amount to autocomplete. Something like: space would autocomplete the current word, tab would autocomplete the current "best guess". (If you display more than one at a time, you could use the down arrow to discard the current best guess...) An shifted character could accept up to and including that character in the current best guess, and a normal unshifted character could be entered normally (causing your table to update probabilities, etc.).
Regarding resource usage: Memory is cheap, so you might choose a (reasonably large) number of edges as a target, and if your table reaches that size, you might cull 5-10% of the edges--those with the lowest probability.
Other ideas that come to mind:
Hmmm.... now I've got it percolating in the back of my head. If you decide to start implementing it with this table-based idea, let me know, and I'll jot down my ideas for the data structure and such. Be sure to keep us apprised of this project, it sounds fun!
Can I put just all possible variations ('it', 'it is', 'it is the', 'it is the string') to the cleverly constructed hash/array? Let's hope it won't slow down the thing awfully and won't eat all my RAM... But will it be really useful? Normally in the texts the same parts of text are not so long - few words at the most. Is there really a reason to look for 'let me finish this task for you and in this time you can do us both a tee'? I don't think so.
One thing I like about the node/edge table-driven approach is that since you're culling low-probability edges, high probability strings can build to be arbitrarily long. So if you were tasked with typing your example sentence a hundred times as punishment, the sentence would quickly become the next predicted text.
And of course is it acceptable if the application takes 2 minutes to process all the text only to show 20 letters long super clever autocompletion? Definitely not, in 2 minutes I can write many letters, a lot more than 20. This must work fast to be helpful.
Yep ... that's the sticking point. I'm a touch-typist, and long delays would irritate me to death. But with computers getting faster and faster, you can throw more power at it each year. It's an interesting problem ... I'm looking forward to seeing what you come up with.
...roboticus
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: creating crystal ball
by salva (Canon) on Apr 21, 2008 at 13:59 UTC | |
|
Re^2: creating crystal ball
by grizzley (Chaplain) on Apr 22, 2008 at 11:25 UTC | |
by roboticus (Chancellor) on Apr 22, 2008 at 18:12 UTC | |
by grizzley (Chaplain) on Apr 23, 2008 at 07:08 UTC | |
|
Re^2: creating crystal ball
by ambrus (Abbot) on Apr 23, 2008 at 07:57 UTC | |
by grizzley (Chaplain) on Apr 28, 2008 at 10:57 UTC |