First part

Every time I have a while and go back to the problem, I am not able to think very much forward. Maybe there is nothing more to think about? So let's add few words and finish the topic.

In the first part I wrote "All letters 's' at the beginning of word are marked bold. Two matches. Ok, we take first one always.". It definitely has to be changed. What options do we have here?

  1. Penalty for not being used / bonus for being used - mostly used ones in first place
    Pointed already by roboticus. Obvious. No question about it. If something has been used already - it is good and will be suggested to be reused. Don't think about pieces of toilet paper here...
  2. longest look-behind on the first place
    Obvious, too. If there is look-behind match 9-chars-long, the place is definitely better candidate than the 1-char-long one.
  3. common part from all proposed strings
    Finding common part even of few strings can be very difficult and I do not expect this to return any useful result.
  4. common part from some of proposed strings
    however, is promising one. How to decide which subset should be chosen? Best 10 strings chosen with help of 1st or 2nd method? Maybe, like in statistic research, take all proposals and remove e.g. 30% edge cases? What are edge cases? (String::Approx and Text::Levenshtein comes to mind, but as the thing is to predict exact string, I don't know if it will be of any use) Or maybe sort alphabetically all strings and return the biggest group starting with the same letter?
  5. closest to the current position
    This idea assumes that the paragraph/chapter is about something (e.g. 'unemployment') and similar words are near. See how definition of unemployment looks in wikipedia:
    Unemployment is the state in which a person is without work, available to work, and is currently seeking work. The unemployment rate is used in economic studies and economic indexes such as the United States' Conference Board's Index of Leading Indicators.
    Check the words: 'Unemployment', 'work', 'economic'.

And Yet Another Idea to consider during implementation: mark with digits 10 best candidates in text with possibility to use via Ctrl+<digit>, Ctrl+<left/right arrow> adds/removes parts of text (words), Enter accepts.

I was reading about LZW, Huffmann, Markov Chains, PPM, PAQ and other compression techniques suggested by you, guys. I don't exactly know how to use it to complement text. These algorithms operate on chars instead of words and this has made me think about following: suggesting one char is of no use. But what about suggesting 2 more chars every time user presses predicted char? Normal text at the beginning is what's already typed, bold is suggested(selection), and the rest light grayed to show what computer "has in mind".

unemployment
unemployment
unemployment
unemployment
unemployment
unemployment

Almost 50% of typing saved... Or better, 2 chars at the beginning, 3 in the next step, etc.

unemployment
unemployment
unemployment
unemployment
unemployment

Well, so much for theory. I plan to start implementing it in a 2-3 weeks as I have now other important things to do, so please be patient :)


In reply to creating crystal ball 2 by grizzley

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.