Quite often I have to type translations on a specific topic with a specific vocabulary. As a professional programmer, I am very lazy and would like the computer to type them for me :) I started thinking about a textbox with an autocompletion option that learns from what I am typing, finds matching parts in the already typed text and offers hints. My native language is Polish, where declension and conjugation complicate the problem a lot, but I wanted to start in English just to see how much work even the simplest solution would take.

What is the goal? Given a text of n characters, I would like to type only k of them, with k tending to 0. Without any dictionary at the beginning. OK, there can be one: I open a file with another translation and on that basis the application initializes its dictionary/structures/autocompletion engine.

Well, what workers could I borrow from Perl and employ?

1. Word autocompletion. The application stores every typed word in some structure and lists matching words as I start typing the next one. It's fast and simple to implement. And it saves me a lot of work, but only for long words.

2. Context search. There are regexps with negative and positive assertions. Hey, why not do a contextual search and try to guess the next word without typing any character? It would also be really helpful for phrases like "on the top of" or "why don't you go to the". But this topic opens up almost infinite possibilities. Actually I don't really need a 3rd option...
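A minimal sketch of the first worker, word autocompletion, written in Python only for illustration (a Perl hash would do the same job). The class name `WordCompleter`, its methods and the "most frequent first" ordering are my assumptions, not a settled design, and the word regexp covers only ASCII letters, so Polish would need more.

```python
import re
from collections import Counter

class WordCompleter:
    """Learns words as they are typed and lists those matching a prefix."""

    def __init__(self):
        self.counts = Counter()  # word -> how often it has been seen

    def learn(self, text):
        # Runs of ASCII letters and apostrophes count as words (a simplification).
        self.counts.update(w.lower() for w in re.findall(r"[A-Za-z']+", text))

    def complete(self, prefix, limit=5):
        prefix = prefix.lower()
        hits = [w for w in self.counts if w.startswith(prefix) and w != prefix]
        # Most frequent first; ties are broken alphabetically for stable output.
        hits.sort(key=lambda w: (-self.counts[w], w))
        return hits[:limit]

completer = WordCompleter()
completer.learn("An entry is a widget that displays a one-line text string "
                "and allows that string to be edited")
print(completer.complete("st"))  # ['string']
print(completer.complete("t"))   # ['that', 'text', 'to']
```

Seeding the engine from a file with another translation would just be one more call to `learn` with that file's contents.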

Maybe some example text:

An entry is a widget that displays a one-line text string and allows
that string to be edited using methods described below, which are
typically bound to keystrokes and mouse actions. When first created,
an entry's s

The 's' at the end is the beginning of the word 'string'. All letters 's' at the beginning of a word are marked bold. Two matches. OK, we always take the first one. And now, time for the first really important question:
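The two matches can be checked with a small sketch (Python for illustration; the variable names are mine). It assumes a candidate must be longer than the typed prefix, which is what keeps the lone 's' of "entry's" and the unfinished word itself out of the list.

```python
import re

typed = ("An entry is a widget that displays a one-line text string and allows\n"
         "that string to be edited using methods described below, which are\n"
         "typically bound to keystrokes and mouse actions. When first created,\n"
         "an entry's s")

# The unfinished word is whatever word characters sit right before the cursor.
prefix = re.search(r"\w*$", typed).group()

# \b anchors at the start of a word; \w+ demands at least one more character
# after the prefix, so the prefix alone never counts as a match.
candidates = re.findall(r"\b" + re.escape(prefix) + r"\w+", typed)

# "We always take the first one."
suggestion = candidates[0] if candidates else None
print(prefix, candidates, suggestion)
```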

How much to autocomplete?

The problem is that autocompletion must not offer unneeded characters at the end, because the work saved on typing will be spent on deleting those extra characters/words. Maybe just 2 modes of autocompletion? That is: it shows e.g. a 5-word phrase, and with 'enter' I accept it all, and with 'tab' only the first word? Let's save this idea for later, it can be useful :)
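The two modes could look roughly like this (a Python sketch; the function `accept` and the key names "enter" and "tab" as plain strings are hypothetical, a real textbox would deliver key events in its own way). It returns both the text to insert and the part of the hint that is still on offer.

```python
def accept(suggestion, key):
    """Return (text_to_insert, remaining_hint) for the pressed key."""
    if key == "enter":                      # accept the whole phrase
        return suggestion, ""
    if key == "tab":                        # accept only the first word
        first, _, rest = suggestion.partition(" ")
        return first, rest
    return "", suggestion                   # any other key inserts nothing

print(accept("on the top of the", "tab"))    # ('on', 'the top of the')
print(accept("on the top of the", "enter"))  # ('on the top of the', '')
```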

What then? After accepting only part of the autocompletion, what now? Should the autocompletion create a new list of possible hints, or do a

contextual search? How far?

Yes, positive look-behind assertion comes to mind, doesn't it? :) I push it aside as I want to think more abstractly right now, and after a few seconds it shows up again and again :) Like a mosquito :) It is definitely a good place for it. Go away, you little disturbing bastard, there will be time for you!

I must think once more: how far back should it actually look to do the job right? Should it look one word back? Two words? To the comma/period? Or maybe all those possibilities together? Where is the furthest limit? The beginning of the document... Well,

how many resources should I use?

Can I just put all possible variations ('it', 'it is', 'it is the', 'it is the string') into a cleverly constructed hash/array? Let's hope it won't slow the thing down awfully and won't eat all my RAM... But will it be really useful? In normal texts the repeated parts are not that long, a few words at most. Is there really a reason to look for 'let me finish this task for you and in this time you can make us both a tea'? I don't think so. And of course, is it acceptable if the application takes 2 minutes to process all the text only to show a super clever 20-letter autocompletion? Definitely not, in 2 minutes I can write many letters, a lot more than 20. This must work fast to be helpful.
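One way the "cleverly constructed hash" might look, sketched in Python for illustration: every context of up to a few words maps to a count of the words that followed it. The cap `MAX_CONTEXT = 3` and the names `build_table` and `guess_next` are my assumptions. With the cap in place the table holds at most 3 entries per word of text, which should keep RAM under control.

```python
from collections import Counter, defaultdict

MAX_CONTEXT = 3  # assumed cap: repeated runs longer than a few words are rare

def build_table(words, max_context=MAX_CONTEXT):
    """Map each context of 1..max_context words to counts of the next word."""
    table = defaultdict(Counter)
    for i in range(1, len(words)):
        for n in range(1, max_context + 1):
            if i - n < 0:
                break
            table[tuple(words[i - n:i])][words[i]] += 1
    return table

def guess_next(table, words, max_context=MAX_CONTEXT):
    """Try the longest context first, then back off to shorter ones."""
    for n in range(min(max_context, len(words)), 0, -1):
        followers = table.get(tuple(words[-n:]))
        if followers:
            return followers.most_common(1)[0][0]
    return None

words = "it is the string and it is the widget and it is the string".split()
table = build_table(words)
print(guess_next(table, "so it is the".split()))  # string
```

Each lookup is a handful of hash accesses, so the answer comes back long before I could type the word myself; the cost is paid once while building the table.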

Emerging picture of the situation.

1000 words written, cursor at the end waiting for the 1001st word. Let's operate on the whole text. Now the simplest regexp in the world looks for all word boundaries: /\b/. Ca. 2000 matches. The application takes every matched position, looks behind (how far? 6 words?), compares with the text just before the cursor (how far? at least one word?), looks forward (how far? 15 letters?) and suggests something (how far? 15 letters too?). How many possibilities will there be?
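A rough sketch of that scan, in Python for illustration. The function `scan`, its parameter names and the defaults (2 words behind, 15 characters ahead) are assumptions picked from the questions above, not answers to them. Instead of visiting every boundary and comparing by hand, it lets the regexp engine find the earlier places where the same words occur.

```python
import re

def scan(text, look_behind_words=2, look_ahead_chars=15):
    """Collect what followed earlier occurrences of the words before the cursor."""
    # The cursor sits at the end of text; the context is the last few words.
    context = re.findall(r"[\w']+", text)[-look_behind_words:]
    if not context:
        return []
    # The same words in the same order, with any non-word characters between.
    pattern = r"\b" + r"\W+".join(re.escape(w) for w in context) + r"\b"
    hints = []
    for m in re.finditer(pattern, text):
        following = text[m.end():m.end() + look_ahead_chars].strip()
        if following:  # the occurrence right at the cursor has nothing after it
            hints.append(following)
    return hints

# Two boundaries per plain word, hence ca. 2000 matches for 1000 words.
print(len(re.findall(r"\b", "one two three")))  # 6

text = "why don't you go to the shop, and why don't you go to the"
print(scan(text))  # ['shop, and why']
```

A fixed number of characters ahead can cut a hint in the middle of a word; trimming the hint back to the last complete word would be a natural next step.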



In reply to creating crystal ball by grizzley
