Though no perfect solution exists, a workable solution is better than no solution. With some modification the process offered by jjdraco can be made more reliable.


1. Use the other languages' dictionaries to strip every word that also appears in another language from the German dictionary, and place those words in a secondary German dictionary. This leaves the primary dictionary with only uniquely German words, so every occurrence of a primary-dictionary word in the document can be safely ignored.


2. Your processor should have two modes: a reporting mode and an inspection/correction mode. In reporting mode your processor simply runs over the document and gathers information about words that are not in the primary dictionary. Have it report the running time, how many matches were made, and the most common matches. With this you can make sure your checker runs in a reasonable amount of time and doesn't produce a prohibitively large number of words for inspection. You can also look at the most common matches to see whether they can be reliably handled automatically with some additional scripts. If so, you can probably eliminate a large percentage of the words that would otherwise have to be inspected. Repeat this step until you have taken out everything you can automatically.


3. Run the processor in inspection mode so that each non-match can be found and edited. Have the processor use the secondary dictionaries to offer the inspector a choice between automatic entries and editing the word manually.
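
The three steps above could be sketched roughly as follows. This is only an illustration in Python, not a finished checker: the function names (`split_dictionary`, `report`, `candidates`), the language codes, and the choice to lowercase every word before lookup are all assumptions of mine, and the dictionaries are assumed to be plain sets of lowercase words already loaded into memory.

```python
from collections import Counter
import re
import time


def split_dictionary(german, others):
    """Step 1: split the German word set into a primary set (words found in
    no other language) and a secondary set (words shared with another one).

    `others` maps a language code to that language's word set (hypothetical
    layout, chosen for this sketch).
    """
    foreign = set().union(*others.values())
    primary = german - foreign
    secondary = german & foreign
    return primary, secondary


def report(text, primary, top_n=10):
    """Step 2: reporting mode. Collect statistics on words that are not in
    the primary dictionary, without changing anything."""
    start = time.perf_counter()
    # Runs of letters only; lowercasing is an assumption of this sketch.
    words = re.findall(r"[^\W\d_]+", text.lower())
    misses = Counter(w for w in words if w not in primary)
    return {
        "seconds": time.perf_counter() - start,
        "total_words": len(words),
        "flagged": sum(misses.values()),
        "most_common": misses.most_common(top_n),
    }


def candidates(word, secondary, others):
    """Step 3 helper: list the languages a flagged word could belong to,
    so the inspector can pick one or edit the word by hand."""
    langs = [lang for lang, vocab in sorted(others.items()) if word in vocab]
    if word in secondary:
        langs.insert(0, "de")
    return langs
```

In use, you would call `report` repeatedly, adding automatic handling for the most common flagged words each time, and only then walk the remaining words with something like `candidates` to populate the choices shown to the inspector.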



The early worm gets the bird.


In reply to Re: Re: Re: detecting the language of a word? by Felonious
in thread detecting the language of a word? by domm
