Apparently all your transformations either remove insignificant characters (hyphens and spaces) or treat some classes of characters as equivalent (like "i" and "j"). In that case I think you don't even need regexen. Just choose a representative character from each class, then preprocess your list of names by removing the insignificant characters and replacing each easy-to-confuse character with its representative. For example, you'd add a column to your database to hold the normalized name, fill it with the names with spaces removed, all "j" replaced with "i", etc., and index that column. Then, when you OCR a name, you normalize it the same way and search for the normalized string in that column.
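
Here's a rough sketch of that idea, in Python with an in-memory SQLite database just so it is self-contained. Everything named here is an assumption for illustration: the `people` table, the `name_norm` column, the `normalize`, `build_db` and `find` helpers, the lowercasing step, and the "0" to "o" mapping. Only the hyphen/space removal and the "j" to "i" replacement come from the description above, and a real equivalence table would depend on the mistakes your OCR engine actually makes.

```python
import re
import sqlite3

# Map each easily confused character to one representative per class.
# "j" -> "i" is from the post; "0" -> "o" is a made-up extra example.
CONFUSABLE = str.maketrans({"j": "i", "0": "o"})

# Characters treated as insignificant: hyphens and whitespace.
INSIGNIFICANT = re.compile(r"[-\s]+")


def normalize(name):
    """Lowercase (an assumption), drop insignificant characters,
    then collapse each confusable character to its representative."""
    stripped = INSIGNIFICANT.sub("", name.lower())
    return stripped.translate(CONFUSABLE)


def build_db(names):
    """Store each name next to its normalized form and index that column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, name_norm TEXT)")
    conn.executemany(
        "INSERT INTO people (name, name_norm) VALUES (?, ?)",
        [(n, normalize(n)) for n in names],
    )
    conn.execute("CREATE INDEX idx_people_norm ON people (name_norm)")
    conn.commit()
    return conn


def find(conn, ocr_text):
    """Normalize the OCR result the same way and do an exact lookup."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name_norm = ? ORDER BY name",
        (normalize(ocr_text),),
    )
    return [row[0] for row in rows]
```

With this, `find(build_db(["Mary-Jane Smith"]), "maryiane smith")` does a plain equality match on the indexed column, so no regex is evaluated at lookup time. Two different names that normalize to the same string will both come back, so the caller still has to decide between them.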


In reply to Re: Regexp and OCR by ambrus
in thread Regexp and OCR by sflitman
