Re: Regexp and OCR

It looks like all of your transformations either remove insignificant characters (hyphens and spaces) or treat certain classes of characters as equivalent (like "i" and "j"). In that case I think you don't even need regexen. Just pick a representative character for each class, then preprocess your list of names by removing the insignificant characters and replacing each easily confused character with its class representative. For example, you'd add a column to your database to hold the normalized name, fill it with the names with spaces and hyphens removed and every "j" replaced with "i", etc., and index that column. Then, when you OCR a name, you normalize it the same way and search for the normalized string in that column.
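Here is a rough sketch of the idea in Python, in case it helps to see it spelled out. Everything named here is made up for illustration: the `names` table, the `norm` column, and the helpers `normalize`, `build_index` and `lookup`. The only equivalence class it knows about is "j" to "i", and the only characters it drops are hyphens and spaces; you would extend both to match whatever your OCR actually confuses. SQLite is used just because it needs no setup, not because your database has to be SQLite.

```python
import sqlite3

# Assumption: "i" is the representative of the {i, j} class, and
# hyphens and spaces are the only insignificant characters.
# Extend the first two arguments (same length) for more classes.
_TABLE = str.maketrans("j", "i", "- ")


def normalize(name):
    """Drop insignificant characters and map confusable ones to their representative."""
    return name.translate(_TABLE)


def build_index(names):
    """Store each name next to its normalized form and index the normalized column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (name TEXT, norm TEXT)")
    conn.executemany(
        "INSERT INTO names (name, norm) VALUES (?, ?)",
        [(n, normalize(n)) for n in names],
    )
    conn.execute("CREATE INDEX idx_names_norm ON names (norm)")
    return conn


def lookup(conn, ocr_text):
    """Normalize the OCR output the same way and do a plain equality search."""
    rows = conn.execute(
        "SELECT name FROM names WHERE norm = ?", (normalize(ocr_text),)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_index(["van Dijk", "Smith-Jones"])
    print(lookup(conn, "vanDiik"))
    print(lookup(conn, "Smith Jones"))
```

The lookup is an ordinary indexed equality match, so no regex is evaluated per row. One thing to watch: two different names can normalize to the same string, which is why `lookup` returns a list rather than a single name.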
