in reply to Regexp and OCR
Possibly that helps with the speed, and if it does, things may become interesting. First, the regex matching the names can then be a simple concatenation of lexemes like the above; second, a second regex run is not necessarily needed at all: a trivial hash replacement would be enough, something along these lines:
(I know it is naive: I've seen that you match sentences, not individual words, but still.)

my %replace = ( "<CL>r<ij><it><ij>a<hn>" => "Christian", ... );
# [\w<>]+ instead of \b(\w+)\b, because \w does not match the angle brackets in the keys
$text =~ s/([\w<>]+)/exists($replace{$1}) ? $replace{$1} : $1/ge;
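To make the idea concrete, here is a minimal runnable sketch of the hash lookup. The sample sentence and the single table entry are made up for illustration, and the token pattern `[\w<>]+` is my assumption about what one OCR "word" looks like, not necessarily how your data is tokenized.

```perl
use strict;
use warnings;

# Hypothetical lookup table: OCR token string => intended word.
my %replace = (
    '<CL>r<ij><it><ij>a<hn>' => 'Christian',
);

# Hypothetical sample input.
my $text = 'Dear <CL>r<ij><it><ij>a<hn>, welcome back';

# A token is assumed to be a run of word characters and angle brackets.
# Known tokens are swapped for their hash value, unknown ones are kept.
(my $fixed = $text) =~ s/([\w<>]+)/exists $replace{$1} ? $replace{$1} : $1/ge;

print "$fixed\n";   # Dear Christian, welcome back
```

Each token costs one hash lookup, so the running time should not depend much on how many names are in the table.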
Again, if the alterations consist of at most 4 characters, I'm thinking that instead of composing them into a "<ab>" structure, one could make them into a single Unicode character, e.g. pack("U", (ord("a") << 8) + ord("b")), and thus possibly gain some extra milliseconds.
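A small sketch of that packing, for the two-letter case only. The helper names pack_pair and unpack_pair are hypothetical, and whether this actually saves time is untested.

```perl
use strict;
use warnings;

# Fold a two-letter group such as <ab> into one code point:
# first letter in the high byte, second letter in the low byte.
sub pack_pair {
    my ($first, $second) = @_;
    return pack('U', (ord($first) << 8) + ord($second));
}

# Reverse the folding, to show that no information is lost.
sub unpack_pair {
    my ($char) = @_;
    my $code = ord($char);
    return (chr($code >> 8), chr($code & 0xFF));
}

my $packed = pack_pair('a', 'b');
printf "U+%04X, length %d\n", ord($packed), length($packed);   # U+6162, length 1

my ($first, $second) = unpack_pair($packed);
print "$first$second\n";   # ab
```

One caveat: with 8 bits per letter, three or four ASCII letters give a value above U+10FFFF, the top of the Unicode range, so groups longer than two letters would need a denser encoding, for example an index into a table of the groups that actually occur.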
Re^2: Regexp and OCR
by sflitman (Hermit) on Jun 27, 2009 at 22:12 UTC