in reply to fill diacritic into text
When reading the word frequency file, you could keep only the words that contain accented characters. That should save a bit of memory.
Then build a hash with the unaccented variant of each of those words as the key and the original word as the value. After that, read the second file, look up each word from it in the hash, and replace it with the value if one exists. Take special care to preserve upper and lower case.
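Here is a rough sketch of that idea in Python, using a dict in place of the hash. Everything in it is an assumption for illustration: the function names (`strip_accents`, `build_lookup`, `match_case`, `restore_diacritics`) are made up, the frequency list is assumed to arrive most frequent word first (so the first accented form seen for a given key wins), and accents are stripped via Unicode NFD decomposition, which does not cover letters such as `ł` or `đ` that have no decomposed form.

```python
import re
import unicodedata


def strip_accents(word):
    # Decompose into base letters plus combining marks, then drop the marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_lookup(freq_words):
    # Map unaccented form -> accented original, keeping only words that
    # actually contain accents. Assumes freq_words is ordered most frequent
    # first, so setdefault keeps the most frequent candidate for each key.
    lookup = {}
    for word in freq_words:
        word = word.lower()
        plain = strip_accents(word)
        if plain != word:
            lookup.setdefault(plain, word)
    return lookup


def match_case(template, word):
    # Copy the casing pattern of template (all caps, capitalized, or as-is).
    if template.isupper() and len(template) > 1:
        return word.upper()
    if template[0].isupper():
        return word[0].upper() + word[1:]
    return word


def restore_diacritics(text, lookup):
    # Replace every run of letters that has an accented form in the lookup.
    def fix(match):
        original = match.group(0)
        accented = lookup.get(original.lower())
        return match_case(original, accented) if accented else original

    return re.sub(r"[^\W\d_]+", fix, text)
```

With `lookup = build_lookup(["příliš", "kůň", "auto"])`, only the two accented words end up in the dict, and `restore_diacritics("Prilis zlutoucky kun, KUN.", lookup)` leaves the unknown word `zlutoucky` alone while fixing the others in their original casing. Note that this blindly replaces a word even when the unaccented spelling is also a valid word, so you may want to compare frequencies before replacing.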
Re^2: fill diacritic into text
by jajaja (Initiate) on May 31, 2007 at 11:19 UTC
by ambrus (Abbot) on Jun 01, 2007 at 09:35 UTC