To tell the truth, I'm quite surprised that you have a word frequency file whose words don't fit in memory. But if this is really the case, you can do the following.
First, transform the frequency file into another file by prefixing each line with the unaccented version of the word, while still keeping the accented version. You can do this easily without reading the whole file into memory. Then sort this file using the unaccented version as the key. Finally, read the sorted file. Because all the words that share an unaccented form now sit next to each other, you only need to look at one such group at a time, and you only keep in memory those lines that are accented but do not have a higher-frequency unaccented variant.
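Here is a rough sketch of those three steps in Python. It assumes each line of the frequency file looks like `word<TAB>frequency`, which may not match your actual format. The function names (`unaccent`, `add_key`, `accented_winners`) are made up for illustration, and "unaccented" is taken to mean "combining marks stripped after NFD normalization", which is only one possible definition.

```python
import unicodedata
from itertools import groupby

def unaccent(word):
    """Strip combining marks (accents) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def add_key(lines):
    """Step 1: prefix each 'word<TAB>freq' line with the unaccented word.

    Works one line at a time, so the input never has to fit in memory.
    """
    for line in lines:
        word, freq = line.rstrip("\n").split("\t")
        yield f"{unaccent(word)}\t{word}\t{freq}"

def accented_winners(sorted_lines):
    """Step 3: read key-sorted lines, holding one key's group at a time.

    Yields accented words that have no higher-frequency unaccented variant.
    """
    rows = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for key, group in groupby(rows, key=lambda r: r[0]):
        group = [(word, int(freq)) for _, word, freq in group]
        # Frequency of the unaccented variant, or 0 if it never occurs.
        plain = max((f for w, f in group if w == key), default=0)
        for word, freq in group:
            if word != key and freq >= plain:
                yield word, freq

# Tiny demonstration; sorted() stands in for step 2 (the sort by key).
sample = ["resume\t10", "résumé\t25", "cafe\t40", "café\t5", "naïve\t3"]
result = list(accented_winners(sorted(add_key(sample))))
```

For a file that really does not fit in memory, `sorted()` would have to be replaced by an external sort on the first field (for instance, writing the output of `add_key` to a file and running the Unix `sort` utility on it); only the first and third steps stream as written.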