in reply to Efficient selective substitution on list of words
Now that I understand the question correctly, I'll try again in a separate reply.
Pardon me if I'm jumping to conclusions, but it seems like your notion of "stopwords" is really just a matter of making sure that the "word" string is not part of a larger word. If that's really all it amounts to, all you need is to put the \b assertion around each word:
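Here is a minimal sketch of that idea. The replacement table, the sample sentence, and the sub name `substitute_words` are made up for illustration and are not your real data:

```perl
use strict;
use warnings;

# Hypothetical replacement table (word => replacement); substitute your own.
my %replace = (
    cat => 'feline',
    dog => 'canine',
);

# One alternation of all the words, longest first, each escaped with
# quotemeta so that any regex metacharacters are taken literally.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b }
    keys %replace;

# \b on both sides: a word matches only when it is not part of a larger word.
my $regex = qr/\b($alternation)\b/;

sub substitute_words {
    my ($text) = @_;
    $text =~ s/$regex/$replace{$1}/g;
    return $text;
}

print substitute_words("The cat chased the dog past the catalog and the dogma."), "\n";
```

With this sample input, "cat" and "dog" are replaced while "catalog" and "dogma" are left alone, because there is no word boundary after the "cat" or "dog" inside them.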
The example data there points out a couple issues you may need to cope with using this approach:
But depending on the actual set of replacements you need to do, those issues are likely to be less bothersome than the problem of trying to figure out all the "stopwords" you would need to specify in order to avoid incorrect replacements within larger words.
In any case, the exercise as a whole really should be "previewed" or "monitored": for a given set of replacements and input data, get a listing of all the matches in the data, and/or review all changes applied by the process, to confirm that all changes are as intended. If you really are dealing with "natural language" data here, it pays to be really careful.
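One way to do that kind of preview is to list every match with its position before changing anything. The following is only a sketch: the word list, the sample lines, and the sub name `preview_matches` are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical word list; in practice, the keys of your replacement table.
my @words       = qw(cat dog);
my $alternation = join '|', map { quotemeta($_) } @words;
my $regex       = qr/\b($alternation)\b/;

# Return one report entry per match, without modifying the input lines.
sub preview_matches {
    my ($re, @lines) = @_;
    my @report;
    for my $i (0 .. $#lines) {
        while ($lines[$i] =~ /$re/g) {
            # $-[1] is the offset where capture group 1 starts
            push @report, sprintf "line %d, col %d: %s", $i + 1, $-[1] + 1, $1;
        }
    }
    return @report;
}

print "$_\n" for preview_matches($regex, "The cat sat.", "A dogma, a dog.");
```

Reviewing a listing like this against the data (or diffing the input against the output after a trial run) shows whether every match is one you actually intended to change.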
Re^2: Efficient selective substitution on list of words
by BrowserUk (Patriarch) on Jan 31, 2010 at 17:15 UTC |