In reply to: Matching a long list of phrases
Regarding your question about efficiency: because you want to match phrases, not just single words, the straightforward approach has to compare the text against every entry in your phrase list, i.e. N comparisons where N is the size of that list. So it does not scale well as the list grows.

Alternatively, you could create an SQLite database with two tables: one holding every word (generated by a script) and one holding all phrases (maintained by you). Each entry in the word table links to every phrase that contains that word. You then use a kind of seed-and-extend search strategy: look up each word of your text in the word table (the seed). If the word belongs to one or more phrases, check each of those candidate phrases against the surrounding text (the extend step); a phrase that consists of only that word matches immediately. If the word is not in the table, no phrase can match at that position.

Note that this is only fast in practice if the phrase list is huge compared to the text, and most phrases consist of only a few words.
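Below is a minimal Python sketch of the two-table idea. It uses the standard-library `sqlite3` module with an in-memory database. The table names, the extra `pos` column (the word's offset inside its phrase, used to align a candidate with the text), and the lowercase `\w+` tokenizer are assumptions made for illustration. They are not part of the original suggestion.

```python
import re
import sqlite3


def tokenize(text):
    # Assumption: text and phrases are compared as lowercase word sequences.
    return re.findall(r"\w+", text.lower())


def build_index(phrases):
    """Build an in-memory SQLite index: a phrase table plus a word-to-phrase link table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE phrases (id INTEGER PRIMARY KEY, text TEXT)")
    # Each row links a word to one phrase containing it; `pos` (an assumption
    # of this sketch) records where the word sits inside that phrase.
    db.execute("CREATE TABLE words (word TEXT, phrase_id INTEGER, pos INTEGER)")
    db.execute("CREATE INDEX words_word ON words (word)")
    for pid, phrase in enumerate(phrases):
        db.execute("INSERT INTO phrases VALUES (?, ?)", (pid, phrase))
        for pos, word in enumerate(tokenize(phrase)):
            db.execute("INSERT INTO words VALUES (?, ?, ?)", (word, pid, pos))
    db.commit()
    return db


def find_phrases(db, text):
    """Seed and extend: look up each text word, then verify its candidate phrases."""
    tokens = tokenize(text)
    found = set()  # a set, because a phrase is seeded once per word it contains
    for i, word in enumerate(tokens):
        # Seed: every phrase containing this word, with the word's offset in it.
        rows = db.execute(
            "SELECT p.id, p.text, w.pos FROM words w"
            " JOIN phrases p ON p.id = w.phrase_id WHERE w.word = ?",
            (word,),
        ).fetchall()
        for pid, phrase, pos in rows:
            # Extend: align the phrase so its word at `pos` lands on text index i.
            start = i - pos
            ptoks = tokenize(phrase)
            if start >= 0 and tokens[start:start + len(ptoks)] == ptoks:
                found.add((start, phrase))
    return sorted(found)
```

For example, with the phrases `["new york", "york", "machine learning", "deep machine learning model"]`, calling `find_phrases(db, "Deep learning in New York")` returns `[(3, "new york"), (4, "york")]`. Only phrases that share a word with the text are ever fetched and compared, which is why the approach pays off when the phrase list is much larger than the text.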