To your question about efficiency: because you want to check for phrases, not only words, you have to do N comparisons (N being the size of your phrase list), so it does not scale very well.

Alternatively, you could create a SQLite database with two tables: one with all words (generated by a script) and one with all phrases (maintained by you). The first contains "links" to all phrases containing the word. What you then do is a kind of seed-and-extend search: you look up every word of your text. If the word is part of multiple phrases, you test all of those phrases. If not, you have either a single-word phrase match or, if the word is not found at all, no phrase match. Note that this is only fast in practice if the phrase list is huge compared to the text and the majority of phrases consist of only a few words.
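
The two-table idea could be sketched roughly as follows. This is only a minimal illustration, written in Python with the standard `sqlite3` module to keep it self-contained; the same schema should work from Perl via DBI. The table and column names (`phrases`, `words`, `pos`), the helper names `tokenize`, `build_index` and `find_phrases`, and the lowercase word tokenization are all assumptions of this sketch, not something fixed by the description above. Storing the word's offset within the phrase (`pos`) is one possible way to do the "extend" step.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(phrases):
    """Build an in-memory database: a phrases table plus a word table linking into it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE phrases (id INTEGER PRIMARY KEY, phrase TEXT)")
    # One row per word occurrence; pos is the word's offset inside the phrase.
    db.execute("CREATE TABLE words (word TEXT, phrase_id INTEGER, pos INTEGER)")
    db.execute("CREATE INDEX words_by_word ON words (word)")
    for phrase in phrases:
        tokens = tokenize(phrase)
        if not tokens:
            continue
        cur = db.execute("INSERT INTO phrases (phrase) VALUES (?)", (" ".join(tokens),))
        db.executemany(
            "INSERT INTO words (word, phrase_id, pos) VALUES (?, ?, ?)",
            [(word, cur.lastrowid, pos) for pos, word in enumerate(tokens)],
        )
    return db


def find_phrases(db, text):
    """Return sorted (token_offset, phrase) pairs for every phrase occurring in text."""
    tokens = tokenize(text)
    hits = set()
    query = (
        "SELECT w.pos, p.phrase FROM words w "
        "JOIN phrases p ON p.id = w.phrase_id WHERE w.word = ?"
    )
    for i, word in enumerate(tokens):
        # Seed: fetch every phrase that contains this word.
        for pos, phrase in db.execute(query, (word,)).fetchall():
            # Extend: align the phrase on the seed word and compare all tokens.
            start = i - pos
            phrase_tokens = phrase.split(" ")
            if start >= 0 and tokens[start:start + len(phrase_tokens)] == phrase_tokens:
                hits.add((start, phrase))
    return sorted(hits)


db = build_index(["new york", "york", "big apple", "new york city"])
print(find_phrases(db, "I love New York, the Big Apple."))
# [(2, 'new york'), (3, 'york'), (5, 'big apple')]
```

A word that does not appear in the `words` table costs one indexed lookup and nothing more, which is where the saving over N comparisons per word comes from. Each phrase is seeded once per word it contains, so long phrases cause repeated extension checks; the set removes the duplicate hits.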

In reply to Re: Matching a long list of phrases by lima1
in thread Matching a long list of phrases by Hagbone
