Here's an approach that seems to satisfy the OPer's (somewhat vaguely expressed, and with the inferred qualifications noted by others, and including a sentence ending with a period) requirements (needs Perl 5.10  \K regex enhancement):

>perl -wMstrict -le "my @phrases = ( 'kinase i', 'hib', 'tor', 'tor SET6', 'SET6', 'p16(INK4A)', 'cell', ); my $delim = qr{ \. | \A \s* | \s+ | \s* \z }xms; my $phrase = join '|', reverse sort map quotemeta, @phrases; my $mark = qq{\x23}; for my $s (@ARGV) { print '--------------'; print $s; $s =~ s{ $delim \K ($phrase) (?= $delim) } {$mark$1$mark}xmsg; print $s; } " "cell kinase inhibitor SET6 activates p16(INK4A) in cell-wall tor SET6 +." "kinase tor tor SET6" "tor tor SET6 kinase" "tor tor SET6" "kinase tor tor SET6." "tor tor SET6 kinase." "tor tor SET6." "kinase inhibitor" "kinase inhibitor." -------------- cell kinase inhibitor SET6 activates p16(INK4A) in cell-wall tor SET6. #cell# kinase inhibitor #SET6# activates #p16(INK4A)# in cell-wall #to +r SET6#. -------------- kinase tor tor SET6 kinase #tor# #tor SET6# -------------- tor tor SET6 kinase #tor# #tor SET6# kinase -------------- tor tor SET6 #tor# #tor SET6# -------------- kinase tor tor SET6. kinase #tor# #tor SET6#. -------------- tor tor SET6 kinase. #tor# #tor SET6# kinase. -------------- tor tor SET6. #tor# #tor SET6#. -------------- kinase inhibitor kinase inhibitor -------------- kinase inhibitor. kinase inhibitor.

(Note:  "\x23" is the  "#" character. Have to do this because of a peculiarity of my command line 'editor'.)

If Perl version 5.10 is not available, use
    s{ ($delim) ($phrase) (?= $delim) }{$1$mark$2$mark}xmsg;
as the substitution regex (tested).

Of course, a lot more testing is recommended!

The  reverse in the
    my $phrase = join '|',reverse sort map quotemeta, @phrases;
statement causes the ordered alternation to match the longest phrase substring.

See also Regexp::Assemble and related modules for other (and perhaps better) ways to compile the  $phrase regex.


In reply to Re^3: phrase match by AnomalousMonk
in thread phrase match by newbio

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.