in reply to Re^2: phrase match
in thread phrase match

Here's an approach that seems to satisfy the OPer's (somewhat vaguely expressed, and with the inferred qualifications noted by others, and including a sentence ending with a period) requirements (needs Perl 5.10  \K regex enhancement):

>perl -wMstrict -le "my @phrases = ( 'kinase i', 'hib', 'tor', 'tor SET6', 'SET6', 'p16(INK4A)', 'cell', ); my $delim = qr{ \. | \A \s* | \s+ | \s* \z }xms; my $phrase = join '|', reverse sort map quotemeta, @phrases; my $mark = qq{\x23}; for my $s (@ARGV) { print '--------------'; print $s; $s =~ s{ $delim \K ($phrase) (?= $delim) } {$mark$1$mark}xmsg; print $s; } " "cell kinase inhibitor SET6 activates p16(INK4A) in cell-wall tor SET6 +." "kinase tor tor SET6" "tor tor SET6 kinase" "tor tor SET6" "kinase tor tor SET6." "tor tor SET6 kinase." "tor tor SET6." "kinase inhibitor" "kinase inhibitor." -------------- cell kinase inhibitor SET6 activates p16(INK4A) in cell-wall tor SET6. #cell# kinase inhibitor #SET6# activates #p16(INK4A)# in cell-wall #to +r SET6#. -------------- kinase tor tor SET6 kinase #tor# #tor SET6# -------------- tor tor SET6 kinase #tor# #tor SET6# kinase -------------- tor tor SET6 #tor# #tor SET6# -------------- kinase tor tor SET6. kinase #tor# #tor SET6#. -------------- tor tor SET6 kinase. #tor# #tor SET6# kinase. -------------- tor tor SET6. #tor# #tor SET6#. -------------- kinase inhibitor kinase inhibitor -------------- kinase inhibitor. kinase inhibitor.

(Note:  "\x23" is the  "#" character. Have to do this because of a peculiarity of my command line 'editor'.)

If Perl version 5.10 is not available, use
    s{ ($delim) ($phrase) (?= $delim) }{$1$mark$2$mark}xmsg;
as the substitution regex (tested).

Of course, a lot more testing is recommended!

The  reverse in the
    my $phrase = join '|',reverse sort map quotemeta, @phrases;
statement causes the ordered alternation to match the longest phrase substring.

See also Regexp::Assemble and related modules for other (and perhaps better) ways to compile the  $phrase regex.