That is useful sometimes, but here it's not needed, because a lookahead is enough.
Run this:
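    use warnings;

    my $sentence = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';
    my @phrases  = ('kinase i', 'inhibitor', 'tor SET6', 'SET6', 'p16(INK4A)', 'cell');

    # quotemeta escapes regex metacharacters such as the parentheses in p16(INK4A)
    my $phrases_re = join '|', map { quotemeta } @phrases;

    # match a phrase preceded by start-of-string or a space,
    # and followed (without consuming it) by a space or end-of-string
    $sentence =~ s/(^| )($phrases_re)(?= |$)/$1#$2#/g;
    print $sentence, "\n";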
You get the output
kinase #inhibitor# #SET6# activates #p16(INK4A)# in cell-wall.
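To see why the lookahead matters here, the following sketch compares it with a pattern that consumes the trailing space. The helper names `mark_consuming` and `mark_lookahead` are made up for this illustration, and the phrase list is shortened from the one above.

```perl
use strict;
use warnings;

my @phrases    = ('inhibitor', 'SET6');
my $phrases_re = join '|', map { quotemeta } @phrases;

# Hypothetical helper: the trailing space is part of the match, so it is
# no longer available as the leading space of the next phrase.
sub mark_consuming {
    my ($s) = @_;
    $s =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g;
    return $s;
}

# Hypothetical helper: the lookahead only checks for the space, so the
# same space can still start the next match.
sub mark_lookahead {
    my ($s) = @_;
    $s =~ s/(^| )($phrases_re)(?= |$)/$1#$2#/g;
    return $s;
}

print mark_consuming('kinase inhibitor SET6 activates'), "\n";
print mark_lookahead('kinase inhibitor SET6 activates'), "\n";
```

The first call leaves SET6 unmarked because the space in front of it was already used up by the previous match; the second call marks both phrases.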
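Update: Just as a curiosity, there are also ways to do this kind of thing without lookaheads or lookbehinds. Replace the substitution statement above with either

    # run the substitution twice; the second pass picks up phrases whose
    # leading space was consumed by the previous match in the first pass
    $sentence =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g for 0, 1;

or

    use 5.010;
    # for ($sentence) { ... } aliases $_ the same way where given is unavailable
    given ($sentence) {
        s/ /  /g;                               # double every space so adjacent phrases each get their own
        s/(^| )($phrases_re)( |$)/$1#$2#$3/g;
        s/  / /g;                               # collapse the doubled spaces again
    }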
Update: One more alternative is below.
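    my %phrase;
    $phrase{$_}++ for @phrases;

    # the capturing group keeps the runs of spaces in the list,
    # so the join below restores the original spacing
    my @sentence = split /( +)/, $sentence;
    for (@sentence) {
        $phrase{$_} and $_ = "#" . $_ . "#";
    }
    $sentence = join "", @sentence;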
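One thing worth noting about the split-based alternative: because the sentence is cut at every run of spaces, only phrases that contain no space can ever equal a piece. A minimal sketch of that behaviour follows; the helper name `mark_by_split` is hypothetical and not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the split-and-lookup variant.
sub mark_by_split {
    my ($sentence, @phrases) = @_;
    my %phrase = map { $_ => 1 } @phrases;
    # The capturing group makes split return the runs of spaces as well,
    # so joining the pieces restores the original spacing.
    my @pieces = split /( +)/, $sentence;
    for (@pieces) {
        $_ = "#$_#" if $phrase{$_};
    }
    return join '', @pieces;
}

print mark_by_split('tor SET6 binds', 'tor SET6'), "\n";  # multi-word phrase: nothing is marked
print mark_by_split('tor SET6 binds', 'SET6'), "\n";      # single-word phrase: marked
```

With the phrase list used in this post the result is the same as with the regex versions, since the multi-word phrases do not match in that sentence anyway.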
Update: Oh, let's not forget this one either.
$sentence =~ s/(?<![^ ])($phrases_re)(?= |$)/#$1#/g;
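For reuse, the lookbehind/lookahead version could be wrapped in a small function along these lines. This is only a sketch: the name `mark_phrases` is an assumption of this example, and it expects at least one phrase to be passed in.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the post: mark every phrase that is
# delimited by spaces or by the ends of the string.
sub mark_phrases {
    my ($sentence, @phrases) = @_;
    my $phrases_re = join '|', map { quotemeta } @phrases;
    # (?<![^ ]) : not preceded by a non-space, i.e. start of string or a space
    # (?= |$)   : followed by a space or by the end of the string
    $sentence =~ s/(?<![^ ])($phrases_re)(?= |$)/#$1#/g;
    return $sentence;
}

print mark_phrases('SET6 binds p16(INK4A)', 'SET6', 'p16(INK4A)'), "\n";
```

Since neither lookaround consumes anything, no capture group is needed for the surrounding spaces, and phrases at the very start or end of the string are handled by the same pattern.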
In reply to Re^4: phrase match by ambrus, in thread phrase match by newbio