in reply to Re: phrase marking
in thread phrase marking

Nice solution, Monks. Thanks a lot.

Hi Kyle, I am trying to understand how your method works. I observe that the phrases that get marked in a sentence depend on their order in the array 'phrases'. The sentence is scanned from left to right, and at each position the first phrase in array order that matches is the one selected. So, in case of an overlap, the phrase that appears first in the 'phrases' array gets priority. Please correct me if I am wrong. Thanks.

Re^3: phrase marking
by kyle (Abbot) on Sep 09, 2008 at 18:49 UTC

    Yes, I think that's right. If you have overlapping phrases to mark, you'd have to figure out how you want to deal with those, and then you'd probably have to use a different solution. If you have some phrases that are preferred over others, you can order them before building the expression.
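To illustrate the point about ordering, here is a minimal sketch. The phrases, the sample sentence, and the helper names `build_re` and `mark` are made up for the example and are not from the thread. At any given position Perl tries the alternatives left to right, so sorting longer phrases first lets them win over their own prefixes.

```perl
use strict;
use warnings;

# Hypothetical phrases; 'pail' is a prefix of 'pail of water'.
my @phrases = ( 'pail', 'pail of water' );

# Build one capturing alternation from the phrases, in the order given.
sub build_re {
    my @list = @_;
    my $alternation = join '|', map { quotemeta } @list;
    return qr/($alternation)/;
}

# Return a copy of the sentence with every match wrapped in '#'.
sub mark {
    my ( $re, $sentence ) = @_;
    ( my $marked = $sentence ) =~ s/$re/#$1#/g;
    return $marked;
}

my $sentence = 'Jack fetched a pail of water.';

# Array order as given: the shorter phrase is tried first and wins.
my $as_given = mark( build_re(@phrases), $sentence );

# Longest first: the longer phrase gets priority at the same position.
my @by_length     = sort { length($b) <=> length($a) } @phrases;
my $longest_first = mark( build_re(@by_length), $sentence );

print "$as_given\n";        # Jack fetched a #pail# of water.
print "$longest_first\n";   # Jack fetched a #pail of water#.
```

Ordering only decides between phrases that could start at the same position. A phrase that starts further left in the sentence is still found first, whatever its place in the array.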

      Hi Kyle, one more question. Your solution works very well for partial matching. How can I adapt it for full-word matching (i.e. with spaces as boundaries)? A nice thing about the solution is that I can set a matching preference between single words and phrases by sorting them by length in the 'phrases' array. Thanks a lot.

        Include word boundaries in the patterns.

        my @phrases = ( 'pail of water', 'pale horse' );
        my $phrases_re = join '|', map { "\\b$_\\b" } map { quotemeta } @phrases;

        foreach my $sentence ( @sentence_source ) {
            $sentence =~ s/($phrases_re)/\#$1\#/g;
        }

        (Note the extra map: quotemeta runs first, so the \b anchors are added after escaping and are not escaped themselves.)
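A small runnable comparison may help show what the boundaries change. The sample sentence is made up for the example. The third pattern is only a guess at what "spaces as boundaries" might mean strictly: `\b` also matches next to punctuation, whereas `(?<!\S)` and `(?!\S)` require whitespace or the edge of the string.

```perl
use strict;
use warnings;

my @phrases = ( 'pail of water', 'pale horse' );

# Partial matching: no boundaries around the escaped phrases.
my $partial_re = join '|', map { quotemeta } @phrases;

# Whole-word matching: \b on both sides of each escaped phrase.
my $word_re = join '|', map { "\\b$_\\b" } map { quotemeta } @phrases;

# Stricter variant (an assumption about the intent): whitespace or the
# string edge is required on both sides, so punctuation does not count.
my $space_re = join '|',
    map { '(?<!\S)' . $_ . '(?!\S)' } map { quotemeta } @phrases;

# Hypothetical sentence: 'pale horses' contains 'pale horse' as a partial match.
my $sentence = 'Two pale horses passed a pale horse, then a pale horse drank.';

( my $partial = $sentence ) =~ s/($partial_re)/#$1#/g;
( my $whole   = $sentence ) =~ s/($word_re)/#$1#/g;
( my $spaced  = $sentence ) =~ s/($space_re)/#$1#/g;

print "$partial\n";   # Two #pale horse#s passed a #pale horse#, then a #pale horse# drank.
print "$whole\n";     # Two pale horses passed a #pale horse#, then a #pale horse# drank.
print "$spaced\n";    # Two pale horses passed a pale horse, then a #pale horse# drank.
```

Note that `\b` only behaves as expected when a phrase begins and ends with a word character. A phrase that ends in punctuation would need the lookaround form or some other boundary.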