in reply to phrase marking

Please see Writeup Formatting Tips (You should not have <code> tags around your whole node.)

With many phrases, I'd probably do something like this:

    my @phrases = ( 'pail of water', 'pale horse' );
    my $phrases_re = join '|', map { quotemeta } @phrases;

    foreach my $sentence ( @sentence_source ) {
        $sentence =~ s/($phrases_re)/\#$1\#/g;
    }

Thanks to JadeNB for pointing out that I'd swapped "pale" and "pail".

Re^2: phrase marking
by newbio (Beadle) on Sep 09, 2008 at 17:25 UTC
    Nice solution, Monks. Thanks a lot. Hi Kyle, I am trying to understand the procedure of your method. I observe that the phrases that get marked in the sentence depend on their order in the array 'phrases'. The sentence is scanned from left to right and wherever it finds a match 'first' in the array order of phrases it selects that phrase. So, in case of an overlap, the phrase that appears first in the 'phrases' array is given the priority. Please correct me if I am wrong. Thanks.

      Yes, I think that's right. If you have overlapping phrases to mark, you'd have to figure out how you want to deal with those, and then you'd probably have to use a different solution. If you have some phrases that are preferred over others, you can order them before building the expression.
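
To illustrate the ordering point, here is a minimal sketch. The phrase list, the sample sentence, and the helpers `build_re` and `mark` are made up for illustration and are not from the original post. Note that the order of alternatives only decides between phrases that can match at the same starting position; the leftmost match in the sentence still comes first regardless of order.

```perl
use strict;
use warnings;

# Hypothetical phrase list: 'pail' is a prefix of 'pail of water'.
my @phrases = ( 'pail', 'pail of water', 'pale horse' );

# Build a capturing alternation from the phrases, in the order given.
sub build_re {
    my @ordered = @_;
    my $alt = join '|', map { quotemeta($_) } @ordered;
    return qr/($alt)/;
}

# Return a copy of the sentence with every match wrapped in '#'.
sub mark {
    my ( $re, $sentence ) = @_;
    $sentence =~ s/$re/#$1#/g;
    return $sentence;
}

my $sentence = 'Jill dropped the pail of water.';

# Array order: 'pail' is tried first and wins.
my $as_listed = mark( build_re(@phrases), $sentence );

# Longest first: 'pail of water' is tried before 'pail'.
my @by_length     = sort { length($b) <=> length($a) } @phrases;
my $longest_first = mark( build_re(@by_length), $sentence );

print "$as_listed\n";        # Jill dropped the #pail# of water.
print "$longest_first\n";    # Jill dropped the #pail of water#.
```

Sorting by descending length is only one possible preference rule; any ordering applied before the `join` works the same way.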

        Hi Kyle, one more question. Your solution works very well for partial matching, but how can I adapt it for full-word matching (i.e. with spaces as boundaries)? A nice thing about the solution is that I can set a matching preference between single words and phrases by sorting them by length in the 'phrases' array. Thanks a lot.
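
One possible way to get whole-word matching is to wrap the alternation in `\b` word-boundary anchors. This is only a sketch under assumptions: the sample sentences are invented, and `\b` behaves as intended only when every phrase begins and ends with a word character. If strictly whitespace-delimited matches are wanted, the lookarounds `(?<!\S)` and `(?!\S)` could be used in place of the two `\b` anchors, at the cost of not matching a phrase that is followed directly by punctuation.

```perl
use strict;
use warnings;

# Hypothetical phrase list mixing a single word with longer phrases.
my @phrases = ( 'pail of water', 'pale horse', 'pail' );

# Longest first, so a phrase is preferred over a single word it starts with.
my $alt = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @phrases;

# \b on each side requires a word boundary, so 'pail' will not
# match inside 'pails'.
my $phrases_re = qr/\b($alt)\b/;

# Invented sample sentences.
my @sentences = (
    'He rode a pale horse past the pails.',
    'She carried a pail of water.',
);

# The loop variable aliases each element, so the array is modified in place.
for my $sentence (@sentences) {
    $sentence =~ s/$phrases_re/#$1#/g;
}

print "$_\n" for @sentences;
# He rode a #pale horse# past the pails.
# She carried a #pail of water#.
```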