bmal has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I need to match the nearest verb after a noun in a sentence tagged with parts of speech, like this:

$sentence = "happy JJ Boy NN kick VB jump VB";
My code is something like this:
if ($sentence =~ /(Boy\sNN)(.|\n}*(\w+\s\VB)/) { $node = $1.' '.$3; print $node; }

What I want to match is the verb 'kick', but it matches the second verb after the noun 'Boy'. How can I match the first verb after the noun? Any suggestions?

Thank you in advance.

2001-12-26 Edit by Corion : Added formatting

Re: Regexp, match the nearest verb.
by ariels (Curate) on Dec 26, 2001 at 18:16 UTC

    Don't be greedy! No, really. The portion of your regexp (.|\n)* (note that you incorrectly have a close brace } instead of a close parenthesis ) there!) matches the longest stretch it can.

    The solution? Ask for a non-greedy match (see perlre for details of *?): replace (.|\n)* with (.|\n)*?, and the shortest match (for which the continuation of the regexp matches) will be returned. Also, if your string contains newlines, consider using the /s modifier, which lets . match a newline, instead of spelling that out as .|\n.
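To make the difference concrete, here is a minimal sketch run against the sentence from the question. It assumes the /s modifier in place of (.|\n), and it adds a \b and a narrower capture group (the verb only) purely for illustration; the variable names $greedy and $lazy are not from the thread.

```perl
use strict;
use warnings;

my $sentence = "happy JJ Boy NN kick VB jump VB";

# Greedy: .* first runs to the end of the string and backtracks only as far
# as it must, so the capture lands on the LAST verb after the noun.
my ($greedy) = $sentence =~ /Boy\sNN.*\b(\w+)\sVB/s;

# Non-greedy: .*? grows one character at a time, so the capture lands on
# the FIRST verb after the noun.
my ($lazy) = $sentence =~ /Boy\sNN.*?\b(\w+)\sVB/s;

print "greedy: $greedy\n";   # greedy: jump
print "lazy:   $lazy\n";     # lazy:   kick
```

Only the quantifier changes between the two patterns; the rest of the regexp is identical.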

      I see. Thank you! But what if I match the other way round? Say I have this sentence:

      $sentence = "Boy NN wants VB to TO take VB note NN";

      I want to match the nearest verb before the noun "note NN". The code is like this:

      $phrase = "note NN";
      if ($sentence =~ /(\w+\sVB)*(.|\n)*?($phrase)/) { $node = $1.$3; print $node; }

      However, this gives me $1 = wants, instead of the one I want: "take". Is there any way to solve this problem? Also, how can I match a regexp from the end of the string? Thanks!

        Ah. What happens is that you still get the first match in the string. In this case, that's the match starting at "wants VB" (what's the `*' doing after `VB)' in your regexp, anyway?). You could use greediness to hit the last match in the string:

        $sentence =~ /.*\b(\w+\sVB)(.|\n)*?($phrase)/

        (The \b keeps the greedy .* from swallowing all but the last letter of the verb.) But that still doesn't guarantee "closeness", just "lastness".
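A short sketch of the greedy approach on the sentence from the follow-up question. Two things here are assumptions of this example rather than part of the reply: the \b in front of the capture, which stops the leading .* from eating into the verb, and \Q...\E around $phrase so that it is matched literally. The variable names are illustrative.

```perl
use strict;
use warnings;

my $sentence = "Boy NN wants VB to TO take VB note NN";
my $phrase   = "note NN";

# Without a word boundary the greedy .* keeps as much as it can and leaves
# only the last letter of the verb for \w+.
my ($clipped) = $sentence =~ /.*(\w+\sVB).*?\Q$phrase\E/s;

# With \b the capture has to start at the beginning of a word.
my ($verb) = $sentence =~ /.*\b(\w+\sVB).*?\Q$phrase\E/s;

print "without \\b: $clipped\n";   # without \b: e VB
print "with \\b:    $verb\n";      # with \b:    take VB
```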

        Your alternative is to look up and use sexegers for this second task. You will find much information on this very site...
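For readers who have not met the term: a sexeger is a regexp run backwards. You reverse the string, match a reversed pattern, and reverse the result, so "nearest before" becomes "first after". The following is only a sketch under that reading, using the sentence from the question; the variable names and the exact pattern are assumptions of this example.

```perl
use strict;
use warnings;

my $sentence = "Boy NN wants VB to TO take VB note NN";
my $phrase   = "note NN";

# Reverse both the text and the phrase we are anchoring on.
my $rev_sentence = scalar reverse $sentence;  # "NN eton BV ekat OT ot BV stnaw NN yoB"
my $rev_phrase   = scalar reverse $phrase;    # "NN eton"

my $verb;
# In the reversed string the tag comes first ("BV") and the word follows it.
# A non-greedy .*? then finds the first verb after the reversed phrase,
# which is the nearest verb before the phrase in the original string.
if ($rev_sentence =~ /\Q$rev_phrase\E.*?\b(BV\s\w+)/s) {
    $verb = scalar reverse $1;
}

print "$verb\n";   # take VB
```

The explicit scalar is there because reverse only reverses the characters of a string in scalar context; in list context it reverses the order of a list instead.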


        PS. Please try to format your writeups so we can read them...