in reply to greedy and lazy

It appears that you were cut off in the middle of your post. Could you please repost? From what I can see, your regexes look very interesting, and I suspect from your choice of variable names that your target text is rather unpredictable, which makes the regexes more interesting still. I would very much like to take a look at what you are trying to accomplish.

Also, wrapping your code in <CODE></CODE> tags will format it nicely:

$np1="(?:$det|$gen)";
$np2 ="(?:$adj|$num|$conj|$adv|$inf)";
$np3="(?:$np1*\s*($noun)*\s*$np2*\s*($noun)+\s*$adj*)";

Cheers,
Ovid

RE: Re: greedy and lazy
by Anonymous Monk on Jul 25, 2000 at 00:08 UTC
    If you're interested in the actual patterns and all, here they are.
    $noun ="(?: *[A-Za-z0-9._]+\/NN[PS]*)";
    $det ="( *[A-Za-z]+\/DT)";
    $adj ="( *[A-Za-z]+\/JJ[RS]?)";
    $gen ="( *[A-Za-z]+\/POSS)";
    $adv="( *[A-Za-z\']+\/RB[RS]?)";
    $inf =" *to\/TO";
    $adv="( *[A-Za-z\']+\/RB[RS]?)";
    $np1="(?:$det|$gen)";
    $np2 ="(?:$adj|$num|$conj|$adv|$inf)";
    $np3="(?:$np1*\s*(?:$noun)*\s*$np2*\s*(?:$noun)+\s*$adj*)";
    $np4="((?:$noun)+\s*$np2+\s*(?:$noun)+)";
    $np5="(?:$np1*\s*$adj+\s*($noun)+)";
    # more complex noun and prep phrases
    $NP = "(?:(?:$np1)*\s*(?:$np3)+)";
    $NP1 = "(?:$np3)+\s*(?:$np2)\s*(?:$np3)+";
    $NP2 = "(?:(?:$np1)+\s*(?:$np3)+\s*(?:$np4)+)";
    $NP3 ="(?:$np1*\s*$noun+\s*[^INV]+\s*(?:$noun)+)";
    $NP4 ="$np1+\s*[^NV]+\s*$noun+";
    $nps= "(?:($NP1)|($NP2)|($NP3)|($NP4)|($NP))";
    $extnp="(?:($pro(?!\$))|($np5))";
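One thing that may be contributing to the trouble with patterns like these: they are built in double-quoted strings, and Perl drops the backslash from escapes it does not recognize there, so `"\s*"` ends up as `s*` before the regex engine ever sees it. The sketch below only illustrates that behavior; the variable names and sample text are made up for the example and are not taken from the posted code.

```perl
use strict;

# Hypothetical fragments, for illustration only.
# Under "use warnings" the first line reports
# "Unrecognized escape \s passed through".
my $dq = "\s*NN";      # double-quoted: backslash is dropped, leaving 's*NN'
my $sq = '\s*NN';      # single-quoted: backslash survives
my $re = qr/\s*NN/;    # qr// keeps regex escapes and can still interpolate

print "double-quoted: $dq\n";    # prints: double-quoted: s*NN
print "single-quoted: $sq\n";    # prints: single-quoted: \s*NN

my $sample = "  NN";
print "dq pattern matches\n" if $sample =~ /^$dq$/;   # not printed
print "qr pattern matches\n" if $sample =~ /^$re$/;   # printed
```

Building the pieces with `qr//` (or single quotes) is one way to keep `\s` meaning "whitespace" all the way through.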
    I'm basically trying to parse text into Noun, Verb, and Preposition Phrases. Nouns are the most troublesome at the moment, because although the individual patterns match what I want in test output, when they are OR'd together, their output is not always correct. Thanks.
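A possible reason patterns that work alone misbehave once OR'd together: Perl tries alternatives left to right and takes the first one that lets the match succeed, not the longest one. The following is a minimal sketch with simplified, hypothetical stand-ins for the noun patterns (`$short` and `$long` are invented names), showing that the order of the alternatives changes the result.

```perl
use strict;
use warnings;

# Simplified, hypothetical stand-ins for the patterns in the post.
my $noun  = qr{[A-Za-z0-9._]+/NNP?S?};
my $short = qr{$noun};                   # a single tagged noun
my $long  = qr{$noun(?:\s+$noun)*};      # a run of tagged nouns

my $text = 'fire/NN truck/NN';

# Same alternatives, different order.
my ($short_first) = $text =~ /($short|$long)/;
my ($long_first)  = $text =~ /($long|$short)/;

print "short first: $short_first\n";   # prints: short first: fire/NN
print "long first:  $long_first\n";    # prints: long first:  fire/NN truck/NN
```

If this is what is happening, listing the more specific or longer alternatives first is usually the simplest remedy.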
      Ouch.

      I've heard good things about Parse::RecDescent... perhaps writing a small grammar with it would be more fruitful than using regular expressions.

        I'll take a look at the module. With my present approach, any given space can screw up one expression, which in turn messes up another, and so on. Not pleasant.
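One way to make stray spaces less of a hazard, sketched here under assumptions: collapse all whitespace to single spaces first, then write every pattern with exactly one literal space between tokens instead of scattering ` *` and `\s*` throughout. The tag set and the `$np` pattern below are simplified and hypothetical, not a replacement for the full patterns above.

```perl
use strict;
use warnings;

# Sample tagged text with irregular spacing (made up for illustration).
my $text = "  the/DT   big/JJ fire/NN  truck/NN left/VBD ";

# split ' ' drops leading whitespace and splits on whitespace runs,
# so joining with a single space normalizes the input.
my $clean = join ' ', split ' ', $text;

# Hypothetical, simplified token patterns: no leading-space handling needed.
my $word = qr{[A-Za-z0-9._']+};
my $det  = qr{$word/DT};
my $adj  = qr{$word/JJ[RS]?};
my $noun = qr{$word/NNP?S?};

# Optional determiner, any adjectives, then one or more nouns.
my $np = qr{(?:$det )?(?:$adj )*$noun(?: $noun)*};

my @phrases = $clean =~ /($np)/g;
print "$_\n" for @phrases;   # prints: the/DT big/JJ fire/NN truck/NN
```

Because whitespace is handled once, up front, a change to one token pattern cannot shift how its neighbors treat spaces.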