in reply to (Ovid) Re: greedy and lazy
in thread greedy and lazy
I'm basically trying to parse text into Noun, Verb, and Preposition Phrases. Nouns are the most troublesome at the moment, because although the individual patterns match what I want in test output, when they are OR'd together, their output is not always correct.

Thanks

$noun = "(?: *[A-Za-z0-9._]+\/NN[PS]*)";
$det = "( *[A-Za-z]+\/DT)";
$adj = "( *[A-Za-z]+\/JJ[RS]?)";
$gen = "( *[A-Za-z]+\/POSS)";
$adv = "( *[A-Za-z\']+\/RB[RS]?)";
$inf = " *to\/TO";
$np1 = "(?:$det|$gen)";
$np2 = "(?:$adj|$num|$conj|$adv|$inf)";
$np3 = "(?:$np1*\s*(?:$noun)*\s*$np2*\s*(?:$noun)+\s*$adj*)";
$np4 = "((?:$noun)+\s*$np2+\s*(?:$noun)+)";
$np5 = "(?:$np1*\s*$adj+\s*($noun)+)";
# more complex noun and prep phrases
$NP = "(?:(?:$np1)*\s*(?:$np3)+)";
$NP1 = "(?:$np3)+\s*(?:$np2)\s*(?:$np3)+";
$NP2 = "(?:(?:$np1)+\s*(?:$np3)+\s*(?:$np4)+)";
$NP3 = "(?:$np1*\s*$noun+\s*[^INV]+\s*(?:$noun)+)";
$NP4 = "$np1+\s*[^NV]+\s*$noun+";
$nps = "(?:($NP1)|($NP2)|($NP3)|($NP4)|($NP))";
$extnp = "(?:($pro(?!\$))|($np5))";
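Two general Perl behaviours may be relevant to patterns built this way, sketched below as an illustration rather than a diagnosis of the code above. First, alternation tries its branches left to right and keeps the first one that succeeds at a given start position, not the longest one. Second, inside a double-quoted string an unrecognized escape such as \s loses its backslash, whereas qr// keeps it. The input string and the $short and $long patterns are hypothetical, simplified stand-ins for the patterns in the post.

```perl
use strict;
use warnings;

# Hypothetical tagged input in the word/TAG style the post's patterns expect.
my $text = "the/DT big/JJ dog/NN";

# Building pieces with qr{} keeps \s intact and avoids escaping the slash.
my $det  = qr{[A-Za-z]+/DT};
my $adj  = qr{[A-Za-z]+/JJ[RS]?};
my $noun = qr{[A-Za-z0-9._]+/NNP?S?};

my $short = qr{$det\s+$adj};           # determiner + adjective only
my $long  = qr{$det\s+$adj\s+$noun};   # determiner + adjective + noun

# Branch order decides the result: the first branch that matches wins.
my ($short_first) = $text =~ /($short|$long)/;
my ($long_first)  = $text =~ /($long|$short)/;

print "short branch first: $short_first\n";   # the/DT big/JJ
print "long branch first:  $long_first\n";    # the/DT big/JJ dog/NN

# In a double-quoted string, "\s" is not a known escape, so Perl keeps
# only the letter (and warns under 'use warnings').
my $dq = do { no warnings 'misc'; "\s*" };
print "double-quoted \\s*: $dq\n";             # s*
```

If either behaviour applies, listing the more specific (longer) alternatives first and building the pieces with qr// or single-quoted strings are the usual ways to make the combined pattern behave like its parts.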
RE: RE: Re: greedy and lazy
by chromatic (Archbishop) on Jul 25, 2000 at 08:45 UTC | |
by merlyn (Sage) on Jul 25, 2000 at 08:49 UTC | |
by oconnelm (Initiate) on Jul 26, 2000 at 02:30 UTC |