Hello,
This is my first posting to the keepers of wisdom, so I'll try to keep it brief.
I've found that I need to stack a lot of regular expressions in order to force
pattern matching to occur first on larger complex patterns and then on the smaller patterns that they are composed of.
It seems to me that this is simply greedy matching, with the special circumstance that the largest patterns are
made up of optional and obligatory combinations of other patterns that will match at least the minimal pattern.
For instance:
# qr// rather than double-quoted strings, so that \s stays a whitespace escape
$np1 = qr/(?:$det|$gen)/;
$np2 = qr/(?:$adj|$num|$conj|$adv|$inf)/;
$np3 = qr/(?:$np1*\s*($noun)*\s*$np2*\s*($noun)+\s*$adj*)/;
used together in the following manner:
$NP = qr/(?:(?:$np1)*\s*$np2*(?:$np3)+)/;
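A side note on assembling patterns from pieces, in case it is relevant: Perl does not treat \s as an escape inside a double-quoted string. It passes a plain "s" through (with a warning under "use warnings"), so a pattern built that way never matches whitespace and can only match single words. I am not certain this is behind what I am seeing. The following is a minimal sketch of the difference, assuming three made-up word lists ($det, $adj and $noun here are hypothetical stand-ins, not my real lists):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real word lists.
my $det  = qr/(?:the|a)/;
my $adj  = qr/(?:big|red)/;
my $noun = qr/(?:dog|house)/;

# Built as a double-quoted string: "\s" is not a string escape, so Perl
# passes a literal "s" through (and warns about it; silenced here).
my $as_string = do {
    no warnings;
    "(?:$det\s*)*(?:$adj\s*)*$noun";
};

# Built with qr//: \s reaches the regex engine and means whitespace.
my $as_qr = qr/(?:$det\s*)*(?:$adj\s*)*$noun/;

my $text = "the big red dog";
my ($short) = $text =~ /($as_string)/;
my ($long)  = $text =~ /($as_qr)/;

print "string-built: $short\n";   # dog
print "qr-built:     $long\n";    # the big red dog
```

With the string-built pattern only the bare noun can match, which looks a lot like "the minimal pattern always wins".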
As I've mentioned, I want to match the longest patterns first but allow for
matching on the smaller patterns, which is my reason for including Kleene stars for optional subpatterns.
The problem that I'm having is that the optionality leads to matching
the minimal patterns and never the optionally longer ones.
My question, then, is whether I need to keep doing what I am doing now:
explicitly matching the longest patterns first, then the next longest, and so on
down to the minimal patterns?
I ask because the OR grouping from greatest coverage to least seems to
also be missing longer patterns.
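As far as I understand it, Perl's alternation takes the first alternative that succeeds at the earliest possible starting position, not the longest alternative overall. Here is a small sketch of that behaviour, again assuming made-up word lists ($adj, $noun, $single and $compound are hypothetical names for illustration only):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real word lists.
my $adj  = qr/(?:big|red)/;
my $noun = qr/(?:dog|house)/;

my $single   = qr/$noun/;            # minimal pattern
my $compound = qr/$noun\s+$noun/;    # longer pattern built from it

# Short alternative listed first: it succeeds, so the longer one is never tried.
my ($short_first) = "dog house" =~ /($single|$compound)/;

# Longest alternative listed first: it gets the first chance at each position.
my ($long_first) = "dog house" =~ /($compound|$single)/;

# Ordering cannot help when a shorter alternative matches further left.
my ($leftmost) = "red dog house" =~ /($compound|$adj)/;

print "short first: $short_first\n";   # dog
print "long first:  $long_first\n";    # dog house
print "leftmost:    $leftmost\n";      # red
```

The third case is the kind of miss I mean: the longer pattern is listed first, yet a shorter alternative that starts earlier in the text still wins.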
So, to sum up, I need to match long patterns composed of smaller patterns,
where the long ones match first and then, failing that, the short ones match.
If my question is overly simple, or my discussion of it unclear, I apologize in advance.
Thanks,