My suggestion is to post, in a single node, the sample text and the regular expression that fails to match where you expect it to. Format them well, using the tips found in Writeup Formatting Tips, and be sure that you're posting exact copy-and-pastes of the text and code that fail.
When I tested the text you provided against the regexp you provided in the preceding node, I got the following results:
Capture variables:
- Digit Captures
$1 => Melville
$2 => New York
$3 => /s/KPMG LLP
${^PREMATCH} => ended June 30, 2001 in conformity with accounting principles generally accepted
in the United States of America. Also in our opinion, the related financial
statement schedule, when considered in relation to the basic consolidated
financial statements taken as a whole, presents fairly, in all material
respects, the information set forth therein.
${^MATCH} =>
Melville, New York /s/KPMG LLP
${^POSTMATCH} =>
September 26, 2001
STR
$^N => /s/KPMG LLP
@- => (352,354,364,372)
@+ => (411,362,372,411)
The text I used was exactly this:
ended June 30, 2001 in conformity with accounting principles generally accepted
in the United States of America. Also in our opinion, the related financial
statement schedule, when considered in relation to the basic consolidated
financial statements taken as a whole, presents fairly, in all material
respects, the information set forth therein.
Melville, New York /s/KPMG LLP
September 26, 2001
STR
And the regexp I used was exactly this:
/^\s*(\w+),\s*(\w+ \w+)(.+?\s*LLP)/m
Try it yourself with my regexp tester, here: Perl Regex Tester
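If you would rather reproduce the result locally than in the tester, a minimal sketch along these lines should do it. It uses only the tail of the sample text, and the blank line and the run of spaces before /s/KPMG LLP are my assumptions, since the original spacing is not visible here. The /p modifier is what makes ${^MATCH} and its siblings available.

```perl
use strict;
use warnings;

# Tail of the sample text. The blank line and the run of spaces before
# "/s/KPMG LLP" are assumptions; the original spacing is not shown here.
my $text = <<'END_TEXT';
respects, the information set forth therein.

Melville, New York          /s/KPMG LLP
September 26, 2001
END_TEXT

my ($city, $state, $signature);

# /m lets ^ match at the start of every line; /p enables ${^MATCH} and friends.
if ( $text =~ /^\s*(\w+),\s*(\w+ \w+)(.+?\s*LLP)/mp ) {
    ($city, $state, $signature) = ($1, $2, $3);
    print '$1 => ', $city, "\n";
    print '$2 => ', $state, "\n";
    print '$3 => ', $signature, "\n";
    print '${^MATCH} => ', ${^MATCH}, "\n";
    print '@- => (', join(',', @-), ")\n";
    print '@+ => (', join(',', @+), ")\n";
}
else {
    print "No match\n";
}
```

Because the snippet is shorter than the full text, the offsets in @- and @+ will differ from the ones listed above, but the three captures should line up.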
Sorry, this is my first time here, and I'm still trying to find my way around. Yes, that code works now, but when I tried to generalize it, I failed. Here is the new code:
/^\s*(\w+|\w+ \w+|\w+ \w+ \w+),\s*(\w+|\w+ \w+|\w+ \w+ \w+)\s*(.+?\s*LLP)/m
So in generalizing it you regressed. That happens. You could revert to your previous regexp, and then start again at trying to generalize it, but this time keeping closer track of the spaces, word characters, etc.
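One plausible culprit, and this is my guess rather than something you reported, is the order of the alternatives. Perl tries alternatives left to right and keeps the first one that lets the overall match succeed, so \w+ can win before \w+ \w+ is ever tried. The sketch below compares your generalized pattern with a reordered variant (the reordering is my own, hypothetical change) on a single line; the single spaces in the sample line are an assumption about your data.

```perl
use strict;
use warnings;

# Sample line; single spaces are an assumption about the real data.
my $line = "Melville, New York /s/KPMG LLP";

# The generalized pattern as posted, unchanged.
my $generalized = qr/^\s*(\w+|\w+ \w+|\w+ \w+ \w+),\s*(\w+|\w+ \w+|\w+ \w+ \w+)\s*(.+?\s*LLP)/m;

# Hypothetical variant: same pattern with the longest alternatives first.
my $reordered = qr/^\s*(\w+ \w+ \w+|\w+ \w+|\w+),\s*(\w+ \w+ \w+|\w+ \w+|\w+)\s*(.+?\s*LLP)/m;

# A match in list context returns the captures ($1, $2, $3).
my @before = $line =~ $generalized;
my @after  = $line =~ $reordered;

print "generalized: [", join('] [', @before), "]\n";
print "reordered:   [", join('] [', @after),  "]\n";
```

On this line the generalized pattern still matches, but $2 comes back as "New" and "York" slides into $3. With the longest alternatives first, $2 is "New York" again.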
It doesn't seem to me that the more complicated solution (this most recent one) is actually superior. It's just more confusing. When regexps start getting too confusing, it's time to try again, or to break the problem up into smaller chunks. ...and of course it's also time for the "/x" modifier. :)
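To show what I mean by /x, here is one way the generalized pattern might be laid out. Treat it as a sketch, not a drop-in fix: the quantified group (one to three words) is my own substitution for the alternation, the Washington and Richmond lines are made-up test data, and under /x a literal space has to be written as [ ].

```perl
use strict;
use warnings;

# Hypothetical pattern, not the original poster's. Under /x, whitespace in
# the pattern is ignored, so a literal space is written as [ ].
my $signature_line = qr{
    ^ \s*
    ( \w+ (?: [ ] \w+ ){0,2} )   # capture 1: city, one to three words
    , \s*
    ( \w+ (?: [ ] \w+ ){0,2} )   # capture 2: state, one to three words
    \s+
    ( \S .*? \s* LLP )           # capture 3: signature ending in LLP
}xm;

my @lines = (
    "Melville, New York /s/KPMG LLP",
    "Washington, District of Columbia /s/KPMG LLP",   # made-up test line
    "Richmond, Virginia /s/KPMG LLP",                 # made-up test line
);

for my $line (@lines) {
    if ( my ($city, $state, $signature) = $line =~ $signature_line ) {
        print "$city | $state | $signature\n";
    }
    else {
        print "No match: $line\n";
    }
}
```

One thing to watch with /x: comments inside the pattern are still subject to variable interpolation, so keep $ and @ out of them.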
I have to modify it because the state could sometimes be "District of Columbia" or "Virginia".