Re^2: String contents

Re^3: String contents by davido (Cardinal) on Jun 29, 2012 at 09:26 UTC
My suggestion is to post in one node the sample text and the regular expression that is failing to match where you expect it to. Post them well formatted, using the tips found in Writeup Formatting Tips, and be sure that you're posting actual full copy-and-pastes of the text and code that fail.

When I tested the text you provided against the regexp you provided in the preceding node, I got the following results:

Capture variables:
Digit Captures
$1 => Melville
$2 => New York
$3 => /s/KPMG LLP
${^PREMATCH} => ended June 30, 2001 in conformity with accounting principles generally accepted in the United States of America. Also in our opinion, the related financial statement schedule, when considered in relation to the basic consolidated financial statements taken as a whole, presents fairly, in all material respects, the information set forth therein.
${^MATCH} => Melville, New York /s/KPMG LLP
${^POSTMATCH} => September 26, 2001 STR
$^N => /s/KPMG LLP
@- => (352,354,364,372)
@+ => (411,362,372,411)

The text I used was exactly this:

`ended June 30, 2001 in conformity with accounting principles generally accepted in the United States of America. Also in our opinion, the related financial statement schedule, when considered in relation to the basic consolidated financial statements taken as a whole, presents fairly, in all material respects, the information set forth therein. Melville, New York /s/KPMG LLP September 26, 2001 STR`

And the regexp I used was exactly this:

`/^\s(\w+),\s(\w+ \w+)(.+?\s*LLP)/m`

Try it yourself with my regexp tester, here: Perl Regex Tester

Dave
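For anyone who wants to reproduce a report like the one above without the online tester, here is a minimal sketch. The `$text` value is a shortened, hypothetical stand-in for the posted text (the original line breaks and spacing are not known), so the offsets it prints will not match the ones quoted. It relies on the `/p` modifier, which makes `${^PREMATCH}`, `${^MATCH}` and `${^POSTMATCH}` available on Perl 5.10 and later.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the posted text; the real line breaks and
# spacing were not preserved, so offsets differ from those quoted above.
my $text = "set forth therein.\n Melville, New York /s/KPMG LLP\n September 26, 2001\n";

my @report;

# /m lets ^ match after each newline; /p enables ${^PREMATCH} and friends.
if ( $text =~ /^\s(\w+),\s(\w+ \w+)(.+?\s*LLP)/mp ) {
    @report = (
        [ '$1',            $1 ],
        [ '$2',            $2 ],
        [ '$3',            $3 ],
        [ '${^PREMATCH}',  ${^PREMATCH} ],
        [ '${^MATCH}',     ${^MATCH} ],
        [ '${^POSTMATCH}', ${^POSTMATCH} ],
        [ '$^N',           $^N ],                             # last closed group
        [ '@-',            '(' . join( ',', @- ) . ')' ],     # start offsets
        [ '@+',            '(' . join( ',', @+ ) . ')' ],     # end offsets
    );
}
else {
    @report = ( [ 'no match', '' ] );
}

printf "%-14s => %s\n", @$_ for @report;
```

Comparing `@-` and `@+` against the text is a quick way to see exactly which characters, including leading whitespace, each group swallowed.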
Re^4: String contents by perlyr (Novice) on Jun 29, 2012 at 09:36 UTC
Sorry, first time here. Still trying to find my way around. Yes, that code works now, but when I was trying to generalize it, I failed. Here is the new code:

`/^\s(\w+\|\w+ \w+\|\w+ \w+ \w+),\s(\w+\|\w+ \w+\|\w+ \w+ \w+)\s(.+?\sLLP)/m`
Re^5: String contents by davido (Cardinal) on Jun 29, 2012 at 09:43 UTC
So in generalizing it you regressed. That happens. You could revert to your previous regexp, and then start again at trying to generalize it, but this time keeping closer track of the spaces, word characters, etc.

It doesn't seem to me that the more complicated solution (this most recent one) is actually superior. It's just more confusing. When regexps start getting too confusing, it's time to try again, or to break the problem up into smaller chunks.

...and of course it's also time for the "`/x`" modifier. :)

Dave
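As an illustration of the `/x` suggestion, here is one way the earlier, working regexp might be laid out. This is only a sketch: the sample `$line` is hypothetical, and the names `$compact` and `$verbose` are made up for the comparison. Two things to keep in mind: under `/x` unescaped whitespace in the pattern is ignored, so a literal space has to be written as `[ ]` (or `\ `), and variables still interpolate inside pattern comments, so the comments avoid sigils and the delimiter.

```perl
use strict;
use warnings;

# The regexp from earlier in the thread, unchanged in meaning.
my $compact = qr/^\s(\w+),\s(\w+ \w+)(.+?\s*LLP)/m;

# The same pattern spread out under the x modifier.
my $verbose = qr{
    ^ \s                # line begins with one whitespace character
    ( \w+ ) ,           # capture 1: one-word city, then a comma
    \s
    ( \w+ [ ] \w+ )     # capture 2: two-word state, [ ] is a literal space
    ( .+? \s* LLP )     # capture 3: shortest run ending in LLP
}mx;

# Hypothetical one-line sample in the shape of the posted text.
my $line = " Melville, New York /s/KPMG LLP";

my @from_compact = $line =~ $compact;
my @from_verbose = $line =~ $verbose;

print "compact: ", join( '|', @from_compact ), "\n";
print "verbose: ", join( '|', @from_verbose ), "\n";
```

Both forms should capture the same three fields; the verbose one just makes every space and word character visible, which is the bookkeeping that tends to go wrong when generalizing.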
Re^4: String contents by perlyr (Novice) on Jun 29, 2012 at 09:53 UTC
I have to modify it, because sometimes the state could be "District of Columbia" or "Virginia".
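One possible way to allow one-, two- or three-word names, combining the "smaller chunks" and `/x` advice given in this thread, is sketched below. It is not the poster's code: `$name`, `$signature`, `parse_signature` and the sample lines are all hypothetical, and it assumes the words are separated by single spaces and that the firm name ends in `LLP`. A counted group, `\w+(?:[ ]\w+){0,2}`, stands in for the three-way alternation.

```perl
use strict;
use warnings;

# One to three words separated by single spaces (assumed format).
my $name = qr/ \w+ (?: [ ] \w+ ){0,2} /x;

my $signature = qr{
    ^ \s
    ( $name ) ,         # capture 1: city
    \s
    ( $name )           # capture 2: state
    \s+
    ( .+? \s* LLP )     # capture 3: firm, shortest run ending in LLP
}mx;

# Returns (city, state, firm), or an empty list when nothing matches.
sub parse_signature {
    my ($line) = @_;
    return $line =~ $signature ? ( $1, $2, $3 ) : ();
}

# Hypothetical sample lines in the shape of the posted text.
my @lines = (
    " Melville, New York /s/KPMG LLP",
    " Washington, District of Columbia /s/KPMG LLP",
    " McLean, Virginia /s/KPMG LLP",
);

for my $line (@lines) {
    my @fields = parse_signature($line);
    if (@fields) {
        print join( ' | ', @fields ), "\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

Keeping the word-count rule in a single `$name` chunk means a later change, such as allowing hyphens or periods in a name, only has to be made in one place.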
