My suggestion is to post, in a single node, both the sample text and the regular expression that fails to match where you expect it to. Format them well, using the tips found in Writeup Formatting Tips, and be sure that what you post is an exact, complete copy and paste of the text and code that fail.

When I tested the text you provided against the regexp you provided in the preceding node, I got the following results:

Capture variables:

The text I used was exactly this:

ended June 30, 2001 in conformity with accounting principles generally accepted in the United States of America. Also in our opinion, the related financial statement schedule, when considered in relation to the basic consolidated financial statements taken as a whole, presents fairly, in all material respects, the information set forth therein. Melville, New York /s/KPMG LLP September 26, 2001 STR

And the regexp I used was exactly this:

/^\s*(\w+),\s*(\w+ \w+)(.+?\s*LLP)/m

Try it yourself with my regexp tester, here: Perl Regex Tester
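
The same check can also be run locally. What follows is only a minimal sketch: the shortened sample and its line breaks are assumptions (the regexp anchors on `^` with `/m`, so "Melville" has to start a line), and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: "Melville" starts a new line in the real input. The exact
# line breaks of the original sample are not known, so this is a
# shortened stand-in.
my $text = <<'END_TEXT';
the information set forth therein.
Melville, New York /s/KPMG LLP
September 26, 2001
END_TEXT

# In list context a successful match returns the captured groups.
my @captures = $text =~ /^\s*(\w+),\s*(\w+ \w+)(.+?\s*LLP)/m;

if (@captures) {
    # Brackets make leading and trailing whitespace visible.
    printf "capture %d = [%s]\n", $_ + 1, $captures[$_] for 0 .. $#captures;
}
else {
    print "No match\n";
}
```

On this assumed input the first two captures are "Melville" and "New York", and the third begins with a space because nothing in the pattern consumes the whitespace after the state.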


Dave

Re^4: String contents
by perlyr (Novice) on Jun 29, 2012 at 09:36 UTC
    Sorry, first time here. Still trying to find my way around. Yes, that code works now, but when I was trying to generalize it, I failed. Here is the new code:
    /^\s*(\w+|\w+ \w+|\w+ \w+ \w+),\s*(\w+|\w+ \w+|\w+ \w+ \w+)\s*(.+?\s*LLP)/m

      So in generalizing it you regressed. That happens. You could revert to your previous regexp, and then start again at trying to generalize it, but this time keeping closer track of the spaces, word characters, etc.

      It doesn't seem to me that the more complicated solution (this most recent one) is actually superior. It's just more confusing. When regexps start getting too confusing, it's time to try again, or to break the problem up into smaller chunks. ...and of course it's also time for the "/x" modifier. :)
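
As a rough illustration of what "/x" buys you, here is the earlier, working regexp spread over several lines with comments. The sample line and the variable names are assumptions made for the sketch. Under "/x" a literal space has to be written as `[ ]` (or escaped), because bare whitespace in the pattern is ignored.

```perl
use strict;
use warnings;

# The earlier regexp, unchanged in meaning, laid out with /x.
my $re = qr{
    ^ \s*
    (\w+)            # capture 1: city, a single word
    , \s*
    (\w+ [ ] \w+)    # capture 2: state, exactly two words
    (.+? \s* LLP)    # capture 3: shortest run of text ending in LLP
}xm;

# Hypothetical one-line sample taken from the text in this thread.
my $line = "Melville, New York /s/KPMG LLP";

my ($city, $state, $firm) = $line =~ $re;
print "city=[$city] state=[$state] firm=[$firm]\n" if defined $city;
```

Laid out this way, each piece of the pattern can be changed and retested on its own, which makes it easier to see where spaces and word characters are being consumed.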


      Dave

Re^4: String contents
by perlyr (Novice) on Jun 29, 2012 at 09:53 UTC
    I have to modify it because the state is sometimes a single word or three words, such as "Virginia" or "District of Columbia".
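
One possible way to allow one to three words, offered as a sketch rather than a definitive fix: Perl tries alternatives left to right, so `(\w+|\w+ \w+|\w+ \w+ \w+)` is satisfied with a single word ("New") whenever the rest of the pattern can still match. A greedy repetition takes the longest run of words first instead. The helper name `parse_signature` and the sample lines are hypothetical, and the approach assumes that something like "/s/" follows the state, so that the word run stops there.

```perl
use strict;
use warnings;

# One to three space-separated words, longest run tried first.
my $place = qr/ \w+ (?: [ ] \w+ ){0,2} /x;

my $re = qr{
    ^ \s*
    ($place)         # capture 1: city
    , \s*
    ($place)         # capture 2: state
    \s*
    (.+? \s* LLP)    # capture 3: shortest run of text ending in LLP
}xm;

# Hypothetical helper: returns (city, state, firm), or an empty list
# when the line does not match.
sub parse_signature {
    my ($line) = @_;
    return $line =~ $re;
}

for my $line (
    "Melville, New York /s/KPMG LLP",
    "Washington, District of Columbia /s/KPMG LLP",
    "Richmond, Virginia /s/KPMG LLP",
) {
    my ($city, $state, $firm) = parse_signature($line);
    print "city=[$city] state=[$state] firm=[$firm]\n";
}
```

If the firm name followed the state directly, with no "/s/" or other non-word character in between, the greedy state group would swallow part of the firm name, so the pattern would need a firmer delimiter in that case.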