in reply to Regexp Problem with greedy
Second, you are single-quoting 'M' in the second regexp. That means the regexp is going to be looking for </'M'> when it should be looking for </M>.
Third, (probably not a problem) < and > aren't metacharacters in regular expressions. You don't have to escape them.
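Both points are easy to check with a quick script. This is only a sketch: the sample string in $text is made up for illustration and is not taken from your data.

```perl
use strict;
use warnings;

# Hypothetical sample text, not from the original data.
my $text = "<M>radar</M>";

# Quotes inside a regexp are literal characters, so this looks for </'M'>.
my $quoted = $text =~ m|<M>(.+?)</'M'>| ? 1 : 0;

# < and > are ordinary characters: escaped or not, they match the same way.
my $plain   = $text =~ m|<M>(.+?)</M>|     ? 1 : 0;
my $escaped = $text =~ m|\<M\>(.+?)\</M\>| ? 1 : 0;

print "quoted=$quoted plain=$plain escaped=$escaped\n";
# prints: quoted=0 plain=1 escaped=1
```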
Here is my own untested rewrite that might prove to work better.
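    {
        local $/ = "";    # paragraph mode
        while ( <IN> ) {
            # list-context m//g returns the captures; map cleans up newlines
            push @terms,    map { s/\n/ /g; $_ } m|<M>(.+?)</M\d+>|gis;
            push @avionics, map { s/\n/ /g; $_ } m|<M>(.+?)</M>|gis;
        }
    }
    {
        local $, = "\n";  # one element per line
        print @terms, @avionics;
    }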
The theory here is: Setting $/ to "" turns on "paragraph mode", where each read returns a whole paragraph (a chunk of text ending at one or more blank lines) instead of a single line. This helps to alleviate the problem of text spanning multiple lines, although it doesn't allow for newlines within the tag itself. Next, the regexps are evaluated in list context, and with /g every match captured in () is returned and pushed into @terms and @avionics. The /s modifier causes '.' to also match newline characters, and /i makes the match case-insensitive. The map function modifies the returned list by (in this case) substituting a space for \n. Finally, I set $, to "\n" so that your printout is one element per line. The modifications to $/ and $, were done "locally" so that they revert to their original values when the blocks end.
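If you want to watch those pieces work together before pointing the code at your real file, something like the following may help. It is a sketch under assumptions: the contents of $data are invented, and it reads from an in-memory filehandle (which needs Perl 5.8 or later) instead of your IN handle.

```perl
use strict;
use warnings;

# Hypothetical input: two paragraphs separated by a blank line, with the
# contents of the first tag broken across two lines.
my $data = "<M>flight\ncontrol</M> text\n\n<M>radar</M> more\n";

my @avionics;
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
{
    local $/ = "";    # paragraph mode: one blank-line-separated chunk per read
    while ( <$in> ) {
        # m//g in list context returns each captured group; map replaces
        # the newlines inside each capture with spaces.
        push @avionics, map { s/\n/ /g; $_ } m|<M>(.+?)</M>|gis;
    }
}
close $in;

{
    local $, = "\n";  # separator printed between list elements
    local $\ = "\n";  # newline printed after the whole list
    print @avionics;
}
# prints:
# flight control
# radar
```

Once the blocks end, $/, $, and $\ are back to their defaults, so later reads and prints behave as usual.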
I haven't tested the code yet, but it ought to do the trick.
Dave
"If I had my life to do over again, I'd be a plumber." -- Albert Einstein