in reply to Help Pattern Matching
while ($page =~ /<b>(.*?)<\/b>/g) {
    # Now $1 contains the matched text
}
However, there are a few problems with the regex itself that you should be aware of. First, you're using .*, which matches as much as it can. So if your text is "<b>foo</b> <b>bar</b>", the parentheses will capture "foo</b> <b>bar"... not what you expect. Using .*? (non-greedy) will correct that problem.
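To see the difference for yourself, here is a minimal sketch that runs both forms against the sample string above. The variable names $greedy and @lazy are just for illustration.

```perl
use strict;
use warnings;

my $page = '<b>foo</b> <b>bar</b>';

# Greedy: .* grabs as much as it can, so it runs to the LAST </b>.
my ($greedy) = $page =~ /<b>(.*)<\/b>/;

# Non-greedy: .*? stops at the FIRST </b>; /g in list context
# returns every capture.
my @lazy = $page =~ /<b>(.*?)<\/b>/g;

print "greedy: $greedy\n";   # greedy: foo</b> <b>bar
print "lazy:   @lazy\n";     # lazy:   foo bar
```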
Second, you say you're matching "lines", but you're also using the /s modifier on your regex, which means that the dot will match newlines. If you don't want the dot to match newlines, then don't use /s.
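A quick way to check what /s changes, sketched with a made-up two-line string (the sample data and the names $without_s and $with_s are assumptions for the demo, not from your page):

```perl
use strict;
use warnings;

# Hypothetical sample: the bold text is split across two lines.
my $page = "<b>first\nsecond</b>";

# Without /s the dot will not match "\n", so the match cannot
# get from <b> to </b>.
my $without_s = $page =~ /<b>(.*?)<\/b>/  ? 'match' : 'no match';

# With /s the dot matches "\n" as well, so the match succeeds.
my $with_s    = $page =~ /<b>(.*?)<\/b>/s ? 'match' : 'no match';

print "without /s: $without_s\n";   # without /s: no match
print "with /s:    $with_s\n";      # with /s:    match
```

So whether you want /s depends on whether a tag pair is allowed to span more than one line in your data.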
Third, if you're extracting data from HTML, and especially if you anticipate doing this for more than your single page of text, you'll probably want to use an HTML parser module. HTML::Parser or HTML::TokeParser are examples. Good luck!
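For what it's worth, a rough sketch of the HTML::TokeParser route, assuming the module is installed from CPAN (it is not part of core Perl) and that the bold text is what you are after. The sample HTML and the @bold array are made up for the demo.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # ships with the HTML-Parser distribution on CPAN

# Hypothetical sample; note the mixed-case tag and the attribute,
# both of which would trip up the plain regex.
my $page = '<p><b>foo</b> and <B class="x">bar</B></p>';

# Passing a reference to a scalar parses the string itself
# rather than treating it as a filename.
my $parser = HTML::TokeParser->new(\$page)
    or die "Could not create parser";

my @bold;
while ($parser->get_tag('b')) {
    # Collect the text between this <b> and its closing </b>.
    push @bold, $parser->get_trimmed_text('/b');
}

print "$_\n" for @bold;
```

The parser lowercases tag names and ignores attributes when matching, so you don't have to account for those cases yourself.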
-- Mike
--
just,my${.02}
Re: Re: Help Pattern Matching
by Anonymous Monk on Oct 25, 2002 at 22:39 UTC