in reply to Help Pattern Matching

To answer your direct question, the text captured by the parentheses is placed into $1 each time. So you can do this:
    while ($page =~ /<b>(.*?)<\/b>/g) {
        # Now $1 contains the matched text
    }

However, there are a few problems with the regex itself that you should be aware of. First, you're using .*, which matches as much as it can. So if your text is "<b>foo</b> <b>bar</b>", the parentheses will capture "foo</b> <b>bar"... not what you expect. Using .*? (non-greedy) will correct that problem.
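
To see the difference for yourself, here is a minimal sketch using the sample string from above. The variable names ($greedy, @lazy) are just for illustration and are not from your code:

```perl
use strict;
use warnings;

my $page = '<b>foo</b> <b>bar</b>';

# Greedy: .* runs on to the LAST </b> in the string.
my ($greedy) = $page =~ /<b>(.*)<\/b>/;

# Non-greedy: .*? stops at the first </b> it can reach.
# In list context with /g, the match returns every captured piece.
my @lazy = $page =~ /<b>(.*?)<\/b>/g;

print "greedy: $greedy\n";   # greedy: foo</b> <b>bar
print "lazy:   @lazy\n";     # lazy:   foo bar
```

The list-context form is handy when you only want the captured pieces and don't need to do any work inside a loop.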

Second, you say you're matching "lines", but you're also using the /s modifier on your regex, which means that the dot will match newlines. If you don't want the dot to match newlines, then don't use /s.
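
A quick way to check what /s does is to try the same pattern with and without it on a string that has a newline between the tags. This is only a sketch; the sample string and variable names are made up for the demonstration:

```perl
use strict;
use warnings;

my $page = "<b>first\nsecond</b>";

# Without /s the dot will not cross the newline, so no match is found.
my $without_s = $page =~ /<b>(.*?)<\/b>/  ? 1 : 0;

# With /s the dot matches the newline too, so the tag pair is found.
my $with_s    = $page =~ /<b>(.*?)<\/b>/s ? 1 : 0;

print "without /s: $without_s\n";   # without /s: 0
print "with /s:    $with_s\n";      # with /s:    1
```

So if the bold text you want can span more than one line, /s is what you need; if it never should, leaving /s off keeps a match from running across lines.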

Third, if you're extracting data from HTML, and especially if you anticipate doing this for more than your single page of text, you'll probably want to use an HTML parser module. HTML::Parser or HTML::TokeParser are examples. Good luck!
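
As a rough sketch of the parser approach, something like the following should pull out the text of every <b> element with HTML::TokeParser. It assumes the module is installed from CPAN (it is not a core module), and the sample $page string is invented for illustration:

```perl
use strict;
use warnings;
use HTML::TokeParser;   # assumption: installed from CPAN, not a core module

# Hypothetical sample input; in your case this would be the fetched page.
my $page = '<p><b>foo</b> and <B class="x">bar baz</B></p>';

# Passing a reference to a string makes the parser read that string directly.
my $parser = HTML::TokeParser->new(\$page)
    or die "Could not create parser";

my @bold;
# get_tag('b') skips ahead to the next <b> start tag; it returns undef at the end.
while ($parser->get_tag('b')) {
    # Collect the text up to the closing </b>, with surrounding whitespace trimmed.
    push @bold, $parser->get_trimmed_text('/b');
}

print "$_\n" for @bold;
```

Unlike the regex, this copes with uppercase tags and tags that carry attributes, which is most of the reason to reach for a parser once you move beyond a single known page.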

-- Mike

--
just,my${.02}

Re: Re: Help Pattern Matching
by Anonymous Monk on Oct 25, 2002 at 22:39 UTC
    Thank you for your reply.
    I'm grabbing the web site as a single string; that's why I'm using the /s modifier. I was just trying to find an easier way of getting the parts I needed. Thank you for your help.