Re: Help Pattern Matching

To answer your direct question: each time the match succeeds, the text captured by the parentheses is placed into $1, so you can loop over every match like this:

while ($page =~ /<b>(.*?)<\/b>/g) {
    # $1 now holds the text captured on this pass through the loop
    print "$1\n";
}
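If you want all the captured pieces at once rather than one per loop iteration, a //g match in list context returns every $1 capture as a list. A minimal sketch, assuming $page already holds the fetched HTML (the sample string below is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample page; in practice $page holds the fetched HTML.
my $page = "<b>first</b> plain text <b>second</b>\n<b>third</b>";

# In list context, //g returns the $1 capture from every successful match.
my @bold = $page =~ /<b>(.*?)<\/b>/g;

print "$_\n" for @bold;   # prints first, second, third on separate lines
```

This is handy when you just want the pieces in an array; the while loop is better when you need to act on each match as you find it.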

However, there are a few problems with the regex itself that you should be aware of. First, you're using .*, which is greedy and matches as much as it can. So if your text is "<b>foo</b> <b>bar</b>", the parentheses will capture "foo</b> <b>bar" (everything from the first <b> to the last </b>), which is not what you expect. Using .*? (non-greedy) will correct that problem.
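To see the greedy/non-greedy difference side by side, here is a small sketch using a made-up string with two bold sections:

```perl
use strict;
use warnings;

my $text = "<b>foo</b> <b>bar</b>";

# Greedy: .* runs to the LAST </b>, so a single capture spans both bold tags.
my @greedy = $text =~ /<b>(.*)<\/b>/g;

# Non-greedy: .*? stops at the FIRST </b> after each <b>.
my @lazy = $text =~ /<b>(.*?)<\/b>/g;

print "greedy: @greedy\n";     # greedy: foo</b> <b>bar
print "non-greedy: @lazy\n";   # non-greedy: foo bar
```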

Second, you say you're matching "lines", but you're also using the /s modifier on your regex, which means that the dot will match newlines. If you don't want newlines to be able to match a dot in your regex, then don't use /s.
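Note that holding the whole page in one string does not by itself require /s; the modifier only matters when the text you want to capture contains a line break. A minimal sketch with a hypothetical two-line sample:

```perl
use strict;
use warnings;

my $page = "<b>one</b>\n<b>two\nlines</b>";

# Without /s, '.' does not match "\n", so the bold run containing a line break is skipped.
my @no_s = $page =~ /<b>(.*?)<\/b>/g;

# With /s, '.' also matches "\n", so that bold run is captured as well.
my @with_s = $page =~ /<b>(.*?)<\/b>/gs;

print scalar(@no_s), " match(es) without /s, ", scalar(@with_s), " with /s\n";
```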

Third, if you're extracting data from HTML, and especially if you anticipate doing this for more than your single page of text, you'll probably want to use an HTML parser module. HTML::Parser or HTML::TokeParser are examples. Good luck!
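For comparison, here is a rough sketch of pulling out the bold text with HTML::TokeParser instead of a regex. It assumes the HTML-Parser distribution is installed from CPAN, and the sample page is made up; a parser copes with things like upper-case tags and line breaks inside the text without any regex modifiers:

```perl
use strict;
use warnings;
use HTML::TokeParser;   # part of the HTML-Parser distribution on CPAN

# Hypothetical sample: mixed-case tags and a line break inside the bold text.
my $page = "<p><b>first</b> and <B>second\nitem</B></p>";

# new() accepts a reference to a string holding the document.
my $parser = HTML::TokeParser->new(\$page);

my @bold;
# get_tag('b') skips to the next <b> start tag (tag names are reported
# in lower case, so <B> matches too) and returns undef when none are left.
while (my $tag = $parser->get_tag('b')) {
    # get_trimmed_text('/b') returns the text up to the closing </b>,
    # with whitespace runs collapsed and leading/trailing space removed.
    push @bold, $parser->get_trimmed_text('/b');
}

print "$_\n" for @bold;
```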

-- Mike

-- just,my${.02}


Re: Re: Help Pattern Matching by Anonymous Monk on Oct 25, 2002 at 22:39 UTC
Thank you for your reply. I'm grabbing the web site as a single string; that's why I'm using the /s modifier. I was just trying to find an easier way of getting the parts I needed. Thank you for your help.