Re: Re: Parsing out URLs with regex

Agreed. When working on code for heavy use, don't reinvent the wheel.

For learning purposes, though, you don't want (\w*), which takes the maximum number of consecutive word characters; you want (.*?), which takes the minimum number of characters followed by the closing quote.
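A small sketch of the difference, in case it helps. The sample tag and the variable names are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample tag, for illustration only.
my $tag = '<a href="http://example.com/page.html">home</a>';

# \w matches only letters, digits and underscore, so the capture stops at ':'.
my ($word_run) = $tag =~ /href="(\w*)/;

# Once the closing quote is required, \w* cannot reach it and the match fails.
my $matched = $tag =~ /href="(\w*)"/ ? 1 : 0;

# .*? takes as few characters as possible before the next double quote.
my ($lazy) = $tag =~ /href="(.*?)"/;

print "word run: $word_run\n";
print "matched with closing quote: $matched\n";
print "lazy: $lazy\n";
```

With this sample, the word-character version captures only the scheme, while the non-greedy version captures the whole attribute value.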

--
[ e d @ h a l l e y . c c ]

Re^3: Parsing out URLs with regex (diedotstar) by tye (Sage) on May 14, 2003 at 19:46 UTC
Actually, this is a good example of when `.*?` is not the best choice. `[^"]*` is a much better idea. You don't want to run into this problem: `$page= '<a href="foo">...' . '<a href="bar" title="baz"><b>Click Here'; $page =~ /<a href="(.*?)" title="(.*?)"><b>Click Here/i;` where $1 will contain `'foo">...<a href="bar'`. - tye
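For anyone who wants to try it, here is a minimal sketch that runs both patterns against the sample page from the post above. The variable names are only illustrative, and the negated character class is written with a `*` quantifier so it can match a whole attribute value.

```perl
use strict;
use warnings;

# The sample page from the post: two links, only the second has a title.
my $page = '<a href="foo">...' . '<a href="bar" title="baz"><b>Click Here';

# Non-greedy dot: the match still starts at the first <a href=" and the
# capture grows until the rest of the pattern can match.
my ($lazy_href) = $page =~ /<a href="(.*?)" title="(.*?)"><b>Click Here/i;

# Negated character class: the capture can never cross a double quote,
# so the first link is rejected and the second one matches.
my ($class_href, $class_title) =
    $page =~ /<a href="([^"]*)" title="([^"]*)"><b>Click Here/i;

print "lazy:  $lazy_href\n";
print "class: $class_href ($class_title)\n";
```

Non-greedy only means "as short as possible from where the match started"; it does not move the starting point to the later link.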
Re: Re^3: Parsing out URLs with regex (diedotstar) by halley (Prior) on May 14, 2003 at 19:51 UTC
Whups, didn't see the mandatory title="" in the match. Jumped the gun. -- `[ e d @ h a l l e y . c c ]`
Re^5: Parsing out URLs with regex (diedotstar) by tye (Sage) on May 14, 2003 at 19:56 UTC
Note that the mandatory title isn't really the problem. Putting nearly anything after or before the `.*?` in a regex can cause you problems. Even just `/<a href="(.*?)">/i` will match way too much by matching way too early against any of these:
`'<a href="oops" lots of stuff <a href="ok">'`
`'<a href="oops" > d'oh! whitespace! <a href="ok">'`
`'<a href="oops, break a browser? <a href="ok">'`
(: - tye
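A rough sketch comparing the two patterns on the three strings from the post above. The array and variable names are hypothetical, and `q{}` quoting is used only because one of the strings contains an apostrophe.

```perl
use strict;
use warnings;

# The three problem strings from the post.
my @pages = (
    q{<a href="oops" lots of stuff <a href="ok">},
    q{<a href="oops" > d'oh! whitespace! <a href="ok">},
    q{<a href="oops, break a browser? <a href="ok">},
);

my (@lazy, @class);
for my $page (@pages) {
    # Non-greedy dot: starts at the first <a href=" and runs on to the first ">.
    my ($lazy_href) = $page =~ /<a href="(.*?)">/i;

    # Negated class: the capture cannot contain a quote, so only the
    # well-formed link at the end can match.
    my ($class_href) = $page =~ /<a href="([^"]*)">/i;

    push @lazy,  $lazy_href;
    push @class, $class_href;
    print "lazy:  $lazy_href\n";
    print "class: $class_href\n\n";
}
```

In each case the non-greedy capture starts at "oops" and swallows everything up to the final link, while the negated class yields just "ok".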