Re^3: Parsing out URLs with regex (diedotstar)

Actually, this is a good example of when .*? is not the best choice. [^"]* is a much better idea. You don't want to run into this problem:

    $page= '<a href="foo">...'
      . '<a href="bar" title="baz"><b>Click Here';
    $page =~ /<a href="(.*?)" title="(.*?)"><b>Click Here/i;

where $1 will contain 'foo">...<a href="bar'.
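To see the difference side by side, here is a minimal sketch that reuses the sample string above. The variable names ($lazy_href, $href, $title) are just for illustration, and it assumes attribute values are double-quoted with no embedded quotes:

```perl
use strict;
use warnings;

# Same sample data as in the post above.
my $page = '<a href="foo">...'
         . '<a href="bar" title="baz"><b>Click Here';

# Lazy quantifier: the match starts at the first <a href=" and .*? simply
# keeps expanding until the rest of the pattern fits.
my ($lazy_href) = $page =~ /<a href="(.*?)" title="(.*?)"><b>Click Here/i;

# Negated character class: a capture can never run past a closing quote,
# so the attempt at the first <a fails and the engine moves on to the second.
my ($href, $title) = $page =~ /<a href="([^"]*)" title="([^"]*)"><b>Click Here/i;

print "lazy:    $lazy_href\n";
print "negated: $href ($title)\n";
```

The lazy version captures across both tags, while the [^"]* version should pick up only the href and title of the second link.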

- tye


Replies are listed 'Best First'.
Re: Re^3: Parsing out URLs with regex (diedotstar) by halley (Prior) on May 14, 2003 at 19:51 UTC
Whups, didn't see the mandatory title="" in the match. Jumped the gun. -- `[ e d @ h a l l e y . c c ]`
Re^5: Parsing out URLs with regex (diedotstar) by tye (Sage) on May 14, 2003 at 19:56 UTC
Note that the mandatory title="" isn't really the problem. Putting nearly anything after or before the .*? in a regex can cause you problems. Even just

    /<a href="(.*?)">/i

will match way too much by matching way too early against

    '<a href="oops" lots of stuff <a href="ok">'
    '<a href="oops" > d'oh! whitespace! <a href="ok">'
    '<a href="oops, break a browser? <a href="ok">'

(:

- tye
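A rough sketch of that point, running the three strings above through both patterns. The names (@samples, @lazy, @negated) are made up for the example, and q{} is used only because one string contains an apostrophe:

```perl
use strict;
use warnings;

# The three problem strings from the post above.
my @samples = (
    q{<a href="oops" lots of stuff <a href="ok">},
    q{<a href="oops" > d'oh! whitespace! <a href="ok">},
    q{<a href="oops, break a browser? <a href="ok">},
);

my (@lazy, @negated);
for my $html (@samples) {
    # Lazy: anchors at the first <a href=" and stretches to the first ">.
    my ($l) = $html =~ /<a href="(.*?)">/i;

    # Negated class: the capture stops at the next quote, so a first tag
    # that is not followed directly by "> is rejected and the scan continues.
    my ($n) = $html =~ /<a href="([^"]*)">/i;

    push @lazy,    $l;
    push @negated, $n;
    print "lazy:    $l\nnegated: $n\n\n";
}
```

In all three cases the lazy pattern should swallow everything from "oops" through "ok", while the negated class should capture just "ok". This only illustrates the quantifier behavior; it does not make either pattern a robust way to handle arbitrary HTML.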